feat: just integration-otel to see traces #33

Closed · wants to merge 3 commits

Conversation

@codefromthecrypt (Contributor) commented Sep 7, 2024

This helps you understand what's going on, as errors that bubble up in tests sometimes aren't entirely helpful.

[screenshot: otel-tui]

Note: I also looked at https://github.com/kuisathaverat/pytest_otel, which makes a single trace for all tests with subspans for individual ones. I ran into dependency conflicts and raised an issue about it. We can do this and raise another PR later, should pytest_otel prove a good fit for us.
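
For anyone unfamiliar with how the auto-instrumentation is wired, here is a minimal sketch of roughly what `opentelemetry-instrument` sets up under the hood. The service name and endpoint are illustrative, not taken from this PR's .env:

```python
# Illustrative sketch of what `opentelemetry-instrument` wires up automatically.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# "goose-tests" is a placeholder service name, not from this PR.
provider = TracerProvider(resource=Resource.create({"service.name": "goose-tests"}))
# otel-tui accepts OTLP over gRPC on localhost:4317 by default.
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
)
trace.set_tracer_provider(provider)

# Auto-instrument outbound HTTP so each request shows up as a client span.
RequestsInstrumentor().instrument()
```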

@codefromthecrypt (Contributor, Author)

Oops, I forgot to add the .env file that sets the correct variables for otel export etc. Will do when I get back to my computer.

@codefromthecrypt (Contributor, Author)

added the .env file

@michaelneale (Collaborator)

oh this looks quite helpful

@codefromthecrypt (Contributor, Author)

One thing this doesn't do is payloads. If desired, I can help with instrumentation (starting with openai) to add the GenAI spans, which would have the prompt and completion events, as well as tokens used etc. in metrics. That's big enough to be a separate follow-up PR; a sketch of the idea is below.
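
To make that concrete, here is a minimal sketch of such a wrapper around an openai chat call. The attribute and event names follow the incubating GenAI semantic conventions; none of this is in the current PR, and the client/model are placeholders:

```python
# A sketch only: a hand-rolled GenAI span around an openai chat call.
# Names follow the incubating OTel GenAI semantic conventions.
from opentelemetry import trace

tracer = trace.get_tracer("goose.genai")  # hypothetical instrumentation name

def traced_completion(client, model, prompt):
    # Span name per convention: "{operation} {model}".
    with tracer.start_as_current_span("chat " + model) as span:
        span.set_attribute("gen_ai.system", "openai")
        span.set_attribute("gen_ai.request.model", model)
        # Prompt/completion are recorded as span events, not attributes.
        span.add_event("gen_ai.content.prompt", {"gen_ai.prompt": prompt})
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        span.add_event(
            "gen_ai.content.completion",
            {"gen_ai.completion": resp.choices[0].message.content},
        )
        # Token counts are scraped from the response usage block.
        span.set_attribute("gen_ai.usage.input_tokens", resp.usage.prompt_tokens)
        span.set_attribute("gen_ai.usage.output_tokens", resp.usage.completion_tokens)
        return resp
```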

@codefromthecrypt marked this pull request as draft September 9, 2024 00:19
@michaelneale (Collaborator)

cc @baxen for your thoughts

@codefromthecrypt (Contributor, Author)

p.s. I think GenAI spans are the best bang for the buck here. I can help with that at some point. They would get token and latency metrics plus spans that have the payloads.

@andrewblane (Collaborator) commented Sep 10, 2024

@codefromthecrypt Tested this out for some 403s I have been debugging on a provider. I know that the 403 payload contains a useful error message (approximately "signature does not match payload") when I inspect it in a debugger, but it doesn't seem to get surfaced here. Is this a limitation of the TUI (all nodes are expanded in the screenshot below)?

[screenshot: otel-tui trace view, all nodes expanded]

@codefromthecrypt (Contributor, Author)

@andrewblane Actually, this is a limitation of the HTTP semantic conventions in otel, which don't currently define request/response body logging. Incidentally, the GenAI-layer ones do, so I'm happy to help put a layer above this that would have the prompt/completion (as span events).

Another way is to somehow patch the HTTP otel instrumentation to also log the bodies, despite that not being in the spec.
Yet another way is to add a logging switch at the HTTP layer and leave the instrumentation itself alone, as sketched below.
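
A rough sketch of that switch, assuming the response_hook exposed by opentelemetry-instrumentation-requests (the GOOSE_OTEL_LOG_BODIES variable is a hypothetical name):

```python
# Sketch of the "logging switch" idea; bodies stay opt-in because the
# HTTP semantic conventions don't define them.
import os
from opentelemetry.instrumentation.requests import RequestsInstrumentor

def log_bodies_hook(span, request, response):
    if os.getenv("GOOSE_OTEL_LOG_BODIES") != "true":  # hypothetical switch
        return
    # Record bodies as span events; truncate so huge payloads don't bloat spans.
    span.add_event("http.request.body", {"body": str(request.body or "")[:4096]})
    span.add_event("http.response.body", {"body": response.text[:4096]})

RequestsInstrumentor().instrument(response_hook=log_bodies_hook)
```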

Thoughts?

@lifeizhou-ap (Collaborator) commented Sep 18, 2024

Hi @codefromthecrypt, thanks for putting up this PR!

Firstly, I would like to make sure that my understanding of your changes in the PR is correct. Your changes include:

  1. Enabled auto-instrumentation to trace the HTTP calls.
  2. Exported the data to the otel-tui service to visualise the traced data.
  3. To enable the instrumentation, you start the otel-tui service locally and run the tests with the opentelemetry-instrument command.
  4. Currently it only traces the HTTP calls while running tests, as the dependency is installed in the dev environment.

Please correct me if my understanding is incorrect.

Some feedback:
👍 It is good that the user can choose to enable/disable the instrumentation by running the application with or without opentelemetry-instrument, depending on their preference.

❓ Currently the instrumentation can only be enabled for tests. Just wondering whether it would also be useful to enable instrumentation while using goose? I guess traces and metrics might be useful to track the performance/usage of goose, and they could be exported to other backend services. Is it worthwhile to enable this from goose instead of exchange?

@codefromthecrypt (Contributor, Author)

Added this to start the story in a better place: square/goose#75

With this in, any future instrumentation here, e.g. LLM spans, would be great. For those who don't know, the GenAI (aka LLM) span would be a parent to the HTTP one and would look like this, but for something goose would do:
[screenshot: a GenAI span as the parent of an HTTP client span]

@codefromthecrypt (Contributor, Author)

So basically I can close this out and/or repurpose this PR for GenAI spans (for openai, as more than that is too much for one PR). Then, when goose runs, it will have the token and prompt/completion data as well as the metadata about the underlying HTTP calls (which in otel do not include bodies). Lemme know which is preferred.

@lifeizhou-ap (Collaborator) commented Sep 19, 2024

> So basically I can close this out and/or repurpose this PR for GenAI spans (for openai, as more than that is too much for one PR). Then, when goose runs, it will have the token and prompt/completion data as well as the metadata about the underlying HTTP calls (which in otel do not include bodies). Lemme know which is preferred.

Yes, please close this PR. @baxen and I had a chat. At the moment we will use a collector to get the token usage and cost as a simple solution. In the long run, it makes sense to use OpenTelemetry trace data. When there are requirements to track more data from the HTTP calls in the future, we will consider getting it from OpenTelemetry instrumentation data.

Also, just curious about the GenAI span you showed in the image: does it require any specific handling to get this data (input_tokens, output_tokens), or is it tracked automatically? I could not find the code for tracking this data in this PR.

@codefromthecrypt (Contributor, Author)

Gotcha, closing for now.

> Also, just curious about the GenAI span you showed in the image: does it require any specific handling to get this data?

Yeah, basically it is scraped from the request/response data. Since the code handling openai here is custom, it would need to be reproduced.

p.s. Kibana also has a "business function" of token counting, which is independent of the telemetry aspect. In the near future you can expect standard dashboards that will help folks understand token spend in aggregate from telemetry. That said, it is also expected that products that do their own counting for UX reasons will continue to do so.
