[ENHANCEMENT] Feedback behaviour improvements #6440
@cdvv7788 Definitely agree that we can improve annotations, and your feedback is welcome and appreciated.
Currently, annotating the root span of a trace will make an annotation appear on both the root span and the trace itself. Annotations on non-root spans will appear on the span only.
Does the span feedback appear on the right-hand side for you when you select a particular span in the span tree, or in the [screenshot omitted]?
It sounds like you would like all span annotations to show up as top-level annotations on the trace?
Our data model currently assumes that there is at most one annotation for a particular name per span. We may need to revisit that assumption. Can you help me understand your use-case that allows for multiple thumbs up for a single span? It sounds like multiple users are interacting with a single output from an LLM in your application.
The intention with score is that it allows floating-point-based annotations and evaluations, e.g., if I computed a floating-point number in code, I could upload it. The score field in an individual annotation or evaluation is not intended to be an aggregate metric, and we definitely don't expect end users to compute and upload their own aggregate metrics. I think this probably ties back to the previous point where we may need to relax the constraint on annotations and automatically compute aggregate metrics.
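To make that concrete, here is a minimal sketch of uploading a code-computed float score as a span annotation. The endpoint path and payload shape follow the capture-feedback docs linked in this thread; treat the exact field names as assumptions that may differ across Phoenix versions.

```python
# Hedged sketch: POST a floating-point score computed in code as a span
# annotation. Endpoint and field names are assumptions based on the
# capture-feedback docs; verify against your Phoenix version.
import json
import urllib.request


def build_annotation_payload(span_id, name, score, explanation=None):
    """Build the JSON body for POST /v1/span_annotations."""
    return {
        "data": [
            {
                "span_id": span_id,
                "name": name,
                "annotator_kind": "CODE",  # score computed in code, not by a human
                "result": {
                    "label": None,
                    "score": score,
                    "explanation": explanation,
                },
            }
        ]
    }


def post_annotation(base_url, payload):
    """Send the annotation to a running Phoenix server (not called here)."""
    req = urllib.request.Request(
        f"{base_url}/v1/span_annotations",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)


payload = build_annotation_payload("abc123", "relevance", 0.87)
```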
It sounds like having a DELETE route would solve this issue?
It sounds like the root cause of this pain is that you want multiple annotations of the same name on the same span, if I understand correctly.
Thanks so much for the detailed feedback! It sounds like this has been painful, and we'd definitely like to accommodate your use case.
From what I have seen, annotating any trace will make the annotation appear on both the trace and the annotated span. Any further update has to be done through the same span or it will be ignored. The updates will be propagated to the trace, but the actual span will be stuck with the initial annotation.
It does. Again, only for the first span annotated. From there on, the annotations are just ignored.
Ideally, I would like to just use Phoenix as my database of annotations and have it take care of aggregating/summarizing as needed. I don't know if it is necessary for all of them to show up at the top level, but there must be something we can do for the UX.
I am using Slack threads; anyone can give feedback there. I would like a score in the range -1 to 1 that moves in one direction or the other depending on the reactions, e.g., as an average.
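The averaging described above can be sketched in a few lines: each reaction maps to +1 (thumbs up) or -1 (thumbs down), and the aggregate score for a span is the mean of the votes, which stays in [-1, 1] by construction. This is an illustrative sketch of the desired aggregation, not anything Phoenix does today.

```python
# Hedged sketch of the proposed aggregation: thumbs up/down votes
# averaged into a single score in [-1, 1].
def aggregate_feedback(reactions):
    """reactions: list of +1/-1 votes; returns the mean, or None if empty."""
    if not reactions:
        return None
    return sum(reactions) / len(reactions)


# Two thumbs up and one thumbs down average to 1/3.
aggregate_feedback([1, 1, -1])
```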
That would be great. Do you know how popular this feature is? This would change its behaviour completely, and while I think it is great, I don't want to break anyone's workflow.
Yes, but I also need to be able to save/retrieve metadata for this specific situation. If Phoenix takes care of the aggregations at some point, my application will need to try to keep the feedback in sync (if feedback is added or removed, act accordingly). Thanks!
This is a bug, but I am not able to reproduce it. Can you help me understand when you are hitting this issue, e.g., are you issuing multiple POST requests to
Sounds like the same issue as above.
Got it, thanks!
Good to know, thanks! I can definitely see how you might want multiple annotations of the same name associated with the same span in this case.
This is something we'll likely need to support.
By metadata, do you have in mind something like user ID? It sounds like you want to be able to create, update, and delete annotations for particular users.
Thanks for the feedback! Much appreciated!
Yes. I have a Slack conversation in a thread. Any reaction to any message (there is a different span_id per message) in that thread will trigger an annotation. For the first reaction, it works well: it creates the annotation and propagates it to the trace. The second time is when problems show up:
Something similar to how Slack handles message metadata: arbitrary payloads under a key (https://api.slack.com/metadata/using). In this case I could pass user_id, but it can be useful for other things.
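In the spirit of Slack's message metadata, the per-user feedback could look something like the sketch below: an annotation payload carrying an arbitrary metadata object alongside the result. The field names (`metadata`, `user_id`, `slack_message_ts`) are illustrative assumptions, not a confirmed Phoenix schema.

```python
# Hedged sketch: a span-annotation payload carrying arbitrary metadata
# (user ID and Slack message timestamp). Field names are illustrative
# assumptions, not a confirmed Phoenix schema.
def annotation_with_metadata(span_id, label, user_id, message_ts):
    """Build one annotation entry tagged with the reacting user."""
    return {
        "span_id": span_id,
        "name": "user_feedback",
        "annotator_kind": "HUMAN",
        "result": {
            "label": label,
            "score": 1.0 if label == "thumbs_up" else -1.0,
        },
        # Arbitrary payload under a key, as with Slack message metadata.
        "metadata": {"user_id": user_id, "slack_message_ts": message_ts},
    }
```

With per-user metadata like this, individual reactions stay distinguishable, so adding or removing one user's feedback could later adjust the aggregate.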
Thanks @cdvv7788!
This is definitely unexpected behavior. Can you help me understand the exact requests that are being issued to Phoenix? Are you just sending POST requests to
Can you also send the version of Phoenix you are using?
Got it, thanks! We do currently add metadata to span annotations via the POST. It would probably unblock this flow if we added a GET route for annotations that included that metadata.
This is the payload I am sending. Label is either
Yes, I have something in the code but I am not sure where to check them. Phoenix version: 7.12.1
That looks correct to my eye. This one is tough for me to debug since I am not able to reproduce the issue. If you are willing, I'd love to hop on a call to take a look with you and see if we can get to the bottom of it. https://calendly.com/xander-arize/30min
Sounds great!
@axiomofjoy I updated with no luck. Also, I am not able to reproduce the exact same issue. While the issue persists in my main repository, I created a smaller version to try to reproduce it at https://github.com/cdvv7788/phoenix-debug

To make this work, just run [command omitted]. Then run: [command omitted]

It seems to be working as expected here at the span level, but I am not able to reproduce the whole issue because the annotation is not being propagated to the parent span. From our discussion, this is unexpected behaviour, right? Can you please confirm if you are observing it too? [screenshot omitted]

While I have indications in the list that the spans are annotated, the parent trace doesn't give any hint. To simplify my use case, I think that having trace-level annotations (multiple and auto-aggregated somehow) should be enough to get going.
Is your feature request related to a problem? Please describe.
I have been following https://docs.arize.com/phoenix/tracing/how-to-interact-with-traces/capture-feedback
My issue is that I don't understand what behaviour should be expected from the feedback feature.
I have a trace with several spans, and I want to add feedback on any span. The API suggests this is possible, since it accepts `span_id` as a parameter. However, what I have found is that:

Describe the solution you'd like
I would like to understand what is the expected behaviour here. I would prefer to avoid implementing a solution for this on my end, it makes more sense for this to live in phoenix, but I am either doing something wrong, or we need to improve the docs/feature.
I would like several things, but that obviously depends on the direction the project is going.

- [...] (creates or updates). There should be an explicit endpoint for creation only, handling updates separately, so we are not forced into a single observation per span/trace.
- [...] (`span_id` field). I am forced to update the annotation in a span, which doesn't actually update, but updates the trace annotation (and it has to be the original span that was annotated or it will just be ignored).

Describe alternatives you've considered
Implementing this myself. I can keep track of the trace_ids and keep the scores in my system instead of sending them to Phoenix, then attach them using the trace_id if I need to. Again, the hard blocker here is the lack of consistency in the annotations API: if I don't keep track of the exact span I used for the initial feedback, I have no way to update the annotation via the REST API.
TL;DR:
The annotations API needs some love. The current documentation covers only a narrow edge case of the feedback feature, and it can be improved across the board.
If I am doing something wrong or my expectations are just unreasonable, more context in the documentation would also help.