Error evaluating TS/Java #89
Comments
I've seen this kind of failure before. The system does record stdout and stderr as the output above suggests. However, there are certain catastrophic cases where nothing is recorded, and you get:
First question -- are you using the MultiPL-E container?
Yes, and this is the command I used. It is on macOS.
Yeah, this is an annoying type of error that has been hard to diagnose. It is usually obvious when it happens: there is an error, but nothing is recorded in the output files. We have this script that looks for it in the files: https://github.com/nuprl/MultiPL-E/blob/main/find_potential_faults.py
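For anyone who wants to check their own results by hand, the idea described above (a failing status with nothing recorded) can be sketched roughly as follows. This is not the contents of `find_potential_faults.py`; it is a minimal illustration that assumes each results file is a gzipped JSON object named `*.results.json.gz` with a `results` list whose entries carry `status`, `stdout`, and `stderr` fields. The helper names `is_silent_failure` and `find_silent_failures` are hypothetical.

```python
import gzip
import json
from pathlib import Path


def is_silent_failure(result: dict) -> bool:
    """True when a run did not succeed and left no stdout or stderr behind.

    Assumes (not confirmed by this thread) that a passing run has status "OK".
    """
    failed = result.get("status") != "OK"
    stdout = (result.get("stdout") or "").strip()
    stderr = (result.get("stderr") or "").strip()
    return failed and not stdout and not stderr


def find_silent_failures(results_dir: str) -> list[tuple[str, int]]:
    """Return (file name, result index) pairs for every silent failure found.

    Assumes the hypothetical layout described above: one gzipped JSON file
    per problem, each with a top-level "results" list.
    """
    hits = []
    for path in sorted(Path(results_dir).glob("*.results.json.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as f:
            data = json.load(f)
        for index, result in enumerate(data.get("results", [])):
            if is_silent_failure(result):
                hits.append((path.name, index))
    return hits
```

Calling `find_silent_failures("path/to/results")` on a results directory would list the runs worth re-executing by hand; if the field names differ in your version, adjust them accordingly.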
Hi,
I'm using MultiPL-E to evaluate models on different languages (latest version, and it passed `make test`). But I found that the scores on Java and TypeScript are not quite aligned with the trend on the other languages. I checked some outputs and noticed that a lot of the errors are `SyntaxError`, even though the code is logically correct.
(1) Is it possible that something is wrong with the env? For example, `import org.javatuples.*;` is very suspicious, since the code often runs well after removing this import.
(2) Besides the status, can we also log the complete error output from the compiler/executor, so we can get a better idea of what errors happened?
Thank you!
Rui
Example of Java (HumanEval_28_concatenate):
Example of TS (HumanEval_0_has_close_elements):