I appreciate your efforts in creating this dataset :) It looks really useful for testing code models. However, I found that some examples are ambiguous and some are noisy; the details are below:
Problem_id_32: The task is to create a transition matrix from the given adjacency list, where the entry for node i stores all the nodes connected to node i. One would naturally update transition_matrix[node][neighbor] with the respective probability, but the ground truth updates transition_matrix[neighbor][node] instead. I feel the instruction should state explicitly which orientation is expected (row-major or column-major, i.e. whether the rows or the columns index the source node).
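To illustrate the ambiguity, here is a minimal sketch of both orientations. It is not the dataset's code: the function name `transition_matrix`, the `by_row` flag, and the small example graph are my own assumptions, and I assume a uniform random walk over each node's neighbors.

```python
def transition_matrix(adj_list, by_row=True):
    """Build a uniform random-walk transition matrix from an adjacency list.

    by_row=True  -> matrix[node][neighbor] = P(node -> neighbor); rows sum to 1
    by_row=False -> matrix[neighbor][node] = P(node -> neighbor); columns sum to 1
    """
    n = len(adj_list)
    matrix = [[0.0] * n for _ in range(n)]
    for node, neighbors in enumerate(adj_list):
        if not neighbors:
            continue  # isolated node: leave its row/column as zeros
        p = 1.0 / len(neighbors)
        for neighbor in neighbors:
            if by_row:
                matrix[node][neighbor] = p
            else:
                matrix[neighbor][node] = p
    return matrix


# Hypothetical example graph: node 0 is connected to nodes 1 and 2.
adj = [[1, 2], [0], [0]]
row_form = transition_matrix(adj, by_row=True)
col_form = transition_matrix(adj, by_row=False)
print(row_form)  # [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(col_form)  # [[0.0, 1.0, 1.0], [0.5, 0.0, 0.0], [0.5, 0.0, 0.0]]
```

The two results are transposes of each other, so a model that picks the other convention produces a matrix that is valid under its own reading yet fails any test written for the opposite one.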
Problem_id_66: For this problem, the instruction looks good, but the ground truth, in addition to the edits mentioned in the instruction, adds one more row of data, namely "2024-01-02,P1001,Canada,Online,34,72.99,24,Female", to the 'data' variable. A model would never add this row unless the instruction explicitly asks for it, so most likely all models will fail on this example.
Problem_id_21: The task is to modify the 'distances_to' function to support negative weights. In the source code, the input is an undirected graph. In the ground truth, the undirected graph is converted into a directed one and the Bellman-Ford algorithm is then implemented on top of it. I feel the instruction should explicitly mention that the undirected graph is to be converted into a directed graph.
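A small sketch of why the conversion matters. This is not the dataset's 'distances_to' code: `to_directed` and `bellman_ford` are hypothetical helpers, and replacing each undirected edge with two directed edges is only one possible reading of the conversion. Under that reading, any negative undirected edge becomes a two-edge negative cycle, so the result depends entirely on how the graph is interpreted.

```python
def to_directed(undirected_edges):
    """One possible conversion (an assumption): each undirected edge
    (u, v, w) becomes two directed edges, u -> v and v -> u."""
    directed = []
    for u, v, w in undirected_edges:
        directed.append((u, v, w))
        directed.append((v, u, w))
    return directed


def bellman_ford(num_nodes, directed_edges, source):
    """Single-source shortest paths; raises ValueError if a negative
    cycle is reachable from the source."""
    dist = [float("inf")] * num_nodes
    dist[source] = 0
    # Relax every edge num_nodes - 1 times.
    for _ in range(num_nodes - 1):
        for u, v, w in directed_edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # One more pass: any further improvement means a negative cycle.
    for u, v, w in directed_edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist


# Directed reading: the negative edge only goes 2 -> 1.
directed_dist = bellman_ford(3, [(0, 1, 4), (0, 2, 5), (2, 1, -2)], 0)
print(directed_dist)  # [0, 3, 5]

# Undirected reading of the same edges: 1 <-> 2 with weight -2 is a negative cycle.
try:
    bellman_ford(3, to_directed([(0, 1, 4), (0, 2, 5), (1, 2, -2)]), 0)
    undirected_outcome = "ok"
except ValueError:
    undirected_outcome = "negative cycle"
print(undirected_outcome)  # negative cycle
```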
Problem_id_28: The task is to implement a function that checks whether the given string contains special characters. The set of special characters should be stated explicitly in the instruction; otherwise the model cannot know which characters count as special.
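For example, two equally reasonable definitions already disagree on simple inputs. The sketch below is only an illustration; both function names and both definitions are my own assumptions, not taken from the dataset.

```python
import string


def has_special_ascii_punct(text):
    """Definition A (assumed): special means ASCII punctuation only."""
    return any(ch in string.punctuation for ch in text)


def has_special_non_alnum(text):
    """Definition B (assumed): special means anything that is not a letter or digit."""
    return any(not ch.isalnum() for ch in text)


sample = "hello world"
print(has_special_ascii_punct(sample))  # False: a space is not punctuation
print(has_special_non_alnum(sample))    # True: a space is not alphanumeric
```

Whitespace is only one source of disagreement; underscores and non-ASCII symbols raise the same question, which is why the expected set needs to be spelled out.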
Please look into these and make the necessary changes in the dataset.