Hello, I would like to ask how to reproduce the reported performance of MiniMax-M2.1 on Terminal-Bench 2.0. For the framework, I am using Claude Code within Harbor; for the model, I am using MiniMax-M2.1 served through OpenRouter, with the provider pinned to the official MiniMax endpoint. I have run the benchmark four times, and the results are 32/89 (35.96%), 35/89 (39.33%), 30/89 (33.71%), and 34/89 (38.20%). The average score of these four runs is 36.80%. I have also removed the timeout limit. Could you tell me how to reach the reported score of 47.9?
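For reference, here is a minimal sketch of how the per-run pass rates and their average can be recomputed from the raw counts. The pass counts and the 89-task total come from the runs above, and 47.9 is the reported score I am comparing against; the variable and function names are only illustrative and are not part of Harbor or Terminal-Bench.

```python
from statistics import mean, stdev

TOTAL_TASKS = 89                  # tasks per run, as in the results above
passed_per_run = [32, 35, 30, 34] # tasks passed in each of the four runs
REPORTED_SCORE = 47.9             # score I am trying to reproduce


def pass_rates(passed, total):
    """Convert per-run pass counts into percentages."""
    return [100.0 * p / total for p in passed]


rates = pass_rates(passed_per_run, TOTAL_TASKS)
avg = mean(rates)            # mean pass rate across runs
spread = stdev(rates)        # sample standard deviation across runs
gap = REPORTED_SCORE - avg   # distance to the reported score

for i, r in enumerate(rates, start=1):
    print(f"run {i}: {r:.2f}%")
print(f"mean: {avg:.2f}%  stdev: {spread:.2f}  gap to reported: {gap:.2f}")
```

The run-to-run spread is much smaller than the gap to the reported score, so the difference does not look like sampling noise alone.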