Description
What happened?
I'm testing `docs/docs/tutorials/program_of_thought/index.ipynb` with `dspy.LM("openai/gpt-4.1-mini-2025-04-14")`.
I've been getting wrong results from ProgramOfThought. Inspecting the generated prompts, I can see some cases of:
- `generated_code` overrides the `final_answer` function defined in `dspy/primitives/runner.js`
- the generated `final_answer` is not invoked.
ProgramOfThought first LM call output:
[[ ## reasoning ## ]]
The expression given is 2*5 + 4. According to the order of operations, multiplication is performed before addition. So, first multiply 2 by 5, which equals 10, then add 4 to get the final result of 14.
[[ ## generated_code ## ]]
def final_answer():
    result = 2 * 5 + 4
    return {"result": result}
[[ ## completed ## ]]
ProgramOfThought second LM call input:
[[ ## question ## ]]
2*5 + 4
[[ ## final_generated_code ## ]]
def final_answer():
    result = 2 * 5 + 4
    return {"result": result}
[[ ## code_output ## ]]
""
Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.
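To illustrate the suspected mechanism, here is a minimal pure-Python sketch of name shadowing. It is not DSPy's actual runner: `run_generated` and the injected recorder are hypothetical stand-ins, and I'm assuming the sandbox pre-defines a `final_answer` helper that the generated code is expected to call.

```python
def run_generated(code: str) -> list:
    """Execute `code` in a fresh namespace that pre-defines final_answer.

    Hypothetical stand-in for a sandbox; returns whatever the code reported.
    """
    captured = []

    def sandbox_final_answer(value):
        # Assumed behavior: the injected helper records the program's answer.
        captured.append(value)

    namespace = {"final_answer": sandbox_final_answer}
    exec(code, namespace)
    return captured


# Shaped like the first LM output: redefines final_answer and never calls it.
shadowing_code = '''
def final_answer():
    result = 2 * 5 + 4
    return {"result": result}
'''

# Calls the injected helper instead of redefining it.
calling_code = '''
result = 2 * 5 + 4
final_answer({"result": result})
'''

shadowed_output = run_generated(shadowing_code)
called_output = run_generated(calling_code)
print(shadowed_output)  # []
print(called_output)    # [{'result': 14}]
```

In this sketch the `def final_answer()` in the generated code replaces the injected helper in the namespace, so nothing is ever reported, which would be consistent with the empty `code_output` above.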
I also have some cases where `final_answer` is invoked, but the output is still wrong because somehow this function conflicts with the `final_answer` defined in `dspy/primitives/runner.js`.
ProgramOfThought first LM call output:
[[ ## reasoning ## ]]
The expression given is 2*5 + 4. According to the order of operations, multiplication is performed before addition. So, first multiply 2 by 5, which equals 10, then add 4 to get 14.
[[ ## generated_code ## ]]
def final_answer():
    answer = 2*5 + 4
    return {"answer": answer}
final_answer()
[[ ## completed ## ]]
ProgramOfThought second LM call input:
[[ ## question ## ]]
2*5 + 4
[[ ## final_generated_code ## ]]
def final_answer():
    answer = 2*5 + 4
    return {"answer": answer}
final_answer()
[[ ## code_output ## ]]
{}
Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.
I wonder if there's a way to fix this other than updating the instructions in the ProgramOfThought generate mode.
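One possible direction, sketched below under assumptions, is to inspect the generated code before running it and flag any program that rebinds `final_answer`. The helper `redefines` is hypothetical and not part of DSPy; it only uses the standard-library `ast` module, and what to do on a hit (retry, rename, reject) is left open.

```python
import ast


def redefines(code: str, name: str = "final_answer") -> bool:
    """Return True if `code` defines or assigns to `name` anywhere.

    Hypothetical pre-execution check; nested definitions are flagged too.
    """
    tree = ast.parse(code)
    for node in ast.walk(tree):
        # def / async def / class statements that reuse the reserved name
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name == name:
                return True
        # plain assignments such as `final_answer = ...`
        if isinstance(node, ast.Name) and node.id == name:
            if isinstance(node.ctx, ast.Store):
                return True
    return False


generated = "def final_answer():\n    return {'answer': 2*5 + 4}\nfinal_answer()\n"
flagged = redefines(generated)
not_flagged = redefines("final_answer({'answer': 2*5 + 4})")
print(flagged, not_flagged)  # True False
```

A check like this would catch both cases reported here, since each generated program contains a `def final_answer()` statement.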
Steps to reproduce
Described above.
DSPy version
2.6.27