
[Bug] ProgramOfThought final_answer conflict #8448

Open
@armoucar-neon


What happened?

I'm testing the docs/docs/tutorials/program_of_thought/index.ipynb notebook with dspy.LM("openai/gpt-4.1-mini-2025-04-14").

I've been getting wrong results from ProgramOfThought. Inspecting the generated prompts, I see cases where:

  • the generated_code overrides the final_answer function defined in dspy/primitives/runner.js;
  • the generated final_answer is never invoked.

ProgramOfThought first LM call output:

[[ ## reasoning ## ]]
The expression given is 2*5 + 4. According to the order of operations, multiplication is performed before addition. So, first multiply 2 by 5, which equals 10, then add 4 to get the final result of 14.

[[ ## generated_code ## ]]
def final_answer():
    result = 2 * 5 + 4
    return {"result": result}

[[ ## completed ## ]]

ProgramOfThought second LM call input:

[[ ## question ## ]]
2*5 + 4

[[ ## final_generated_code ## ]]
def final_answer():
    result = 2 * 5 + 4
    return {"result": result}

[[ ## code_output ## ]]
""

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.

I also have cases where final_answer is invoked but the output is still wrong, apparently because the generated function conflicts with the final_answer defined in dspy/primitives/runner.js.

ProgramOfThought first LM call output:

[[ ## reasoning ## ]]
The expression given is 2*5 + 4. According to the order of operations, multiplication is performed before addition. So, first multiply 2 by 5, which equals 10, then add 4 to get 14.

[[ ## generated_code ## ]]
def final_answer():
    answer = 2*5 + 4
    return {"answer": answer}

final_answer()

[[ ## completed ## ]]

ProgramOfThought second LM call input:

[[ ## question ## ]]
2*5 + 4

[[ ## final_generated_code ## ]]
def final_answer():
    answer = 2*5 + 4
    return {"answer": answer}

final_answer()

[[ ## code_output ## ]]
{}

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.
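
One plausible explanation for both cases is plain name shadowing. The sketch below is a hypothetical stand-in for the sandbox, not the actual logic in dspy/primitives/runner.js: `run_in_sandbox` and its injected `final_answer` hook are names invented for illustration. It assumes the host injects a `final_answer` callable that records the reported value, and shows that a generated `def final_answer()` rebinds that name, so the host hook never fires whether or not the generated function is called. That would be consistent with the empty code_output values shown above.

```python
# Hypothetical sandbox stand-in; NOT the real runner.js implementation.
def run_in_sandbox(code: str):
    captured = {}

    def final_answer(value=None):
        # Host-provided hook (assumption): records what the program reports.
        captured["value"] = value

    # The generated code runs in a namespace where the hook is pre-defined.
    namespace = {"final_answer": final_answer}
    exec(code, namespace)
    # None means the host hook was never called.
    return captured.get("value")


# What the host presumably expects: the program calls the injected hook.
intended = "final_answer({'answer': 2 * 5 + 4})"

# First case from this issue: the hook is redefined and never called.
shadowed_not_called = '''
def final_answer():
    result = 2 * 5 + 4
    return {"result": result}
'''

# Second case: the redefined function is called, but its return value
# is discarded and the host hook is still bypassed.
shadowed_called = '''
def final_answer():
    answer = 2*5 + 4
    return {"answer": answer}

final_answer()
'''

print(run_in_sandbox(intended))             # {'answer': 14}
print(run_in_sandbox(shadowed_not_called))  # None
print(run_in_sandbox(shadowed_called))      # None
```

In this sketch the only path that produces output is the one where the generated code calls the injected hook with a value instead of defining its own function of the same name.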

I wonder if there's a way to fix this other than updating the instructions in the ProgramOfThought generate mode.
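
As one possible direction that leaves the prompt untouched, the generated code could be checked for shadowing before it is executed. The following is a minimal sketch using Python's standard `ast` module; `shadows_hook` is a hypothetical helper, not an existing DSPy function, and where such a check would plug into ProgramOfThought is left open.

```python
import ast


def shadows_hook(code: str, hook_name: str = "final_answer") -> bool:
    """Return True if `code` rebinds `hook_name` via def, class, or assignment.

    Hypothetical helper for illustration. The check is conservative: it also
    flags rebindings nested inside other functions.
    """
    tree = ast.parse(code)
    for node in ast.walk(tree):
        # def final_answer(...), async def final_answer(...), class final_answer
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name == hook_name:
                return True
        # final_answer = ..., for final_answer in ..., and similar stores
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if node.id == hook_name:
                return True
    return False


generated = '''
def final_answer():
    answer = 2*5 + 4
    return {"answer": answer}

final_answer()
'''

print(shadows_hook(generated))           # True
print(shadows_hook("final_answer(14)"))  # False
```

A caller could treat a `True` result like any other execution failure and trigger a regeneration, though whether that fits the existing retry logic is an assumption.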

Steps to reproduce

Described above.

DSPy version

2.6.27


Labels: bug
