feat: Added LaTeX error modification using deepseek-reasoner #15

Closed
wants to merge 0 commits into from

Gavin-WangSC
Contributor

No description provided.

@q1zhen
Member

q1zhen commented Apr 10, 2025

N.B., INTENTIONALLY LEFT AS A DRAFT PULL REQUEST FOR THE IMMATURE LLM-BASED CORRECTION APPROACH.

@at-wr
Collaborator

at-wr commented Apr 10, 2025

Why did you remove the guideline “Keep spacing between Chinese-English characters”?

@Gavin-WangSC
Contributor Author

> Why did you remove the guideline “Keep spacing between Chinese-English characters”?

Oh, sorry, that was unintentional.

@RadioNoiseE
Collaborator

> Why did you remove the guideline “Keep spacing between Chinese-English characters”?

This was actually removed by @q1zhen in 0e531c8. We decided to use a post-processing pass rather than rely on the LLM's unreliable "constrained" output.
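A minimal sketch of what such a spacing post-pass could look like in Python. The function name `add_cjk_latin_spacing` is hypothetical (the actual change in 0e531c8 is not reproduced here), and it assumes that only the basic CJK Unified Ideographs block and ASCII letters/digits matter. It also does not skip inline code or math spans, which a real post-pass would probably need to do.

```python
import re

# Assumption: basic CJK Unified Ideographs block only; kana, Hangul and
# extension blocks would need to be added to this class.
CJK = r"[\u4e00-\u9fff]"
# ASCII letters and digits; punctuation is deliberately left alone.
LATIN = r"[A-Za-z0-9]"

_cjk_then_latin = re.compile(f"({CJK})({LATIN})")
_latin_then_cjk = re.compile(f"({LATIN})({CJK})")


def add_cjk_latin_spacing(text: str) -> str:
    """Insert one space at each boundary between a CJK character and an ASCII letter/digit."""
    # Pass 1: CJK followed by Latin, e.g. "用L" -> "用 L".
    text = _cjk_then_latin.sub(r"\1 \2", text)
    # Pass 2: Latin followed by CJK, e.g. "M生" -> "M 生".
    text = _latin_then_cjk.sub(r"\1 \2", text)
    return text
```

Running it on already-spaced text leaves it unchanged, so the pass is safe to apply repeatedly.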

@RadioNoiseE
Collaborator

RadioNoiseE commented Apr 11, 2025

> > Why did you remove the guideline “Keep spacing between Chinese-English characters”?
>
> Oh, sorry, that was unintentional.

Hi, @Gavin-WangSC

I see that your fork is 1 commit ahead of and 7 commits behind upstream. Please make sure it is in sync with upstream (i.e., resolve the conflicts) before opening a PR.

@RadioNoiseE
Collaborator

One piece of advice: you are calling pdflatex to detect errors, which means a full TeX installation and various macro packages are required.

For error checking, something like chktex, as I mentioned before, would suffice; it only requires an ANSI C-compliant compiler to build.

Anyway, thanks for contributing.

@RadioNoiseE
Collaborator

RadioNoiseE commented Apr 11, 2025

> N.B., INTENTIONALLY LEFT AS A DRAFT PULL REQUEST FOR THE IMMATURE LLM-BASED CORRECTION APPROACH.

I actually think an LLM-based approach is unavoidable, since I don't think there is any tool that can repair a broken LaTeX equation (from panic).

However, if we narrow the scope to balancing curly braces, I think we could supply our own tool, though it would still not be easy.
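A rough sketch of what a brace-only fixer could look like, assuming its input is the body of a single equation. The helper `balance_braces` is hypothetical. It drops unmatched `}` and appends any missing `}` at the end; this makes the braces balance, but it cannot infer where a missing brace was actually intended.

```python
def balance_braces(tex: str) -> str:
    r"""Return tex with unmatched '}' removed and missing '}' appended.

    A backslash and the character after it are copied verbatim, so escaped
    braces such as \{ and \} (and the \\ line break) are never counted.
    """
    out = []
    depth = 0  # number of currently open '{' groups
    i = 0
    while i < len(tex):
        ch = tex[i]
        if ch == "\\" and i + 1 < len(tex):
            # Copy the control sequence start verbatim; it never changes depth.
            out.append(tex[i : i + 2])
            i += 2
            continue
        if ch == "{":
            depth += 1
            out.append(ch)
        elif ch == "}":
            if depth == 0:
                # Unmatched closing brace: drop it.
                i += 1
                continue
            depth -= 1
            out.append(ch)
        else:
            out.append(ch)
        i += 1
    # Close every group that was left open, at the very end.
    out.append("}" * depth)
    return "".join(out)
```

For example, `balance_braces(r"\frac{a}{b")` yields `\frac{a}{b}`, while `\{a\}` is left untouched.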

@q1zhen
Member

q1zhen commented Apr 11, 2025

> > N.B., INTENTIONALLY LEFT AS A DRAFT PULL REQUEST FOR THE IMMATURE LLM-BASED CORRECTION APPROACH.
>
> I actually think an LLM-based approach is unavoidable, since I don't think there is any tool that can repair a broken LaTeX equation (from panic).
>
> However, if we narrow the scope to balancing curly braces, I think we could supply our own tool, though it would still not be easy.

Agreed. However, he is using the LLM to fix the article as a whole rather than just the erroneous parts. That's why I marked this PR as a draft.
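To illustrate fixing only the erroneous parts, here is a minimal sketch that locates inline `$...$` spans whose braces do not balance, so that only those spans, rather than the whole article, would be handed to the model. It assumes inline math uses single dollars on one line and treats brace imbalance as the only error signal; `braces_balanced` and `broken_math_spans` are hypothetical names.

```python
import re
from typing import List, Tuple

# Inline math: a single-dollar pair on one line, not part of $$...$$ and
# not an escaped \$. finditer pairs dollars left to right.
INLINE_MATH = re.compile(r"(?<![\\$])\$(?!\$)([^$\n]+?)(?<!\\)\$(?!\$)")


def braces_balanced(tex: str) -> bool:
    depth = 0
    i = 0
    while i < len(tex):
        if tex[i] == "\\":
            i += 2  # skip an escaped character such as \{ or \}
            continue
        if tex[i] == "{":
            depth += 1
        elif tex[i] == "}":
            depth -= 1
            if depth < 0:
                return False  # a '}' closed a group that was never opened
        i += 1
    return depth == 0


def broken_math_spans(markdown: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, body) for each inline math span with unbalanced braces."""
    return [
        (m.start(), m.end(), m.group(1))
        for m in INLINE_MATH.finditer(markdown)
        if not braces_balanced(m.group(1))
    ]
```

The returned offsets let a caller splice a corrected span back into place without touching the rest of the article.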

@q1zhen
Member

q1zhen commented Apr 11, 2025

I would suggest suspending this fix for now. Let's watch the frequency of LaTeX errors to see whether it is high enough to be worth fixing them in an automated pipeline.

@RadioNoiseE
Collaborator

RadioNoiseE commented Apr 11, 2025

> I would suggest suspending this fix for now. Let's watch the frequency of LaTeX errors to see whether it is high enough to be worth fixing them in an automated pipeline.

Fair enough. Or maybe we could just switch to a different model for composing the articles.

@RadioNoiseE
Collaborator

DeepSeek seems to consistently ignore my prompt telling it the correct way to embed equations in Markdown. It keeps outputting things like `$<random figure>$<unit immediately following>`, which gets parsed as `\$<random figure>\$<unit immediately following>`.
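Since prompting alone seems unreliable here, this could also be handled by a post-pass, in the same spirit as the CJK spacing fix. A minimal sketch, assuming (as described above) that the renderer rejects a closing `$` immediately followed by a letter or digit, and that inline math uses single dollars on one line; `separate_math_from_unit` is a hypothetical name.

```python
import re

# Inline math: a single-dollar pair on one line, not part of $$...$$ and
# not an escaped \$. Matching left to right keeps dollars paired in order.
INLINE_MATH = re.compile(r"(?<![\\$])\$(?!\$)[^$\n]+?(?<!\\)\$(?!\$)")


def separate_math_from_unit(markdown: str) -> str:
    """Insert a space after an inline-math span that is glued to a letter or digit."""

    def fix(m: re.Match) -> str:
        end = m.end()
        nxt = markdown[end] if end < len(markdown) else ""
        # str.isalnum() is also True for CJK characters, so "$5$千克" is padded too.
        return m.group(0) + " " if nxt.isalnum() else m.group(0)

    return INLINE_MATH.sub(fix, markdown)
```

Matching whole `$...$` pairs first, then inspecting the following character, avoids mistaking the closing `$` of one span for the opening `$` of the next.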

@RadioNoiseE
Collaborator

RadioNoiseE commented Apr 11, 2025

The underlying issue is that Markdown is a weak markup language: it lacks a standard way of writing even math.

@Gavin-WangSC
Contributor Author

> I would suggest suspending this fix for now. Let's watch the frequency of LaTeX errors to see whether it is high enough to be worth fixing them in an automated pipeline.

Understood👌

4 participants