
Stack static improvements#2779

Closed
geoo89 wants to merge 6 commits into PreTeXtBook:master from IDEMSInternational:stack-static-improvements


@geoo89
Contributor

@geoo89 geoo89 commented Jan 29, 2026

Adds support for a variety of STACK features, such as most existing input types and images, and improves error handling. A few example questions have been added to the sample article. (I plan to create a STACK example in the examples folder with a more detailed test suite.)

Also fixes a bug in the HTML rendering of STACK questions featuring multi-line LaTeX; see #2778.

@geoo89 geoo89 force-pushed the stack-static-improvements branch from 1edc49f to cc2586a Compare February 16, 2026 14:02
@geoo89 geoo89 marked this pull request as ready for review February 16, 2026 14:03
@geoo89 geoo89 changed the title from "Stack static improvements (WIP)" to "Stack static improvements" Feb 16, 2026
@geoo89 geoo89 force-pushed the stack-static-improvements branch from cc2586a to 328c878 Compare February 16, 2026 15:16
@rbeezer
Collaborator

rbeezer commented Feb 16, 2026

OK, just one showstopper. Extant "stack integration" question has gone backwards. This needs a fix before I can merge.

  • The static version seems to have picked up a span element inside the m in the answer, with the result that there is nothing in the LaTeX output.

  • The HTML version says "The provided file does not contain valid XML". I have not chased that down. I'd suspect it is related, but have no reason to back that up.

I have about an hour into rearranging commits and testing. Please do not change any commits here. You can tell me (clearly) about a minor edit and I will do it, or add on a new commit (which I will cherry-pick into what I have going).

General comments in next message - none meant for action on this PR.

@rbeezer
Collaborator

rbeezer commented Feb 16, 2026

Some general comments. I'll start more general and work my way into minor nit-picking. ;-)

  • Good job with the images - I'd have expected that to be more of a chore.
  • New exercises 3, 4 and 5. PDF shows "Answer." and no content. (Maybe the span problem above?) Unclear what is going on.
  • There's a whole lot of regex stuff going on here in the Python. We like XML so we don't have to do this. ;-) I hope a lot of this can be fixed as the server produces more PreTeXt code. In any event, what do you think about putting the STACK routines into their own file/module? (Yes, these comments probably apply to WeBWorK, which has just grown on us slowly over many years.)
  • There was a bit of stray whitespace in the Python - two blank lines with multiple spaces. I'm really careful about keeping this out of the code and avoiding the attendant problems down the line.
  • STACK is a good prefix for commits, and if they are just static-related we can say so in the message. I do not capitalize the first word after the colon. Very good to have a separate commit for the sample article additions, and it should be prefixed as such.
  • I do before/after testing with/without new material in the sample article. It helps me if the additions are in the last commit.

Yes, I keep saying I should write these up in The Guide...

Thanks for all your work on this. You knew it was going to be a big job, no? ;-)
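The stray-whitespace point above is easy to check mechanically before committing. Here is a minimal sketch, assuming the goal is only to flag lines that consist entirely of spaces or tabs; the helper name `whitespace_only_lines` is hypothetical and is not part of the PreTeXt code base.

```python
def whitespace_only_lines(source: str) -> list[int]:
    """Return 1-based numbers of lines that contain only spaces or tabs.

    Truly empty lines are fine; lines that look blank but carry
    whitespace are the ones reported.
    """
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if line and not line.strip()
    ]


# Line 3 holds four spaces, line 4 is genuinely empty.
sample = "def f():\n    return 1\n    \n\ng = f\n"
print(whitespace_only_lines(sample))  # [3]
```

Running this over a changed Python file before a commit would catch the two blank lines with multiple spaces mentioned above.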

@geoo89
Contributor Author

geoo89 commented Feb 17, 2026

Hi Rob, thanks for your quick response!

I cannot reproduce any of these issues, the STACK questions build fine for me:

image

The static version of the STACK integration question that I get when running pretext generate stack looks like this:

<stack-static>
<statement><p>Find <me> \int {{\left(x-3\right)}^6} d{x}</me> <fillin characters="20" name="ans1"/> </p></statement>
<solution><p>We can either do this question by inspection (i.e. spot the answer) or in a more formal manner by using the substitution <me> u = ({x}-{3}).</me> Then, since <m>\frac{d}{d{x}}u=1</m> we have <me> \int {{\left(x-3\right)}^6} d{x} = \int u^{6} du = \frac{u^{7}}{7}+c = {\frac{{\left(x-3\right)}^7}{7}}+c.</me></p></solution>
<answer><p><m>\frac{{\left(x-3\right)}^7}{7}+c</m></p></answer>
</stack-static>

I do remember having the issue with tags inside the <m> at some point during development, possibly when I initially made the draft PR, but that should be fixed. I did a rebase and a force-push of this branch before I marked this PR ready for review. Can you make sure you have the latest version of this branch (last commit being 328c878)?

I also noticed that I didn't update the API URL for the sample article to https://stack-api.maths.ed.ac.uk (now that this official one is working), but the one we have in the sample article is working for me too.

@geoo89
Contributor Author

geoo89 commented Feb 17, 2026

  • There's a whole lot of regex stuff going on here in the Python. We like XML so we don't have to do this. ;-) I hope a lot of this can be fixed as the server produces more PreTeXt code. In any event, what do you think about putting the STACK routines into their own file/module? (Yes, these comments probably apply to WeBWorK, which has just grown on us slowly over many years.)

The regexes are not actually capturing XML elements; they mostly convert MathJax to PreTeXt and replace STACK-specific blocks (e.g. input fields, feedback fields) with their PreTeXt equivalents. I do parse the HTML that is returned by the API using lxml to convert commonly used HTML tags.
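To illustrate the kind of conversion meant here, the following is a minimal sketch and not the code in this PR. It assumes the API text marks inline math with `\(...\)` and display math with `\[...\]`; the function name `mathjax_to_pretext` is hypothetical.

```python
import re

# Assumed MathJax delimiters: \( ... \) inline, \[ ... \] display.
INLINE_MATH = re.compile(r"\\\((.+?)\\\)", re.DOTALL)
DISPLAY_MATH = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)


def mathjax_to_pretext(text: str) -> str:
    """Replace MathJax delimiters with PreTeXt <m> and <me> elements."""
    # Function replacements keep the LaTeX backslashes untouched.
    text = DISPLAY_MATH.sub(lambda match: f"<me>{match.group(1)}</me>", text)
    text = INLINE_MATH.sub(lambda match: f"<m>{match.group(1)}</m>", text)
    return text


print(mathjax_to_pretext(r"Find \[ \int x \, dx \] for \(x\)."))
# Find <me> \int x \, dx </me> for <m>x</m>.
```

Note that `\left(` and `\right)` do not collide with the delimiters, because the backslash is not directly followed by the parenthesis.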

  • STACK is a good prefix for commits, and if they are just static-related we can say so in the message. I do not capitalize the first word after the colon. Very good to have a separate commit for the sample article additions, and it should be prefixed as such.

Thanks, I'll keep that in mind for the future. Let me know if you want me to update the commit messages or if you're going ahead with that yourself.

  • I do before/after testing with/without new material in the sample article. It helps me if the additions are in the last commit.

As above, let me know if you want me to reorder the commits. I'll keep it in mind for the future.

@rbeezer
Collaborator

rbeezer commented Feb 17, 2026

We cannot do development with the CLI - I thought I had made this clear. It lags the main repository, and it caches some results. Please do careful testing (before/after) with the pretext/pretext script.

I never have any problem picking up changes due to a forced push.

So I picked this up fresh, made no changes, and generated static versions. The suspect <span> is there. Please reproduce this.

Did you happen to see #2787? Might it have some effect? More to say, but we need to get over this first.

@geoo89 geoo89 force-pushed the stack-static-improvements branch from 328c878 to 2fef0b4 Compare February 18, 2026 07:51
@geoo89
Contributor Author

geoo89 commented Feb 18, 2026

Hi Rob, I rebased onto main to see if the issue is related to #2787, but no dice. (I updated the commit messages in the process and the STACK API URL in the sample article.)

The only issue I was able to reproduce was "The provided file does not contain valid XML" in the HTML version for the integration question. It seems like the API didn't like the namespace in that question (there was no namespace in the minimal STACK example, just the sample article). I'm surprised it still produced a sensible static version. I removed that namespace now.

Whatever I try, I'm unable to reproduce the <span> in the answer tags, nor the answers not displaying in the static outputs.

I made sure my working tree is clean. I'm running
pretext/pretext -c stack sample-article.xml -p publication.xml,
pretext/pretext -f latex-plus sample-article.xml -p publication.xml -c all, and
pretext/pretext -f html sample-article.xml -p publication.xml -c all -d output/html
respectively, all of which produce the intended results.

I'm using Python 3.10.12, with the following libraries:

annotated-types==0.7.0
autocommand==2.2.2
backports.tarfile==1.2.0
black==24.10.0
bottle==0.13.2
certifi==2025.1.31
charset-normalizer==3.4.1
cheroot==10.0.1
CherryPy==18.10.0
click==8.1.8
click-log==0.4.0
CodeChat==1.9.4
CodeChat_Server==0.2.25
coloraide==4.2.2
coverage==7.6.1
docutils==0.20.1
errorhandler==2.0.1
exceptiongroup==1.2.2
flake8==6.1.0
ghp-import==2.1.0
gitdb==4.0.12
GitPython==3.1.44
greenlet==3.1.1
idna==3.10
iniconfig==2.1.0
jaraco.collections==5.1.0
jaraco.context==6.0.1
jaraco.functools==4.1.0
jaraco.text==4.0.0
Jinja2==3.1.6
json-five==1.1.2
lxml==5.3.1
lxml-stubs==0.5.1
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mccabe==0.7.0
mdurl==0.1.2
more-itertools==10.5.0
mypy==1.14.1
mypy-extensions==1.0.0
packaging==24.2
pathspec==0.12.1
pdfCropMargins==1.0.9
pep8-naming==0.14.1
pillow==10.4.0
plasTeX==3.1
platformdirs==4.3.7
playwright==1.48.0
pluggy==1.5.0
portend==3.2.0
-e git+ssh://git@github.com/IDEMSInternational/pretext-cli.git@9b94881a3c647dc0f9ee8aaa0c73718d33562a44#egg=pretext
psutil==7.0.0
pycodestyle==2.11.1
pydantic==2.10.6
pydantic-xml==2.14.3
pydantic_core==2.27.2
pyee==12.0.0
pyflakes==3.1.0
Pygments==2.19.1
PyMuPDF==1.24.11
PyPDF2==2.5.0
pypng==0.20220715.0
pytest==7.4.4
pytest-console-scripts==1.4.1
pytest-cov==4.1.0
pytest-mock==3.14.0
python-dateutil==2.9.0.post0
qrcode==7.4.2
regex==2024.11.6
requests==2.32.3
rich==13.9.4
shellingham==1.5.4
single-version==1.6.0
six==1.17.0
sly==0.5
smmap==5.0.2
strictyaml==1.7.3
tempora==5.7.1
thrift==0.21.0
toml==0.10.2
tomli==2.2.1
typer==0.15.2
typing-inspection==0.4.0
typing_extensions==4.12.2
Unidecode==1.3.8
urllib3==2.2.3
watchdog==4.0.2
websockets==13.1
zc.lockfile==3.0.post1

@anst-i

anst-i commented Mar 27, 2026

FWIW, I checked out this PR locally, launched a local STACK API based on the current STACK dev branch (Demo server didn't work due to CORS restrictions), ran the commands @geoo89 posted in the previous post, and it worked smoothly for me (both HTML and PDF, all 5 STACK problems). See screenshot.

Screenshot From 2026-03-27 14-20-07

@rbeezer
Collaborator

rbeezer commented Mar 27, 2026

Thanks, @anst-i . I am tardy on this one, since I am a bit baffled. It was close to the top of my list already - likely I can take another look early next week.

@geoo89
Contributor Author

geoo89 commented Mar 27, 2026

With the last push on Feb 18 I fixed the one issue that I was able to reproduce, but I also changed the URL of the STACK API to use the official one from the University of Edinburgh (which is slightly ahead of the deployment that the examples referenced before). So there's a small chance that things magically got fixed for you as well through this.

@rbeezer
Collaborator

rbeezer commented Mar 27, 2026

Change in plans, stalled out on today's work. So...

  • The sample article commit really had me confused. One edit to an existing problem. Then an edit to the generated output for that problem - which probably should have been on its own commit, or earlier, or not at all, or something. A change to the server URL. Then four new problems. Too many unrelated changes making it very hard for me to test carefully. I've got a good chunk of time invested in splitting those out and rearranging commits, so do not do a force push here without discussion.
  • Isolated server URL change in early commit, and name space change in adjacent commit. Deleted the edit to the generated output, since that seems to be an artifact of the bigger changes here. With only those commits, I get no changes between using master and those two commits, for the one existing problem. Good.
  • Now I pile on the four main commits here, without the four new examples. Doing just -c stack, the difference in output between master and the six commits is:
diff --git a/../sold/stack-integration.ptx b/./stack-integration.ptx
index 2e6fd33..50f8a1a 100644
--- a/../sold/stack-integration.ptx
+++ b/./stack-integration.ptx
@@ -1,6 +1,5 @@
 <stack-static>
-
     <statement><p>Find <me> \int {{\left(x-3\right)}^6} d{x}</me> <fillin characters="20" name="ans1"/> </p></statement>
     <solution><p>We can either do this question by inspection (i.e. spot the answer) or in a more formal manner by using the substitution <me> u = ({x}-{3}).</me> Then, since <m>\frac{d}{d{x}}u=1</m> we have <me> \int {{\left(x-3\right)}^6} d{x} = \int u^{6} du = \frac{u^{7}}{7}+c = {\frac{{\left(x-3\right)}^7}{7}}+c.</me></p></solution>
-    <answer><p><m>\frac{{\left(x-3\right)}^7}{7}+c</m></p></answer>
+    <answer><p><m><span>\frac{{\left(x-3\right)}^7}{7}+c</span></m></p></answer>
 </stack-static>
\ No newline at end of file
  • There is an extra <span> inside the <m>. That is not a PreTeXt element. So we are not making PreTeXt. That is the problem here. So, @anst-i, we do render "fine" since XSL just applies the default template to the span since it has no other template defined for it, and that default just applies templates to the content.
  • If you are not seeing this, then we need to understand why I am, and you are not, before we can go any further. I may set AI on this in a bit. ;-)
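Since the rendered output hides the problem, a structural check on the generated file is one way to catch it. The sketch below uses only the standard library and flags child elements inside `<m>` or `<me>`. The allow-list is an assumption made for illustration and is not taken from the PreTeXt schema; the function name `suspicious_math` is hypothetical.

```python
import xml.etree.ElementTree as ET

MATH_TAGS = {"m", "me"}
# Assumption: child elements tolerated inside math. Adjust to the schema.
ALLOWED_CHILDREN = {"fillin"}


def suspicious_math(ptx_source: str) -> list[tuple[str, str]]:
    """Return (math tag, child tag) pairs for unexpected markup inside math."""
    root = ET.fromstring(ptx_source)
    problems = []
    for element in root.iter():
        if element.tag in MATH_TAGS:
            for child in element:
                if child.tag not in ALLOWED_CHILDREN:
                    problems.append((element.tag, child.tag))
    return problems


after = (
    r"<stack-static><answer><p><m><span>\frac{x^7}{7}+c</span></m></p>"
    r"</answer></stack-static>"
)
print(suspicious_math(after))  # [('m', 'span')]
```

A check like this fails on the "after" output in the diff above even though both versions render.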

@rbeezer
Collaborator

rbeezer commented Mar 28, 2026

I see the <span> being introduced in the "support broader feature palette for static version" commit.

So I asked Claude Code to track it down. Its analysis will be the next comment. Maybe it is trash, maybe it is helpful. I'm sort of flying blind at this point.

@rbeezer
Collaborator

rbeezer commented Mar 28, 2026

The bug is in _stack_replace_tags when called with mathmode=True on raw LaTeX.

The samplesolutionrender field from the STACK API for an algebraic answer contains raw LaTeX — in this case \frac{{\left(x-3\right)}^7}{7}+c, with no HTML wrapper.

At line 2877, _stack_replace_tags is called with this raw LaTeX and mathmode=True. Inside that function, lxml.html.fromstring() is used to parse the text. When lxml's HTML parser encounters text that is not an HTML block element (no <p>, <div>, etc.), it wraps it in <span>...</span> to produce a valid tree. So \frac{...}+c becomes <span>\frac{...}+c</span>.

The mathmode=True path was written to handle the case where lxml wraps block content in <p>...</p>, stripping those tags with removeprefix/removesuffix. But lxml uses <span> for inline content — and raw LaTeX with no enclosing HTML tag is treated as inline. The <span> stripping was never implemented, so it passes through intact.

The result is <span>\frac{...}</span>, which then gets wrapped in <m>...</m> at line 2879, yielding <m><span>\frac{...}</span></m>.

Root cause in one sentence: _stack_replace_tags assumes lxml wraps bare text in <p> (block), but lxml actually wraps it in <span> (inline), and the mathmode stripping only handles <p>.

Analysis by Claude Sonnet 4.6 as review assistant for @rbeezer.
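The failure mode described in this analysis can be sketched without lxml, working on the serialized string only. This is an illustration under assumptions, not the code of `_stack_replace_tags`: the helper `strip_wrapper` is hypothetical, and it simply accepts either wrapper instead of only `<p>`.

```python
def strip_wrapper(serialized: str, tags: tuple[str, ...] = ("p", "span")) -> str:
    """Remove one enclosing <tag>...</tag> pair if it is in the accepted list."""
    for tag in tags:
        opening, closing = f"<{tag}>", f"</{tag}>"
        if serialized.startswith(opening) and serialized.endswith(closing):
            return serialized[len(opening):-len(closing)]
    return serialized


wrapped = r"<span>\frac{x^7}{7}+c</span>"
# Stripping only <p> reproduces the reported symptom: the span survives.
print(strip_wrapper(wrapped, tags=("p",)))  # <span>\frac{x^7}{7}+c</span>
# Accepting both wrappers yields bare LaTeX that is safe to put inside <m>.
print(strip_wrapper(wrapped))  # \frac{x^7}{7}+c
```

One caveat of this string-level approach: input such as `<p>a</p><p>b</p>` starts and ends with the right tags but is not a single wrapper, so a tree-based check is safer for anything beyond a lone math expression.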

@anst-i

anst-i commented Mar 31, 2026

Thank you for the careful explanation @rbeezer , I am completely new to PreTeXt and I'm grateful for your patience.

I have checked my local generated/stack-integration.ptx from the PR, but it did not contain any <span> tags. Nor did output/html/generated/stack-integration.ptx. Same with the "support broader feature palette for static version" commit.

Regarding the explanation by the LLM, I am a bit puzzled: line 2877 of the commit does not contain _stack_replace_tags; the call is on line 2885. Are we sure we are on the same code base? (Yes, I am aware this sounds a bit silly; just making sure!)

I have also tried running the relevant Python code in an isolated script, and for me html.fromstring() indeed creates a <p> tag around the LaTeX string.

If we can't find out why this function works differently on your machine, maybe we can change line 2794 to: tree = html.fragment_fromstring(text, create_parent='p') to force the created parent element to be a <p>? Or will this break other parts of the code?

Attached is a minimal code sample, containing the changed line in _stack_replace_tags(): htmlfromstring.py

@geoo89
Contributor Author

geoo89 commented Mar 31, 2026

Thanks for taking a closer look, both of you.

Claude's suggestion is helpful, I've managed to track down the issue:

In lxml 5.3.1, lxml.html.fromstring("text") yields <p>text</p> which we strip.
In lxml 6.0.2, lxml.html.fromstring("text") yields <span>text</span> which we don't strip.

On my system I had lxml 5.3.1 installed while I assume you've had some version of lxml >=6.

The pypi pretext package appears to require lxml<7,>=6, but pretext/requirements.txt in this repository simply specifies lxml, which I suppose is why I ended up with an earlier version.

There appears to be a method lxml.html.fragment_fromstring which seems to have more predictable behavior. (EDIT: Andreas found it too. Using html.fragment_fromstring(text, create_parent='p') is not equivalent as that will wrap <p>text</p> into another <p> tag while html.fromstring does not. So I'll need to take a closer look for what behavior we actually want.) I'll try to update the code to use this and append a commit to this branch which you can then cherry-pick/squash, if that sounds good to you.

@geoo89
Contributor Author

geoo89 commented Mar 31, 2026

Hi Rob,

I pushed a fix and tried to keep it as simple as possible.

The downside of this particular implementation is that we may have nested <p> tags (as you'll see in the static version of the integration question, for example). PreTeXt doesn't seem to complain about that. STACK users don't always wrap the text they write (e.g. for the question text or the solution) in <p> tags, and without wrapping some text may disappear in PreTeXt; for example, take text<p>more text</p> and some <em>highlighting</em>.

  • Without wrapping, PreTeXt will only render "more text" but none of the other content in the PDF/LaTeX.
  • With wrapping, PreTeXt will render the entire text, but we will have a <p> tag nested inside the wrapping <p> tag.

While STACK users will have to vet their questions to ensure that they work in PreTeXt, I'd like to make it as easy as possible, so I'd rather have this supported.

If you prefer, I can write code that will parse the text more carefully and wrap anything that doesn't have a parent <p> tag into such a tag, e.g. for the example above giving <p>text</p><p>more text</p><p> and some <em>highlighting</em></p>. But this seems more error-prone and will probably require a few dozen lines of additional code.
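For reference, the more careful wrapping could look roughly like the sketch below. It is regex-based for brevity, which is a swapped-in technique and not the lxml approach used in this PR. It assumes top-level `<p>` elements are not nested, and the helper name `wrap_loose_text` is hypothetical.

```python
import re

# One capturing group, so re.split alternates: loose text, <p> block, loose text, ...
TOP_LEVEL_P = re.compile(r"(<p\b[^>]*>.*?</p>)", re.DOTALL)


def wrap_loose_text(fragment: str) -> str:
    """Wrap every run of content outside a <p> element in its own <p>."""
    wrapped = []
    for index, piece in enumerate(TOP_LEVEL_P.split(fragment)):
        if index % 2 == 1:
            wrapped.append(piece)  # an existing <p>...</p> block, kept as-is
        elif piece.strip():
            wrapped.append(f"<p>{piece}</p>")  # loose text and inline markup
    return "".join(wrapped)


print(wrap_loose_text("text<p>more text</p> and some <em>highlighting</em>"))
# <p>text</p><p>more text</p><p> and some <em>highlighting</em></p>
```

This reproduces the output given for the example above, but it would mis-handle `<p>` nested inside other block elements, which is part of why a tree-based version is likely to need more code.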

rbeezer added a commit that referenced this pull request Mar 31, 2026
@rbeezer
Collaborator

rbeezer commented Mar 31, 2026

Yes, I have lxml 6.0.2. Next thing I might have done was to check your previous list of version numbers. Glad we have an explanation. (Maybe requirements.txt needs an update...)

PreTeXt doesn't seem to complain about that.

Well, it should if you do schema validation (while the schema is known to be a bit imperfect). The conversions do not test for legal PreTeXt. So lack of complaints, and especially simply getting pretty output, do not equate to having quality source.

Consciously making nested #p sounds like a dangerous idea. The pre-processor could have a mode that "cleans up" whatever comes back from the server - a #p with an ancestor #p could be scrubbed. Or maybe do that with lxml? I've merged this, but would like to see the situation improve. And I really do not understand why the LaTeX does not end up with some extra vertical space due to the double-p?
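One possible shape for such a clean-up pass is sketched below. It uses the standard library's ElementTree so that it runs without lxml, and the function names are hypothetical. Any `p` with a `p` ancestor is dissolved, and its text, children, and tail are spliced into the parent in document order.

```python
import xml.etree.ElementTree as ET


def _add_text_before(parent, index, text):
    """Append text at the position just before parent[index]."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def unwrap_nested_p(element, inside_p=False):
    """Dissolve, in place, every <p> that has a <p> ancestor."""
    inside_p = inside_p or element.tag == "p"
    index = 0
    while index < len(element):
        child = element[index]
        unwrap_nested_p(child, inside_p)  # clean the subtree first
        if inside_p and child.tag == "p":
            grandchildren = list(child)
            del element[index]
            _add_text_before(element, index, child.text)
            for offset, grandchild in enumerate(grandchildren):
                element.insert(index + offset, grandchild)
            index += len(grandchildren)
            _add_text_before(element, index, child.tail)
        else:
            index += 1


root = ET.fromstring("<answer><p><p>a <em>b</em> c</p> d</p></answer>")
unwrap_nested_p(root)
print(ET.tostring(root, encoding="unicode"))
# <answer><p>a <em>b</em> c d</p></answer>
```

A design caveat: dissolving the inner element merges its content into the surrounding paragraph, so any paragraph break the STACK author intended is lost. Splitting into sibling paragraphs would preserve it, at the cost of more code.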

Images: I think I said before that it is very good you have this working. A PDF version will of course work well for LaTeX/PDF. But we have other static formats. For example, we must have a PNG image for EPUB, or the build breaks. So those need to be generated - should be plenty of examples in the Python for how to do this (i.e. which tools). Mimic what is there or have AI mimic it.

Note that we do not regenerate things like this, they need to go into the "generated" directory and are committed to the repository. I've done that here before merging, but it should be part of the drill when you add new examples.

I think that is most everything. Website examples updated. Holler if there are unanswered questions.

@rbeezer rbeezer closed this Mar 31, 2026

3 participants