tired-engineer

This commit fixes the "RuntimeError: generator raised StopIteration" bug that occurs when processing XML dumps in Python 3.7+.

Problem:

PEP 479 (enforced by default since Python 3.7) converts any StopIteration raised inside a generator into a RuntimeError. mwxml called next() inside generator functions without catching StopIteration, so when the underlying iterator ran out, the exception escaped those generators.

When the XML stream was exhausted:

  1. Calling next() on the exhausted etree.iterparse() iterator raised StopIteration
  2. This propagated through EventPointer.__next__()
  3. StopIteration was raised inside the ElementIterator.__iter__() generator
  4. PEP 479 converted this to RuntimeError
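
The four steps above can be reproduced with the standard library alone. This is a sketch, not mwxml's code: PointerSketch and iter_end_tags_unguarded are hypothetical, simplified stand-ins for EventPointer and the pre-fix ElementIterator.__iter__(), assuming etree refers to xml.etree.ElementTree.

```python
import io
import xml.etree.ElementTree as etree


class PointerSketch:
    """Hypothetical, simplified stand-in for EventPointer (not mwxml's code)."""

    def __init__(self, events):
        self.events = events

    def __iter__(self):
        return self

    def __next__(self):
        # Steps 1-2: once the stream is exhausted, next() on the iterparse
        # iterator raises StopIteration, which propagates out unchanged.
        return next(self.events)


def iter_end_tags_unguarded(pointer):
    """Hypothetical stand-in for the pre-fix ElementIterator.__iter__()."""
    while True:
        # Step 3: on exhaustion, StopIteration is raised inside this generator.
        event, element = next(pointer)
        if event == "end":
            yield element.tag


xml = b"<page><title>A</title><text>B</text></page>"
pointer = PointerSketch(etree.iterparse(io.BytesIO(xml), events=("start", "end")))

caught = None
try:
    list(iter_end_tags_unguarded(pointer))
except RuntimeError as err:
    # Step 4: PEP 479 replaces the escaping StopIteration with RuntimeError,
    # keeping the original exception as __cause__.
    caught = err

print(type(caught).__name__, "-", caught)  # RuntimeError - generator raised StopIteration
print("caused by:", type(caught.__cause__).__name__)  # caused by: StopIteration
```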

Solution:

Added try-except blocks in mwxml/element_iterator.py to catch StopIteration in two methods:

  • ElementIterator.__iter__() (line 58)
  • ElementIterator.complete() (line 72)

When StopIteration is caught, the loop exits via break and the generator returns normally, so the exception never escapes the generator.
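
A minimal sketch of the pattern described above, using a hypothetical generator (iter_end_tags) rather than mwxml's actual ElementIterator code: only the next() call sits inside try/except, and StopIteration ends the loop with break.

```python
import io
import xml.etree.ElementTree as etree


def iter_end_tags(events):
    """Hypothetical fixed generator: yields the tag of every closed element."""
    while True:
        try:
            event, element = next(events)
        except StopIteration:
            # Stream exhausted: leave the loop so the generator returns
            # normally instead of leaking StopIteration (PEP 479).
            break
        if event == "end":
            yield element.tag


xml = b"<page><title>A</title><text>B</text></page>"
events = etree.iterparse(io.BytesIO(xml), events=("start", "end"))
tags = list(iter_end_tags(events))
print(tags)  # ['title', 'text', 'page']
```

Keeping the try block limited to the next() call means a StopIteration raised anywhere else in the loop body is not silently swallowed; it still surfaces as a RuntimeError, which makes genuine bugs visible.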

Changes:

  • Modified: mwxml/element_iterator.py

    • Added StopIteration handling in __iter__() method
    • Added StopIteration handling in complete() method
  • Added: mwxml/iteration/tests/test_stopiteration_bug.py

    • New test suite with 6 tests
    • Covers bug reproduction, normal iteration, and edge cases

Testing:

✓ All 6 new tests pass
✓ All 20 existing iteration tests pass
✓ All 3 element_iterator tests pass
✓ Tested with real Wikipedia XML dump
✓ No performance regression
✓ Backward compatible with Python 3.6

Compatibility:

  • Required for Python 3.7+
  • Backward compatible with Python 3.6 and earlier
  • Tested on Python 3.11.7
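
A try/except around next() is valid on every Python 3 release, which is why the fix carries no version constraint. For comparison only (the PR uses try/except, not this form), the two-argument next(iterator, default), also available on all Python 3 versions, avoids raising StopIteration at all. This is a sketch; iter_end_tags and the _DONE sentinel are illustrative names.

```python
import io
import xml.etree.ElementTree as etree

_DONE = object()  # unique sentinel; cannot collide with a real (event, element) pair


def iter_end_tags(events):
    """Illustrative alternative: next() with a default never raises StopIteration."""
    while True:
        item = next(events, _DONE)
        if item is _DONE:
            return  # exhausted: end the generator normally
        event, element = item
        if event == "end":
            yield element.tag


xml = b"<page><title>A</title></page>"
tags = list(iter_end_tags(etree.iterparse(io.BytesIO(xml), events=("start", "end"))))
print(tags)  # ['title', 'page']
```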

References:

  • PEP 479: https://peps.python.org/pep-0479/
  • Issue: RuntimeError: generator raised StopIteration in Python 3.7+