Fix markdown parsing for mistral
jakep-allenai committed Mar 6, 2025
1 parent bdc0d75 commit e144200
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion olmocr/bench/tests.py
@@ -194,7 +194,7 @@ def parse_markdown_tables(self, md_content: str) -> List[np.ndarray]:
A list of numpy arrays, each representing a parsed table
"""
# Extract all tables from markdown
- table_pattern = r'(\|(?:[^|]*\|)+)\s*\n\|(?:[:-]+\|)+\s*\n((?:\|(?:[^|]*\|)+\s*\n)+)'
+ table_pattern = r'(\|(?:[^|]*\|)+)\s*\n\|(?:[ :-]+\|)+\s*\n((?:\|(?:[^|]*\|)+\s*\n)+)'
table_matches = re.finditer(table_pattern, md_content)

parsed_tables = []
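
The only difference between the two patterns is the space added to the character class for the separator row (`[:-]` becomes `[ :-]`). A minimal sketch of the effect, using the two patterns exactly as they appear in this diff; the sample tables and the `count_tables` helper are hypothetical and not part of `olmocr/bench/tests.py`, and the assumption that the affected output pads its separator cells with spaces is inferred from the commit title only:

```python
import re

# Patterns copied from the diff: before and after this commit.
OLD_PATTERN = r'(\|(?:[^|]*\|)+)\s*\n\|(?:[:-]+\|)+\s*\n((?:\|(?:[^|]*\|)+\s*\n)+)'
NEW_PATTERN = r'(\|(?:[^|]*\|)+)\s*\n\|(?:[ :-]+\|)+\s*\n((?:\|(?:[^|]*\|)+\s*\n)+)'

# Hypothetical sample tables, for illustration only.
# Separator row without padding: |---|---|
COMPACT_TABLE = "| a | b |\n|---|---|\n| 1 | 2 |\n"
# Separator row with spaces around the dashes: | --- | --- |
PADDED_TABLE = "| a | b |\n| --- | --- |\n| 1 | 2 |\n"


def count_tables(pattern: str, md_content: str) -> int:
    """Count how many tables the given pattern finds (hypothetical helper)."""
    return len(list(re.finditer(pattern, md_content)))


if __name__ == "__main__":
    for name, table in [("compact", COMPACT_TABLE), ("padded", PADDED_TABLE)]:
        print(name, "old:", count_tables(OLD_PATTERN, table),
              "new:", count_tables(NEW_PATTERN, table))
```

With the old pattern the padded separator row never matches, because each separator cell had to consist solely of `:` and `-`; the new pattern accepts both forms.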
