fix(file-based): Switch Excel parser from calamine to openpyxl engine (do not merge) #848
Draft: devin-ai-integration wants to merge 1 commit into main from devin/1763074978-excel-parser-openpyxl-fix
Switch the Excel parser engine from calamine to openpyxl to prevent
crashes when parsing Excel files with invalid date values.
The calamine engine (Rust-based) panics when encountering date values
that result in years outside Python's datetime range (1-9999), causing
the entire sync to fail. The openpyxl engine (pure Python) handles
these edge cases more gracefully, allowing syncs to complete even with
data quality issues.
This fixes crashes like:
pyo3_runtime.PanicException: failed to construct date: PyErr {
type: <class 'ValueError'>,
value: ValueError('year 20225 is out of range')
}
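The underlying limit can be reproduced with the standard library alone. Python's date and datetime types only cover years datetime.MINYEAR (1) through datetime.MAXYEAR (9999), so a cell that decodes to a year like 20225 cannot be represented at all. The following is a minimal illustration, not calamine's code, and the mistyped year is an assumed example:

```python
from datetime import MAXYEAR, MINYEAR, date

print(MINYEAR, MAXYEAR)  # 1 9999

try:
    # e.g. a spreadsheet date typed as 20225 instead of 2025 (assumed example)
    date(20225, 1, 1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # on CPython: year 20225 is out of range
```

calamine hits this same ValueError while building the Python date object, but it surfaces the error inside a Rust panic instead of as an ordinary exception, as the traceback above shows. That is why the whole sync stops.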
Trade-off: openpyxl is slower than calamine, but reliability is more
important than speed for production syncs.
Fixes: airbytehq/oncall#10097
Co-Authored-By: unknown <>
👋 Greetings, Airbyte Team Member! Here are some helpful tips and reminders for your convenience.
Testing This CDK Version
You can test this version of the CDK using the following:
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1763074978-excel-parser-openpyxl-fix#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1763074978-excel-parser-openpyxl-fix
Helpful Resources
PR Slash Commands
Airbyte Maintainers can execute the following slash commands on your PR:
Requested by: @agarctfi in airbytehq/oncall#10097
Summary
This PR switches the Excel parser engine from calamine to openpyxl to fix crashes when parsing Excel files with invalid date values (e.g., year 20225). openpyxl must also be added to the file-based extra in pyproject.toml (see Dependency Requirements below).
The Problem:
The calamine engine raises a pyo3_runtime.PanicException on such dates, which crashes the entire sync, even after millions of records have already been processed.
The Solution:
Change: excel_parser.py: engine="calamine" → engine="openpyxl"
Dependency Requirements
REQUIRED BEFORE MERGE:
Add openpyxl to the file-based extra in pyproject.toml:
And add to dependencies section:
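Until openpyxl ships with the file-based extra, a parser that requests engine="openpyxl" only fails once pandas tries to load the engine on the first Excel file. The following is a minimal sketch of a fail-fast check that could run earlier. require_optional_dependency is a hypothetical helper, not an existing CDK function, and the error wording is illustrative:

```python
import importlib.util


def require_optional_dependency(module_name: str, extra: str) -> None:
    """Hypothetical helper: raise early if an optional dependency is missing."""
    # find_spec returns None when a top-level module cannot be located,
    # without actually importing it.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is required to parse Excel files; "
            f"install the '{extra}' extra to get it."
        )
```

For example, require_optional_dependency("openpyxl", "file-based") could be called when the Excel parser is constructed, so a missing dependency is reported before any records are read.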
Alternative Approach:
Instead of a global switch, consider implementing a fallback mechanism:
This would provide the performance benefits of calamine for normal files while gracefully handling edge cases with openpyxl.
Review & Testing Checklist for Human
Test Plan
Notes