Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache BigQuery: Invalid value for type: DECIMAL(38, 9) is not a valid value #573

Open
andreibaragan opened this issue Jan 14, 2025 · 5 comments · May be fixed by #582
Open

Cache BigQuery: Invalid value for type: DECIMAL(38, 9) is not a valid value #573

andreibaragan opened this issue Jan 14, 2025 · 5 comments · May be fixed by #582

Comments

@andreibaragan
Copy link

andreibaragan commented Jan 14, 2025

I am running the latest version (0.22) but I am getting errors when trying to use the BigQuery cache. On source-facebook-marketing, source-quickbooks and source-xero I get the following trace

Traceback (most recent call last): File "/Users/andrei/Work/cedara/pyairbyte/export_quickbooks.py", line 59, in <module> result = source.read(cache=cache) File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/sources/base.py", line 659, in read result = self._read_to_cache( File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/sources/base.py", line 743, in _read_to_cache cache._write_airbyte_message_stream( # noqa: SLF001 # Non-public API File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/caches/base.py", line 325, in _write_airbyte_message_stream cache_processor.process_airbyte_messages( File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/shared/sql_processor.py", line 315, in process_airbyte_messages self._write_all_stream_data( File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/shared/sql_processor.py", line 329, in _write_all_stream_data self.write_stream_data( File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/shared/sql_processor.py", line 748, in write_stream_data temp_table_name = self._write_files_to_new_table( File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/_processors/sql/bigquery.py", line 178, in _write_files_to_new_table load_job = client.load_table_from_file( # Make an API request File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/google/cloud/bigquery/client.py", line 2601, in load_table_from_file raise exceptions.from_http_response(exc.response) google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/REDACTED/jobs?uploadType=resumable: Invalid value for type: DECIMAL(38, 9) is not a valid value

It's working correctly when I use source-faker (with both users and purchases streams).

Any idea what's going on?

@andreibaragan
Copy link
Author

@aaronsteers I've done a bit more testing and identified that this was broken in 0.17.9, it works fine with 0.17.8 and lower. Hopefully this intel gives some clues to what the problem could be.

@aaronsteers
Copy link
Contributor

@andreibaragan - Thanks for raising! We'll take a look. I'm also adding the accepting_pull_requests label in case anyone else has time to take a look at this.

Copy link
Contributor

@aaronsteers asked me to research this. Here are my findings:

After analyzing the changes between versions 0.17.8 and 0.17.9, I've identified the root cause of the DECIMAL(38, 9) type issue:

  1. The breaking change was introduced in the BigQuery type converter (bigquery.py) as part of the SQLAlchemy 2.0 upgrade in PR Chore: Bump to Sqlalchemy 2.0 #396:

    Before (v0.17.8):

    def to_sql_type(self, sql_type):
        return sql_type.__class__.__name__  # Returns string type names

    After (v0.17.9):

    def to_sql_type(self, sql_type):
        return sql_type  # Returns raw SQLAlchemy type objects
  2. This change affects how decimal types are handled:

    • In v0.17.8, DECIMAL types were converted to string names that BigQuery could understand
    • In v0.17.9, the raw SQLAlchemy DECIMAL type object is passed through without conversion
    • BigQuery expects specific type names for decimal/numeric types, which is why we're seeing the 'Invalid value for type: DECIMAL(38, 9)' error

The issue occurs because BigQuery doesn't directly support SQLAlchemy DECIMAL types in this format - they need to be properly mapped to BigQuery-compatible NUMERIC types.

A fix would involve updating the BigQueryTypeConverter to properly handle DECIMAL types by converting them to BigQuery's NUMERIC type with appropriate precision and scale parameters.

@aaronsteers
Copy link
Contributor

@andreibaragan - I'm asking Devin.ai to follow-up with a PR. While I don't have a lot of time to pick up myself right now, I think the AI may do a decent job, since this seems like a very discreet/testable workflow.

Copy link
Contributor

I have started an attempt to fix here:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants