Loading partitioned dataset from S3 to Redshift #1712
dhatch-niv asked this question in Q&A (Unanswered)
Hi all -
I have a dataset that I am looking to incrementally extract to Redshift. I have broken this into two steps for each new period of data: first write the period to S3 as a partitioned Parquet dataset with `wr.s3.to_parquet`, then load those files into Redshift with `wr.redshift.copy_from_files`.

The implementation roughly looks like:
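Below is a minimal sketch of those two steps using awswrangler; the bucket, Glue connection, IAM role, schema, and table names are placeholders:

```python
import awswrangler as wr
import pandas as pd

# Placeholder frame representing one new period of data.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "value": [10.0, 20.0],
        "period": ["2022-09", "2022-09"],  # partition column
    }
)

path = "s3://my-bucket/my-dataset/"  # placeholder prefix

# Step 1: append the period as a partitioned Parquet dataset.
# to_parquet encodes the partition value in the key prefix
# (.../period=2022-09/...) and drops the column from the data files.
wr.s3.to_parquet(
    df=df,
    path=path,
    dataset=True,
    mode="append",
    partition_cols=["period"],
)

# Step 2: COPY the files into Redshift -- this is the step that fails,
# because the "period" column is missing from the Parquet files.
con = wr.redshift.connect("my-glue-connection")  # placeholder Glue connection
wr.redshift.copy_from_files(
    path=path,
    con=con,
    table="my_table",
    schema="public",
    iam_role="arn:aws:iam::123456789012:role/my-redshift-copy-role",  # placeholder
)
con.close()
```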
The `copy_from_files` function fails because Redshift does not understand the partitions created by `to_parquet`, and `wr.s3.to_parquet` strips the partition column from the data files when writing to S3.

A possible solution would be to add a keyword argument to `to_parquet`, something like `preserve_partition_columns=True`, which would keep the partition columns in the Parquet files instead of dropping them (sketched below).

Is there a better way to achieve this without changes to the library?
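For illustration only, reusing the names from the sketch above, the proposed keyword (which does not exist in awswrangler today) might be used like this:

```python
# Hypothetical keyword argument -- not part of awswrangler's current API.
wr.s3.to_parquet(
    df=df,
    path=path,
    dataset=True,
    mode="append",
    partition_cols=["period"],
    preserve_partition_columns=True,  # keep "period" inside the Parquet files too
)
```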
Thanks.
Replies: 1 comment

Instead of using `wr.redshift.copy_from_files`, did you consider `wr.redshift.copy`?
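For context, `wr.redshift.copy` writes the DataFrame to a staging prefix on S3 as Parquet and then issues the Redshift COPY in one step. A minimal sketch, assuming placeholder connection, bucket, role, and table names:

```python
import awswrangler as wr
import pandas as pd

# Placeholder frame representing one new period of data.
df = pd.DataFrame({"id": [1, 2], "value": [10.0, 20.0], "period": ["2022-09", "2022-09"]})

con = wr.redshift.connect("my-glue-connection")  # placeholder Glue connection
wr.redshift.copy(
    df=df,
    path="s3://my-bucket/stage/",  # staging prefix for the intermediate Parquet files
    con=con,
    table="my_table",
    schema="public",
    mode="append",
    iam_role="arn:aws:iam::123456789012:role/my-redshift-copy-role",  # placeholder
)
con.close()
# The "period" column stays in the staged files here, so it is loaded into the table.
```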