[Bug] s3://paas-flink-prod/.../bucket-0/data-5da975ee-318e-4ba4-b3f7-ad112dae5247-0.parquet is not a Parquet file. Expected magic number at tail, but found [21, 0, 21, -32] #5081

Open
1 of 2 tasks
logicbaby opened this issue Feb 13, 2025 · 0 comments
Labels
bug Something isn't working

logicbaby commented Feb 13, 2025

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

paimon-flink-1.20-1.0.1.jar
paimon-s3-1.0.1.jar
paimon-flink-action-1.0.1.jar

Compute Engine

flink-1.20.0

Minimal reproduce step

Use the MySQL CDC mysql_sync_table action to sync a table into a Paimon table stored on S3. The job cannot complete a checkpoint, and the TaskManager reports:

Caused by: java.lang.RuntimeException: s3://paas-flink-prod/flink-paimon/wh/chen.db/department/bucket-0/data-65dbb220-7017-468d-affb-1de9dd6e4105-0.parquet is not a Parquet file. Expected magic number at tail, but found [21, 0, 21, -32]
	at org.apache.paimon.shade.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:162) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.shade.org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:243) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.format.parquet.ParquetUtil.getParquetReader(ParquetUtil.java:85) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.format.parquet.ParquetUtil.extractColumnStats(ParquetUtil.java:52) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.format.parquet.ParquetSimpleStatsExtractor.extractWithFileInfo(ParquetSimpleStatsExtractor.java:78) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.format.parquet.ParquetSimpleStatsExtractor.extract(ParquetSimpleStatsExtractor.java:71) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.io.StatsCollectingSingleFileWriter.fieldStats(StatsCollectingSingleFileWriter.java:105) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.io.KeyValueDataFileWriter.result(KeyValueDataFileWriter.java:169) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.io.KeyValueDataFileWriter.result(KeyValueDataFileWriter.java:58) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.io.RollingFileWriter.closeCurrentWriter(RollingFileWriter.java:135) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.io.RollingFileWriter.close(RollingFileWriter.java:167) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.mergetree.MergeTreeWriter.flushWriteBuffer(MergeTreeWriter.java:235) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
	at org.apache.paimon.mergetree.MergeTreeWriter.prepareCommit(MergeTreeWriter.java:264) ~[paimon-flink-1.20-1.0.1.jar:1.0.1]
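
For readers comparing the reported bytes with the expected marker: a Parquet file ends with the 4-byte magic "PAR1", and the exception prints the bytes it found as signed Java bytes. The following is a minimal sketch that decodes the values from the message; the helper name from_java_bytes is illustrative and not part of Paimon or Parquet.

```python
PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the start and end of a Parquet file


def from_java_bytes(values):
    """Convert Java signed bytes (-128..127) into a Python bytes object."""
    return bytes(v & 0xFF for v in values)


# Values copied from the exception message above.
found = from_java_bytes([21, 0, 21, -32])

print(found.hex(" "))          # 15 00 15 e0
print(list(PARQUET_MAGIC))     # [80, 65, 82, 49]
print(found == PARQUET_MAGIC)  # False
```

So the reader saw 15 00 15 e0 where it expected 50 41 52 31, which means the four bytes it read were not the tail magic of a complete Parquet file.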

I downloaded this Parquet file afterwards and verified that it is valid.
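
One way to make that check reproducible is to inspect the tail of the downloaded copy directly. This is a sketch, assuming the file has been saved locally; check_parquet_tail is a hypothetical helper, and it only validates the trailing magic and the footer length field, not the full file.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def check_parquet_tail(path):
    """Return (file_size, footer_length) if the file ends with the Parquet magic.

    Raises ValueError when the file is too short or the last 4 bytes differ.
    """
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the file size
        size = f.tell()
        # Smallest possible layout: head magic + footer length + tail magic.
        if size < 12:
            raise ValueError(f"{path}: only {size} bytes, too short for Parquet")
        f.seek(size - 8)
        footer_len_raw = f.read(4)  # little-endian footer length
        tail = f.read(4)            # trailing magic
    if tail != PARQUET_MAGIC:
        raise ValueError(f"{path}: tail is {list(tail)}, expected {list(PARQUET_MAGIC)}")
    (footer_len,) = struct.unpack("<I", footer_len_raw)
    return size, footer_len
```

Calling check_parquet_tail on the downloaded file should return its size and footer length if the stored object is complete; a ValueError would indicate the object itself is damaged rather than the read path.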

cdc params:

local:///opt/flink/usrlib/paimon-flink-action-1.0.1.jar
mysql_sync_table
--warehouse s3://paas-flink-prod/flink-paimon/wh
--database chen
--table department
--mysql_conf hostname=rm-xxx.mysql.rds.aliyuncs.com
--mysql_conf username=**
--mysql_conf password='**'
--mysql_conf database-name='xxx'
--mysql_conf table-name='department'

What doesn't meet your expectations?

S3 cannot be used as the backend storage for the Paimon warehouse; the same job works when the warehouse is on HDFS.
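
The stack trace shows the footer being read back (ParquetSimpleStatsExtractor calling ParquetFileReader.readFooter) while the writer is closing the file, and the same object looks fine once downloaded later. A possible diagnostic, not a confirmed cause, is to ask the object store for only the last 4 bytes and see whether it returns the magic. The sketch below assumes a boto3-style S3 client passed in by the caller; the function names, bucket, and key are placeholders.

```python
def s3_tail_bytes(client, bucket, key, n=4):
    """Fetch the last n bytes of an object using a suffix Range request.

    `client` is assumed to behave like boto3.client("s3").
    """
    resp = client.get_object(Bucket=bucket, Key=key, Range=f"bytes=-{n}")
    return resp["Body"].read()


def tail_is_parquet_magic(client, bucket, key):
    """True when the object's last 4 bytes are the Parquet magic b"PAR1"."""
    return s3_tail_bytes(client, bucket, key) == b"PAR1"
```

With a real client this would be called as tail_is_parquet_magic(client, "<bucket>", "<key of the failing data file>"). If a full download is valid but the ranged read returns something else, that points at how the storage endpoint serves ranged reads rather than at the written file.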

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!