How to optimize reading of many versions of delta table #3807
processadd asked this question in Q&A
Hi,
I have a raw Delta table `raw`. A Structured Streaming job loads it from Kinesis, and every few seconds that job appends to `raw`, which creates a new version. I also have a second Structured Streaming job that reads from `raw` with `trigger=availableNow`. It is triggered daily, so each trigger may see thousands of versions of `raw`. It uses `foreachBatch` to run a MERGE into a target Delta table.
The MERGE itself seems fast (less than 1 min), but `durationMs` in the streaming progress shows long times for `latestOffset` and `addBatch`.
Why do `latestOffset` and `addBatch` take so long?
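To see which phases dominate across all the micro-batches of one `availableNow` run, a small helper can aggregate the `durationMs` maps from the query's progress reports. This is a minimal sketch, assuming PySpark 3.x, where `StreamingQuery.recentProgress` returns the reports as a list of plain dicts. Other versions may wrap them in progress objects that need converting first. `phase_totals` and `slowest_phases` are hypothetical names, not Spark or Delta Lake APIs.

```python
from typing import Any, Dict, Iterable, List, Mapping, Tuple


# Hypothetical helper (not part of Spark or Delta Lake): it only reads the
# "durationMs" map that Structured Streaming puts in each progress report.
def phase_totals(progresses: Iterable[Mapping[str, Any]]) -> Dict[str, int]:
    """Sum each durationMs phase across a sequence of progress reports."""
    totals: Dict[str, int] = {}
    for progress in progresses:
        # A report without a durationMs map contributes nothing.
        for phase, ms in (progress.get("durationMs") or {}).items():
            totals[phase] = totals.get(phase, 0) + int(ms)
    return totals


def slowest_phases(
    progresses: Iterable[Mapping[str, Any]], top: int = 3
) -> List[Tuple[str, int]]:
    """Return the `top` phases with the largest summed duration in ms.

    "triggerExecution" is skipped because it is the end-to-end time of the
    whole trigger and would otherwise always rank first.
    """
    ranked = [
        (phase, ms)
        for phase, ms in phase_totals(progresses).items()
        if phase != "triggerExecution"
    ]
    # Longest first; ties broken by phase name so the order is deterministic.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:top]
```

After the daily run finishes, `slowest_phases(query.recentProgress)` would show how time splits between phases. Two phases matter most here. `latestOffset` is the time spent asking the source how far to read. `addBatch` is the time spent running the sink, which with `foreachBatch` includes the user function and its MERGE.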
Thanks