RFC: Combine Historical and Incremental Data #85
We encountered an interesting problem: what happens when users want to define a watermark on this table? If we directly apply the ideas from this RFC, batch queries insert data into the table in no particular order, so the watermark will likely expire and delete records unexpectedly. I can think of a few possibilities.
The 3rd proposal I slightly prefer the 2nd proposal
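To make the watermark concern above concrete, here is a minimal sketch in Python. It is a toy model, not RisingWave's implementation: it assumes a watermark equal to the largest event time seen so far minus a fixed delay, and it assumes that any record arriving with an event time below the current watermark is discarded. The function name `apply_watermark` and the sample timestamps are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def apply_watermark(event_times: Iterable[int], delay: int) -> Tuple[List[int], List[int]]:
    """Toy model (an assumption, not RisingWave's actual logic).

    The watermark is max(event time seen so far) - delay.
    A record whose event time is below the current watermark is dropped.
    Returns (kept, dropped) in arrival order.
    """
    watermark: Optional[int] = None
    kept: List[int] = []
    dropped: List[int] = []
    for ts in event_times:
        if watermark is not None and ts < watermark:
            # Arrived "too late" relative to the watermark.
            dropped.append(ts)
            continue
        kept.append(ts)
        candidate = ts - delay
        # The watermark only ever moves forward.
        if watermark is None or candidate > watermark:
            watermark = candidate
    return kept, dropped


# The same ten records, once in event-time order and once shuffled,
# as an unordered batch scan might deliver them.
ordered = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
unordered = [10, 1, 9, 2, 8, 3, 7, 4, 6, 5]

kept_o, dropped_o = apply_watermark(ordered, delay=2)
kept_u, dropped_u = apply_watermark(unordered, delay=2)

print("ordered   dropped:", dropped_o)
print("unordered dropped:", dropped_u)
```

In this model the ordered input loses nothing, while the shuffled input loses most of its records as soon as a large timestamp arrives first. That is the failure mode a batch backfill could trigger if it fed the same watermark as the streaming source.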
Some new ideas by @st1page from https://risingwave-labs.slack.com/archives/C07CU2YBKCG/p1721184731055789
Since we are adding a batch read function (risingwavelabs/risingwave#17673), we can combine a batch query with a connector. Then we no longer need a batch source or ALTER TABLE.
tentative syntax:
CREATE TABLE orders (
order_id INT,
customer_name VARCHAR,
data JSONB,
PRIMARY KEY (order_id, customer_name)
) INITIAL WITH SELECT * FROM file_scan(
'parquet',
's3',
'ap-southeast-2',
'xxxxxxxxxx',
'yyyyyyyy',
's3://your-bucket/path/to/*'
)
WITH (
connector = 'kinesis',
stream = 'wkx-dynamo-orders',
scan.startup.mode='earliest',
aws.region = 'us-east-1',
kinesis.credentials.access = 'ABCDEFG',
kinesis.credentials.secret = 'abcdefg'
) FORMAT DYNAMODB_CDC ENCODE JSON;
@xiangjinwu: Can be achieved by pause_on_create + insert into t select + resume.
Is it just Here, taking your example, the columns in the table definition and the columns in
Yes, I asked the same question. 😄 @st1page feels that the CTAS syntax is awkward for this specific need, so he wants to introduce a separate syntax. Specifically,
Order is not that important when processing historical data. Particularly, considering multiple parallelism, the order might be less useful to users.
I feel Hmmm, overall, I feel this is not better than the idea of
LGTM. In detail, we need
And we might need to discuss the insert statement on the source later.
Migrated from Notion.
Preview: https://github.com/risingwavelabs/rfcs/blob/eric/iceberg_source/rfcs/0085-iceberg-source.md