Performance of populate function #11

Closed
Mad-Apes opened this issue Mar 20, 2018 · 1 comment
Comments

@Mad-Apes

Hello,
In recent use I have found that query performance has improved greatly, but the populate function consumes a lot of time.
The populate function takes more than 200 seconds when the normal table contains 2,000,000 rows.
What can I do to make it faster?

@knizhnik
Contributor

In principle, you can populate data in parallel by spawning several concurrent populate statements with different predicates:

select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4', predicate:='column_int2 >= 0 and column_int2 < 1');
select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4', predicate:='column_int2 >= 1 and column_int2 < 2');
...

This lets you load all CPU cores and perform the import up to N times faster, where N is the number of concurrent populate statements.
But please take into account that inserting in parallel will violate the sequential order of records in the vops_table_gb table, which may have a negative impact on the performance of subsequent select queries.
This was one of the reasons why I have not implemented parallel load in VOPS.
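
For anyone who wants to try this, here is a rough sketch of how the concurrent populate statements could be generated and launched from a small Python script. It is only an illustration, not part of VOPS: the helper names (build_populate_statements, run_with_psql, run_parallel), the database name and the range bounds are all assumptions, and the ranges must cover every value of the partitioning column or some rows will be skipped. Each statement runs in its own psql process, so each one gets its own connection.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_populate_statements(lo, hi, step=1,
                              destination="vops_table_gb",
                              source="std_table",
                              sort="column_int2,column_int4",
                              column="column_int2"):
    """Return one populate() call per half-open range [start, start + step).

    lo, hi and step are hypothetical bounds for the partitioning column;
    rows with values outside [lo, hi) are not imported by these statements.
    """
    statements = []
    for start in range(lo, hi, step):
        end = min(start + step, hi)
        predicate = f"{column} >= {start} and {column} < {end}"
        statements.append(
            f"select populate(destination:='{destination}'::regclass, "
            f"source:='{source}'::regclass, sort:='{sort}', "
            f"predicate:='{predicate}');"
        )
    return statements


def run_with_psql(statement, dbname="postgres"):
    """Run one statement in a separate psql process (dbname is an assumption)."""
    subprocess.run(
        ["psql", "-v", "ON_ERROR_STOP=1", "-d", dbname, "-c", statement],
        check=True,
    )


def run_parallel(statements, run_one=run_with_psql, workers=4):
    """Execute the statements concurrently, at most `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all workers and re-raises the first failure, if any.
        list(pool.map(run_one, statements))


if __name__ == "__main__":
    # Hypothetical example: column_int2 takes values 0..7, one range per value.
    run_parallel(build_populate_statements(0, 8), workers=4)
```

As noted above, rows loaded this way will not be in sequential order in vops_table_gb, so it is worth checking the speed of subsequent select queries after a parallel load.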
