Performance of populate function #11

Mad-Apes · 2018-03-20T01:23:30Z

Hello,
Through recent practice, I found that query performance has improved greatly, but the populate function takes a lot of time.
It takes more than 200 seconds when the normal table contains 2,000,000 rows.
What can I do to make it faster?

knizhnik · 2018-03-20T12:51:32Z

In principle, you can populate data in parallel by spawning several concurrent populate statements with different predicates:

select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4', predicate:='column_int2 >= 0 and column_int2 < 1');
select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4', predicate:='column_int2 >= 1 and column_int2 < 2');
...

This makes it possible to load all CPU cores and perform the import up to N times faster.
But please take into account that parallel inserts will violate the sequential order of records in the vops_table_gb table, which may have a negative impact on the performance of subsequent select queries.
This was one of the reasons why I have not implemented parallel load in VOPS.
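
For illustration only, the range-partitioned statements above could be generated by a small script rather than written by hand. The sketch below is not part of VOPS: the helper name `populate_statements` and its parameters are hypothetical, and it only builds the SQL text using the same `populate` arguments shown above. It assumes the partitioning column holds integers in a known range and that all inputs are trusted (no quoting or escaping is done).

```python
def populate_statements(destination, source, sort, column, lower, upper, parts):
    """Build one populate() call per half-open range [lo, hi) of `column`.

    Hypothetical helper: it only formats SQL strings and does not
    connect to a database. Rows outside [lower, upper) are not covered.
    """
    if parts < 1 or upper <= lower:
        raise ValueError("need parts >= 1 and upper > lower")
    span = upper - lower
    statements = []
    for i in range(parts):
        # Integer boundaries, so adjacent ranges neither overlap nor leave gaps.
        lo = lower + span * i // parts
        hi = lower + span * (i + 1) // parts
        if lo == hi:
            continue  # more parts than distinct values: skip the empty range
        predicate = f"{column} >= {lo} and {column} < {hi}"
        statements.append(
            f"select populate(destination:='{destination}'::regclass, "
            f"source:='{source}'::regclass, sort:='{sort}', "
            f"predicate:='{predicate}');"
        )
    return statements


if __name__ == "__main__":
    for stmt in populate_statements(
        "vops_table_gb", "std_table", "column_int2,column_int4",
        "column_int2", 0, 4, 4,
    ):
        print(stmt)
```

Each generated statement would have to be run in its own database session to execute concurrently; statements sent over a single connection run one after another. The caveat about record order in the destination table still applies.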

sokolcati closed this as completed Oct 20, 2023
