Error on load data in VOPS table:row is too big #9


Closed
Mad-Apes opened this issue Mar 12, 2018 · 10 comments


Mad-Apes commented Mar 12, 2018

Hello,
When I load data into a VOPS table using the VOPS populate(...) function or the VOPS import(...) function, an error occurs:
VOPS=# select populate(destination:='vops_table'::regclass, source:='std_table'::regclass);
ERROR: row is too big: size 16392, maximum size 8160

VOPS=# select import(destination := 'vops_table'::regclass, csv_path := '/data/vops_data/std_table.csv', separator := '|');
ERROR: row is too big: size 16392, maximum size 8160 CONTEXT: IMPORT vops_table, line 65, column 63

'std_table' is a normal table containing 64 columns. All columns are of type int (int2, int4, int8) or float (float4, float8).

I tried to solve the problem, but none of the methods I tried worked.

So how can this problem be solved?

Best regards, and thank you.

@knizhnik
Contributor

VOPS types are declared with PLAIN storage, so Postgres is not able to move this data to TOAST storage. I can certainly change the storage type to EXTENDED, but that would require changing all serialization routines for these types (because they would then have a varying-length rather than a fixed-length representation). More importantly, moving data to some other (TOAST) page would significantly slow down access time and to some extent devalue the idea of VOPS with vector processing.

So, do you really need all 64 columns in your query? If not, then my recommendation is to create several different projections, each with a smaller subset of attributes. This can also significantly increase the speed of queries involving just a small subset of columns.
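A minimal sketch of such projections (table and column names here are hypothetical, and populate is assumed to match columns by name, as in the examples above):

```sql
-- Two narrow VOPS projections instead of one 64-column table.
-- Key columns keep ordinary scalar types; measures use VOPS tile types.
create table vops_proj_a(id int8, m1 vops_float8, m2 vops_float8);
create table vops_proj_b(id int8, c1 vops_int4, c2 vops_int4);

-- Each projection is populated from the same source table, so its row
-- size stays well below the 8kB page limit.
select populate(destination := 'vops_proj_a'::regclass, source := 'std_table'::regclass);
select populate(destination := 'vops_proj_b'::regclass, source := 'std_table'::regclass);
```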

Yet another solution is increasing the Postgres page size (from the default 8kB). It requires rebuilding Postgres from sources, and most likely you will not want to do that.

And if none of the suggested alternatives is applicable in your case, I can provide you with a version of VOPS where VOPS types are defined with the EXTENDED storage type, so that you can check the performance in that configuration.

@Mad-Apes
Author

Thanks for the reply.
We have 100+ normal tables, each containing tens of columns. Most of the columns are used in analytic queries.

Following your suggestion, I split a large VOPS table into several small VOPS tables, each with a smaller subset of attributes, and used VOPS FDW: I created FDW tables on top of the small VOPS tables. When loading data into the VOPS tables, the error did not occur any more. I then ran ANALYZE on the FDW tables, executed a query against every VOPS table, and combined the results into one with an inner join. It works.

There is a problem that bothers me, though. There are more than 500,000 rows in the table, yet the execution time of the query with VOPS is about the same as with the normal tables. Maybe I did something wrong somewhere.

My steps:

  1. create VOPS tables with smaller subset of attributes.
  2. load data by VOPS populate function
  3. create FDW tables by VOPS tables
  4. analyze FDW tables
  5. execute query
    Here is the query:
    SELECT *
    FROM(
    SELECT
    MAX(fdw_table_1.id) AS id
    .
    .
    .
    ,MIN(fdw_table_1.time) AS time
    ,AVG(fdw_table_1.received_num) AS enb_received_power
    .
    .
    .
    ,MAX(fdw_table_1.sequence_id) AS sequence_id
    ,count(*) AS xdr_count
    ,cast(null as bigint) AS align
    FROM vops_fdw_table_1 fdw_table_1
    GROUP BY fdw_table_1.id, fdw_table_1.align
    ) as fdw_1
    inner join (
    SELECT
    MAX(fdw_table_2.sequence_id)
    .
    .
    .
    FROM vops_fdw_table_2 fdw_table_2
    GROUP BY fdw_table_2.id, fdw_table_2.align
    ) fdw_2
    on fdw_1.sequence_id = fdw_2.sequence_id
    If there is anything wrong, please point it out. Thank you very much.
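For reference, steps 1–4 above might be sketched roughly as follows (the vops_fdw wrapper and server names are assumptions based on the VOPS documentation, and the column list is abbreviated):

```sql
-- Steps 1-2: narrow VOPS projection, loaded with populate()
create table vops_table_1(id vops_int8, sequence_id vops_int8, received_num vops_float8);
select populate(destination := 'vops_table_1'::regclass, source := 'std_table'::regclass);

-- Step 3: expose it through the FDW so plain SQL (GROUP BY, joins) works on it
create server vops_server foreign data wrapper vops_fdw;
create foreign table vops_fdw_table_1(id int8, sequence_id int8, received_num float8)
    server vops_server options (table_name 'vops_table_1');

-- Step 4: collect statistics for the planner
analyze vops_fdw_table_1;
```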

I would be very grateful if you could provide the version of VOPS where VOPS types are defined with the EXTENDED storage type.

@knizhnik
Contributor

If you always perform grouping by sequence_id, then it is desirable to leave sequence_id as a scalar type (do not replace it with a VOPS type). In this case grouping will be done by Postgres in the standard way, while aggregates will be calculated using VOPS functions. Otherwise VOPS FDW has to extract horizontal tuples from the VOPS types, after which grouping and aggregation are done using standard Postgres executor nodes. So there will be no performance benefit, only the extra overhead of unnesting VOPS tiles into horizontal tuples.
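Under that suggestion, a table for the query above might be declared roughly like this (names are illustrative; the sort parameter groups equal key values together, as in the populate examples elsewhere in this thread):

```sql
-- The GROUP BY key stays scalar; only the aggregated measure is vectorized.
create table vops_table_1(sequence_id int8, received_num vops_float8);
select populate(destination := 'vops_table_1'::regclass,
                source := 'std_table'::regclass,
                sort := 'sequence_id');
-- Grouping by sequence_id is then executed by the standard Postgres
-- executor, while the aggregate over received_num uses VOPS functions.
```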

Also, I do not completely understand the idea of grouping by the dummy align field, which actually represents a null value.

@knizhnik
Contributor

Please find the version of VOPS where VOPS types have the EXTENDED storage class in the "extended" branch in Git.

@Mad-Apes
Author

First of all, align is a field in the tables vops_fdw_table_1 and vops_fdw_table_2, and the field in these two tables is not null. During the statistical process, I need a bigint value that is null, and I gave it the alias align. I sincerely apologize for the confusion.

OK. Next, I will do my work in both ways and compare the performance between them.

Thanks again.

@Mad-Apes
Author

Hello,

I have met a new problem: data is lost after performing the populate function.

  1. VOPS=# SELECT column_int2, column_int4, column_int8 FROM std_table WHERE column_int2 = 1 AND column_int4 = 0;

 column_int2 | column_int4 | column_int8
-------------+-------------+-------------
           1 |           0 |         762
           1 |           0 |         722
           1 |           0 |         497
           1 |           0 |           4
           1 |           0 |         472
           1 |           0 |         312
           1 |           0 |         504
           1 |           0 |         808
(8 rows)

  2. VOPS=# select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4');

  3. VOPS=# SELECT column_int2, column_int4, column_int8 FROM vops_table_gb WHERE column_int2 = 1 AND column_int4 = 0;

 column_int2 | column_int4 |                            column_int8
-------------+-------------+---------------------------------------------------------------------------
           1 |           0 | {504,497,762,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,}
(1 row)

'column_int2' and 'column_int4' are scalar types in vops_table_gb.

@knizhnik
Contributor

knizhnik commented Mar 16, 2018

Sorry, I cannot reproduce the problem:

postgres=# create table std_table(column_int2 int2, column_int4 int4, column_int8 int8);
CREATE TABLE
postgres=# insert into std_table values (1,0,762),(1,0,722),(1,0,497),(1,0,4),(1,0,472),(1,0,312),(1,0,504),(1,0,808);
INSERT 0 8
postgres=# SELECT column_int2, column_int4, column_int8 FROM std_table WHERE column_int2 = 1 AND column_int4 = 0;
 column_int2 | column_int4 | column_int8 
-------------+-------------+-------------
           1 |           0 |         762
           1 |           0 |         722
           1 |           0 |         497
           1 |           0 |           4
           1 |           0 |         472
           1 |           0 |         312
           1 |           0 |         504
           1 |           0 |         808
(8 rows)

postgres=# create table vops_table_gb(column_int2 int2, column_int4 int4, column_int8 vops_int8);
CREATE TABLE
postgres=#     select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4');
 populate 
----------
        8
(1 row)

postgres=# SELECT column_int2, column_int4, column_int8 FROM vops_table_gb WHERE column_int2 = 1 AND column_int4 = 0;
 column_int2 | column_int4 |                               column_int8
-------------+-------------+----------------------------------------------------------------------------
           1 |           0 | {762,722,497,4,472,312,504,808,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,}
(1 row)

Can you send me some example reproducing the problem?

@Mad-Apes
Author

Hello, here is what I did:

1:
VOPS=# create table std_table(pk_column int2, column_int2 int2, column_int4 int4, column_int8 int8);
CREATE TABLE
2:
VOPS=# INSERT INTO std_table VALUES(generate_series(1,10),random()*5,random()*5,random()*100);
INSERT 0 10
3:
VOPS=# create table vops_table_gb(pk_column vops_int2,column_int2 int2, column_int4 int4, column_int8 vops_int8);
CREATE TABLE
4:
VOPS=# select populate(destination:='vops_table_gb'::regclass, source:='std_table'::regclass, sort := 'column_int2,column_int4');
populate
----------
10
(1 row)
5:
VOPS=# SELECT column_int2 ,column_int4 ,sum(column_int8) FROM std_table GROUP BY column_int2,column_int4 ORDER BY column_int2,column_int4;
 column_int2 | column_int4 | sum
-------------+-------------+-----
           0 |           1 |  81
           1 |           1 |  69
           2 |           0 |  45
           2 |           5 | 166
           3 |           0 |  26
           3 |           2 |  94
           4 |           1 |  49
           5 |           2 |  46
(8 rows)
6:
VOPS=# SELECT column_int2 ,column_int4 ,sum(column_int8) FROM vops_table_gb GROUP BY column_int2,column_int4 ORDER BY column_int2,column_int4;
 column_int2 | column_int4 | sum
-------------+-------------+-----
           0 |           1 |  81
           1 |           1 |  69
           2 |           0 |  45
           2 |           5 |  40
           3 |           0 |  26
           3 |           2 |  94
           4 |           1 |  49
           5 |           2 |  46
(8 rows)

Look, the results differ where column_int2=2 and column_int4=5. So I checked the data in each table.

1:
VOPS=# SELECT * FROM std_table where column_int2=2 and column_int4=5;
 pk_column | column_int2 | column_int4 | column_int8
-----------+-------------+-------------+-------------
         4 |           2 |           5 |          73
         6 |           2 |           5 |          53
        10 |           2 |           5 |          40
(3 rows)
2:
VOPS=# SELECT * FROM vops_table_gb where column_int2=2 and column_int4=5;
pk_column | column_int2 | column_int4 | column_int8
---------------------------------------------------------------------+-------------+-------------+---------------------------------------------------------------------
{10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,} | 2 | 5 | {40,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,}
(1 row)

@Mad-Apes
Author

If you haven't reproduced the problem, you can try generating more rows to insert into std_table.

@knizhnik
Copy link
Contributor

Thank you very much for helping me to reproduce and investigate the problem.
It is really a bug in handling duplicate values in vops_populate.
Fixed in commits e91c75d and 6728123.
