Closed
Description
We were running the example queries from your README, and noticed that heap and vops tables don't necessarily return the same results. When we ran this query, we noticed that the float4
columns didn't match with the equivalent query in heap:
-- Mixed mode: let's Postgres does group by and calculates VOPS aggregates for each group
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
vops_lineitem_projection
where
l_shipdate <= '1998-12-01'::date
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
In a simplified example:
CREATE TABLE
pivotal=# create table vops_foo(a vops_float4);
CREATE TABLE
pivotal=# insert into foo select (i+.1258) from generate_series(1, 1000000)i;
INSERT 0 1000000
pivotal=# select populate(destination := 'vops_foo'::regclass, source := 'foo'::regclass, sort := 'a');
populate
----------
1000000
(1 row)
When we run sum(a)
on foo, it gets a number that is significantly different than a sum on vops_foo:
pivotal=# select sum(a) from foo;
sum
---------------
5.0000986e+11
(1 row)
pivotal=# select sum(a) from vops_foo;
sum
-------------------
500000625015.1876
(1 row)
Why does this happen? Is vops_float4
incompatible with the standard float4? Is this a bug?
Activity
knizhnik commentedon Aug 17, 2019
VOPS is calculating aggregates on float4/float8 ytpe using double,
while Postgres is accumulating result using numeric type.
Please also notice that result of sum of floating point values may depend on order of calculation.
You can see in in vanilla Postgres:
postgres=# select sum(x) from ff;
sum
499914.6792653507
(1 row)
postgres=# select sum(x) from ff;
sum
499914.6792653515
(1 row)
postgres=# select sum(x) from ff;
sum
499914.67926534865
(1 row)
You can notice small differences in results - it is because using of parallel plan for execution of this query:
postgres=# explain select sum(x) from ff;
QUERY PLAN
Finalize Aggregate (cost=11614.55..11614.56 rows=1 width=8)
-> Gather (cost=11614.33..11614.54 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=10614.33..10614.34 rows=1 width=8)
-> Parallel Seq Scan on ff (cost=0.00..9572.67 rows=416667 width=8)
(5 rows)
So some small difference in the results is not an error.