Skip to content

"SUM()" on vops_float4 yeilds different results from HEAP #18

Closed
@tvesely

Description

@tvesely

We were running the example queries from your README, and noticed that heap and vops tables don't necessarily return the same results. When we ran this query, we noticed that the float4 columns didn't match with the equivalent query in heap:

-- Mixed mode: let's Postgres does group by and calculates VOPS aggregates for each group
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    vops_lineitem_projection
where
    l_shipdate <= '1998-12-01'::date
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;

In a simplified example:

CREATE TABLE
pivotal=# create table vops_foo(a vops_float4);
CREATE TABLE
pivotal=# insert into foo select (i+.1258) from generate_series(1, 1000000)i;
INSERT 0 1000000
pivotal=# select populate(destination := 'vops_foo'::regclass, source := 'foo'::regclass, sort := 'a');
 populate
----------
  1000000
(1 row)

When we run sum(a) on foo, it gets a number that is significantly different than a sum on vops_foo:

pivotal=# select sum(a) from foo;
      sum
---------------
 5.0000986e+11
(1 row)
pivotal=# select sum(a) from vops_foo;
        sum
-------------------
 500000625015.1876
(1 row)

Why does this happen? Is vops_float4 incompatible with the standard float4? Is this a bug?

Activity

knizhnik

knizhnik commented on Aug 17, 2019

@knizhnik
Contributor

VOPS is calculating aggregates on float4/float8 ytpe using double,
while Postgres is accumulating result using numeric type.

Please also notice that result of sum of floating point values may depend on order of calculation.
You can see in in vanilla Postgres:

postgres=# select sum(x) from ff;
sum

499914.6792653507
(1 row)

postgres=# select sum(x) from ff;
sum

499914.6792653515
(1 row)

postgres=# select sum(x) from ff;
sum

499914.67926534865
(1 row)

You can notice small differences in results - it is because using of parallel plan for execution of this query:

postgres=# explain select sum(x) from ff;
QUERY PLAN

Finalize Aggregate (cost=11614.55..11614.56 rows=1 width=8)
-> Gather (cost=11614.33..11614.54 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=10614.33..10614.34 rows=1 width=8)
-> Parallel Seq Scan on ff (cost=0.00..9572.67 rows=416667 width=8)
(5 rows)

So some small difference in the results is not an error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @knizhnik@tvesely

        Issue actions

          "SUM()" on vops_float4 yeilds different results from HEAP · Issue #18 · postgrespro/vops