Weird query behavior when using CTE without materialized hint #4499

Open
dongho-jung opened this issue Mar 12, 2025 · 1 comment
Labels
bug Something isn't working

dongho-jung commented Mar 12, 2025

Describe the bug
I’m not sure what the root cause is, but queries that use a CTE without the materialized hint can return wrong results: the outer where clause appears to be dropped.

For example, compare the explain output for the following query with and without the hint. (The query is long and its details don’t matter; what matters is how the plan changes depending on whether materialized is present.)

explain with parsed_data as (
    select
        d.namespace,
        d.name,
        d.replicas,
        pdb.min_available,
        pdb.max_unavailable,
        case
            when pdb.min_available is not null then
                regexp_replace(pdb.min_available::text, '.*IntVal:([0-9]*).*', '\1')::int
            else null
        end as min_available_int_val,
        case
            when pdb.min_available is not null and
                 regexp_replace(pdb.min_available::text, '.*StrVal:([0-9.]*)%?.*', '\1') != '' then
                round(regexp_replace(pdb.min_available::text, '.*StrVal:([0-9.]*)%?.*', '\1')::numeric / 100, 2)
            else null
        end as min_available_str_val,
        case
            when pdb.max_unavailable is not null then
                regexp_replace(pdb.max_unavailable::text, '.*IntVal:([0-9]*).*', '\1')::int
            else null
        end as max_unavailable_int_val,
        case
            when pdb.max_unavailable is not null and
                 regexp_replace(pdb.max_unavailable::text, '.*StrVal:([0-9.]*)%?.*', '\1') != '' then
                round(regexp_replace(pdb.max_unavailable::text, '.*StrVal:([0-9.]*)%?.*', '\1')::numeric / 100, 2)
            else null
        end as max_unavailable_str_val
    from
        k8s_main.kubernetes_pod_disruption_budget pdb
        inner join k8s_main.kubernetes_deployment d on d.selector = pdb.selector
        and d.namespace = pdb.namespace
)
select
    namespace,
    name,
    min_available,
    max_unavailable,
    replicas,
    min_available_int_val
from
    parsed_data
where
    min_available_int_val >= replicas
order by
    namespace,
    name;

The explain results are as follows:

+-----------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN                                                                                                             |
+-----------------------------------------------------------------------------------------------------------------------+
|Sort  (cost=40040003898208.71..40040003919042.05 rows=8333333 width=141)                                               |
|  Sort Key: d.namespace, d.name                                                                                        |
|  ->  Nested Loop  (cost=0.00..40040001145833.33 rows=8333333 width=141)                                               |
|        ->  Foreign Scan on kubernetes_pod_disruption_budget pdb  (cost=0.00..40000000000000.00 rows=1000000 width=400)|
|        ->  Foreign Scan on kubernetes_deployment d  (cost=0.00..40000.00 rows=100 width=400)                          |
|              Filter: ((selector = pdb.selector) AND (namespace = pdb.namespace))                                      |
+-----------------------------------------------------------------------------------------------------------------------+

As you can see, the plan contains no condition for min_available_int_val >= replicas. When the query is executed, the where clause is not applied at all and every row is returned.

If I add the materialized hint, everything works correctly:

explain with parsed_data as materialized (
    select
        d.namespace,
        d.name,
        d.replicas,
        pdb.min_available,
        pdb.max_unavailable,
        case
            when pdb.min_available is not null then
                regexp_replace(pdb.min_available::text, '.*IntVal:([0-9]*).*', '\1')::int
            else null
        end as min_available_int_val,
        case
            when pdb.min_available is not null and
                 regexp_replace(pdb.min_available::text, '.*StrVal:([0-9.]*)%?.*', '\1') != '' then
                round(regexp_replace(pdb.min_available::text, '.*StrVal:([0-9.]*)%?.*', '\1')::numeric / 100, 2)
            else null
        end as min_available_str_val,
        case
            when pdb.max_unavailable is not null then
                regexp_replace(pdb.max_unavailable::text, '.*IntVal:([0-9]*).*', '\1')::int
            else null
        end as max_unavailable_int_val,
        case
            when pdb.max_unavailable is not null and
                 regexp_replace(pdb.max_unavailable::text, '.*StrVal:([0-9.]*)%?.*', '\1') != '' then
                round(regexp_replace(pdb.max_unavailable::text, '.*StrVal:([0-9.]*)%?.*', '\1')::numeric / 100, 2)
            else null
        end as max_unavailable_str_val
    from
        k8s_main.kubernetes_pod_disruption_budget pdb
        inner join k8s_main.kubernetes_deployment d on d.selector = pdb.selector
        and d.namespace = pdb.namespace
)
select
    namespace,
    name,
    min_available,
    max_unavailable,
    replicas,
    min_available_int_val
from
    parsed_data
where
    min_available_int_val >= replicas
order by
    namespace,
    name;

The explain results with the hint:

+-------------------------------------------------------------------------------------------------------------------------+
|QUERY PLAN                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------+
|Sort  (cost=40040005585708.72..40040005606542.05 rows=8333333 width=141)                                                 |
|  Sort Key: parsed_data.namespace, parsed_data.name                                                                      |
|  CTE parsed_data                                                                                                        |
|    ->  Nested Loop  (cost=0.00..40040002250000.00 rows=25000000 width=208)                                              |
|          ->  Foreign Scan on kubernetes_pod_disruption_budget pdb  (cost=0.00..40000000000000.00 rows=1000000 width=400)|
|          ->  Foreign Scan on kubernetes_deployment d  (cost=0.00..40000.00 rows=100 width=400)                          |
|                Filter: ((selector = pdb.selector) AND (namespace = pdb.namespace))                                      |
|  ->  CTE Scan on parsed_data  (cost=0.00..583333.33 rows=8333333 width=141)                                             |
|        Filter: (min_available_int_val >= replicas)                                                                      |
+-------------------------------------------------------------------------------------------------------------------------+

The only difference is the materialized hint at the top, but when checking the plan, we can see that Filter: (min_available_int_val >= replicas) is properly included, and the query executes as intended.

In fact, besides this example, I’ve encountered similar cases a few times before. Each time, I just added the materialized hint to resolve the issue.
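
For context, PostgreSQL 12 and later may inline a non-recursive, side-effect-free CTE that is referenced only once into the outer query, while materialized forces the CTE to be computed separately and then scanned. The sketch below shows the two forms side by side. The numbers CTE and its doubled column are made up for illustration and use generate_series instead of the k8s plugin, so the statements run without any foreign tables.

```sql
-- Default form: the planner is free to fold the CTE into the outer query.
explain
with numbers as (
    select
        n,
        n * 2 as doubled
    from
        generate_series(1, 10) as n
)
select n from numbers where doubled >= 10;

-- Materialized form: the CTE is evaluated on its own, then scanned and filtered.
explain
with numbers as materialized (
    select
        n,
        n * 2 as doubled
    from
        generate_series(1, 10) as n
)
select n from numbers where doubled >= 10;
```

In stock PostgreSQL one would expect the doubled >= 10 condition to show up as a Filter in both plans, with only its position differing. Whether the same holds for Steampipe foreign tables is exactly what this issue questions, and this sketch has not been checked against them.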

I looked it up and found a few similar cases:

If this is a known issue and adding the materialized hint is the appropriate workaround, it would help to say so explicitly in the documentation, so that others don’t have to struggle with it as I did.

On the other hand, if this behavior is unintended and a clear bug, then it should be investigated and fixed at the root cause.

Steampipe version (steampipe -v)
v1.0.3

To reproduce
I haven’t been able to find a minimal, reproducible example. If you have the k8s plugin, please refer to [Describe the bug].
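
A possible starting point for a smaller reproduction is to keep the same two tables, the same join, and the same filter, but only one computed column. This is an untested reduction of the query above, so it may or may not still trigger the problem:

```sql
-- Candidate reduction (untested): same tables and join as the original query,
-- with only the computed column that the where clause depends on.
explain
with parsed_data as (
    select
        d.namespace,
        d.name,
        d.replicas,
        regexp_replace(pdb.min_available::text, '.*IntVal:([0-9]*).*', '\1')::int
            as min_available_int_val
    from
        k8s_main.kubernetes_pod_disruption_budget pdb
        inner join k8s_main.kubernetes_deployment d on d.selector = pdb.selector
        and d.namespace = pdb.namespace
)
select
    namespace,
    name
from
    parsed_data
where
    min_available_int_val >= replicas;
```

Changing the first line of the CTE to "with parsed_data as materialized (" and comparing the two plans would show whether the Filter line still disappears in the reduced form.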

Expected behavior
Either ensure that the where clause is not omitted even without the materialized hint, or document this issue together with the workaround.

Additional context
N/A

pskrbasu (Contributor) commented Mar 12, 2025

Thank you, @dongho-jung, for taking the time to raise this detailed issue. The issues you mentioned are similar, but I'm unsure if they are the same. I will look into this and update this thread.

Meanwhile, if you can find a minimal reproducible example, please let us know.
