Panic in NestedLoopJoin with Nullable FixedSizeBinary Arrays #18870

@tobixdev

Describe the bug

Observed on DataFusion 51.0. The bug may predate this release; I haven't tested the code with DataFusion 50.

UPDATE: I tried to reproduce this inside the DataFusion repository. It works on the newest revision but fails on the 51.0 branch, so some PR has apparently already fixed the issue. I don't know which one, though.

UPDATE 2: After bisecting, it seems that #17562 fixed the problem, so we can close this issue.

Executing the query plan from below causes the following crash:

thread 'test_nested_join_fixed_size_binary' (160312) panicked at /home/tschwarzinger/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-buffer-57.0.0/src/buffer/immutable.rs:300:9:
the offset of the new Buffer cannot exceed the existing length: slice offset=0 length=8 selflen=0
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/std/src/panicking.rs:698:5
   1: core::panicking::panic_fmt
             at /rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/core/src/panicking.rs:75:14
   2: arrow_buffer::buffer::immutable::Buffer::slice_with_length
             at /home/tschwarzinger/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-buffer-57.0.0/src/buffer/immutable.rs:300:9
   3: arrow_array::array::fixed_size_binary_array::FixedSizeBinaryArray::slice
             at /home/tschwarzinger/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-57.0.0/src/array/fixed_size_binary_array.rs:224:41
   4: <arrow_array::array::fixed_size_binary_array::FixedSizeBinaryArray as arrow_array::array::Array>::slice
             at /home/tschwarzinger/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-57.0.0/src/array/fixed_size_binary_array.rs:607:23
   5: <arrow_select::coalesce::generic::GenericInProgressArray as arrow_select::coalesce::InProgressArray>::copy_rows
             at /home/tschwarzinger/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.0.0/src/coalesce/generic.rs:58:28
   6: arrow_select::coalesce::BatchCoalescer::push_batch
             at /home/tschwarzinger/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.0.0/src/coalesce.rs:459:29
   7: datafusion_physical_plan::joins::nested_loop_join::NestedLoopJoinStream::process_probe_batch
             at /home/tschwarzinger/bartgeier/datafusion/datafusion/physical-plan/src/joins/nested_loop_join.rs:1277:32
   8: datafusion_physical_plan::joins::nested_loop_join::NestedLoopJoinStream::handle_probe_right
             at /home/tschwarzinger/bartgeier/datafusion/datafusion/physical-plan/src/joins/nested_loop_join.rs:1120:20
   9: <datafusion_physical_plan::joins::nested_loop_join::NestedLoopJoinStream as futures_core::stream::Stream>::poll_next
             at /home/tschwarzinger/bartgeier/datafusion/datafusion/physical-plan/src/joins/nested_loop_join.rs:931:32
  10: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next
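
The numbers in the panic message are consistent with slicing two rows of a FixedSizeBinary(4) column (2 × 4 = 8 bytes) out of a value buffer that holds 0 bytes. The std-only sketch below models that kind of bounds check. `byte_range` and `check_slice` are hypothetical helpers written for illustration, not arrow-rs APIs, and the assumption that the requested length is rows × byte width is inferred from the trace rather than confirmed against the arrow-rs source.

```rust
/// Hypothetical helper: translate a row range of a fixed-width array
/// into a byte range (assumes bytes = rows * width).
fn byte_range(row_offset: usize, row_count: usize, width: usize) -> (usize, usize) {
    (row_offset * width, row_count * width)
}

/// Hypothetical model of a bounds-checked buffer slice. Returns an error
/// instead of panicking when the requested range exceeds the buffer.
fn check_slice(buf_len: usize, offset: usize, length: usize) -> Result<(), String> {
    if offset.saturating_add(length) <= buf_len {
        Ok(())
    } else {
        Err(format!(
            "slice offset={offset} length={length} selflen={buf_len}"
        ))
    }
}

fn main() {
    // Two rows of width 4, starting at row 0.
    let (offset, length) = byte_range(0, 2, 4);

    // A value buffer that actually holds the 8 bytes passes the check.
    println!("{:?}", check_slice(8, offset, length));

    // An empty value buffer (selflen=0) fails it, as in the panic above.
    println!("{:?}", check_slice(0, offset, length));
}
```

If this reading is right, the failing array reported two rows while its value buffer was empty. The sketch does not show where such an array was produced.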

Interestingly, the crash does not occur if we remove the [None] values from the reproducer. I.e., using

ctx.register_table("t1", fsb_table("left", vec![Some(b"0001")]))?;
ctx.register_table("t2", fsb_table("right", vec![Some(b"0001")]))?;

does not crash.

To Reproduce

use datafusion::arrow::array::{ArrayRef, FixedSizeBinaryArray, RecordBatch};
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::datasource::MemTable;
use datafusion::execution::context::SessionContext;
use datafusion::prelude::*;
use std::sync::Arc;
use datafusion_common::DataFusionError;

/// Build a FixedSizeBinary(4) array from byte slices.
fn fsb(values: &[Option<&[u8; 4]>]) -> ArrayRef {
    let arr = FixedSizeBinaryArray::from(
        values
            .iter()
            .map(|o| o.map(|x| x.as_slice()))
            .collect::<Vec<_>>(),
    );
    Arc::new(arr)
}

/// Create a MemTable with a single FixedSizeBinary(4) column
fn fsb_table(col_name: &str, data: Vec<Option<&[u8; 4]>>) -> Arc<MemTable> {
    let schema = Arc::new(Schema::new(vec![Field::new(
        col_name,
        DataType::FixedSizeBinary(4),
        true,
    )]));

    let batch = RecordBatch::try_new(schema.clone(), vec![fsb(&data)]).unwrap();

    Arc::new(MemTable::try_new(schema, vec![vec![batch]]).unwrap())
}

#[tokio::test]
async fn test_nested_join_fixed_size_binary() -> Result<(), DataFusionError> {
    let ctx = SessionContext::new();

    let lhs = vec![Some(b"0001"), None];
    let mut rhs = vec![Some(b"0001")];
    for _ in 0..5000 {
        rhs.push(None)
    }

    ctx.register_table("t1", fsb_table("left", lhs))?;
    ctx.register_table("t2", fsb_table("right", rhs))?;

    let df = ctx.table("t1").await?.join(
        ctx.table("t2").await?,
        JoinType::Left,
        &[],
        &[],
        Some(lit(true)),
    )?;

    assert_eq!(
        df.to_string().await.unwrap(),
        "The lhs crashes"
    );

    Ok(())
}

Expected behavior

Not crashing and producing a result.
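
For reference, a left join whose only condition is the literal `true` pairs every left row with every right row, so the reproducer should yield 2 × 5001 = 10002 rows. The sketch below is a naive std-only model of that semantics. `left_join_all` is a hypothetical helper, not DataFusion code, and it represents SQL NULL as `None`.

```rust
type Fsb = Option<[u8; 4]>;

/// Hypothetical naive nested-loop left join. Every left row is paired with
/// each right row that satisfies `pred`. A left row with no match is emitted
/// once with `None` on the right (padding and NULL values look the same in
/// this simplified model).
fn left_join_all(
    left: &[Fsb],
    right: &[Fsb],
    pred: impl Fn(&Fsb, &Fsb) -> bool,
) -> Vec<(Fsb, Fsb)> {
    let mut out = Vec::new();
    for l in left {
        let mut matched = false;
        for r in right {
            if pred(l, r) {
                out.push((*l, *r));
                matched = true;
            }
        }
        if !matched {
            out.push((*l, None));
        }
    }
    out
}

fn main() {
    // Same shape as the reproducer: 2 left rows, 1 + 5000 right rows.
    let left: Vec<Fsb> = vec![Some(*b"0001"), None];
    let mut right: Vec<Fsb> = vec![Some(*b"0001")];
    right.extend(std::iter::repeat(None).take(5000));

    // The join filter in the reproducer is lit(true).
    let rows = left_join_all(&left, &right, |_, _| true);
    println!("{}", rows.len());
}
```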

Additional context

The Query Plan:

+---------------+-----------------------------------------------------+
| plan_type     | plan                                                |
+---------------+-----------------------------------------------------+
| logical_plan  | Left Join:                                          |
|               |   TableScan: t1 projection=[left]                   |
|               |   TableScan: t2 projection=[right]                  |
| physical_plan | NestedLoopJoinExec: join_type=Left                  |
|               |   DataSourceExec: partitions=1, partition_sizes=[1] |
|               |   DataSourceExec: partitions=1, partition_sizes=[1] |
|               |                                                     |
+---------------+-----------------------------------------------------+
