Description
Describe the bug
This occurs with DataFusion 51.0. The bug may not be specific to that release, though; I haven't tested the code with DataFusion 50.
UPDATE: I tried to reproduce it within the DataFusion repository. The test passes on the newest revision but fails on the 51.0 branch, so some PR appears to have fixed this issue already. I don't know which one, though.
UPDATE 2: After bisecting, it seems that #17562 fixed the problem, so this issue can be closed.
Executing the query plan shown below causes the following crash:
thread 'test_nested_join_fixed_size_binary' (160312) panicked at /home/tschwarzinger/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-buffer-57.0.0/src/buffer/immutable.rs:300:9:
the offset of the new Buffer cannot exceed the existing length: slice offset=0 length=8 selflen=0
stack backtrace:
0: __rustc::rust_begin_unwind
at /rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/std/src/panicking.rs:698:5
1: core::panicking::panic_fmt
at /rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/core/src/panicking.rs:75:14
2: arrow_buffer::buffer::immutable::Buffer::slice_with_length
at /home/tschwarzinger/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-buffer-57.0.0/src/buffer/immutable.rs:300:9
3: arrow_array::array::fixed_size_binary_array::FixedSizeBinaryArray::slice
at /home/tschwarzinger/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-57.0.0/src/array/fixed_size_binary_array.rs:224:41
4: <arrow_array::array::fixed_size_binary_array::FixedSizeBinaryArray as arrow_array::array::Array>::slice
at /home/tschwarzinger/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-array-57.0.0/src/array/fixed_size_binary_array.rs:607:23
5: <arrow_select::coalesce::generic::GenericInProgressArray as arrow_select::coalesce::InProgressArray>::copy_rows
at /home/tschwarzinger/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.0.0/src/coalesce/generic.rs:58:28
6: arrow_select::coalesce::BatchCoalescer::push_batch
at /home/tschwarzinger/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.0.0/src/coalesce.rs:459:29
7: datafusion_physical_plan::joins::nested_loop_join::NestedLoopJoinStream::process_probe_batch
at /home/tschwarzinger/bartgeier/datafusion/datafusion/physical-plan/src/joins/nested_loop_join.rs:1277:32
8: datafusion_physical_plan::joins::nested_loop_join::NestedLoopJoinStream::handle_probe_right
at /home/tschwarzinger/bartgeier/datafusion/datafusion/physical-plan/src/joins/nested_loop_join.rs:1120:20
9: <datafusion_physical_plan::joins::nested_loop_join::NestedLoopJoinStream as futures_core::stream::Stream>::poll_next
at /home/tschwarzinger/bartgeier/datafusion/datafusion/physical-plan/src/joins/nested_loop_join.rs:931:32
10: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next
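To make the numbers in the panic message easier to read, here is a minimal, std-only sketch of the kind of bounds check that the message describes. It is a simplified model, not the arrow-buffer implementation: the helper names `slice_in_bounds` and `byte_range` are hypothetical, and the assumption that the slice covers two rows of a `FixedSizeBinary(4)` column is only inferred from `length=8` in the message.

```rust
/// Hypothetical, simplified model of the check behind the panic message:
/// a slice is valid only if `offset + length` fits in the buffer.
/// This is NOT the arrow-buffer source code.
fn slice_in_bounds(offset: usize, length: usize, buffer_len: usize) -> bool {
    // saturating_add avoids overflow for very large offsets
    offset.saturating_add(length) <= buffer_len
}

/// Byte offset and byte length covered by `rows` fixed-width rows,
/// starting at row `start_row`, where each value is `width` bytes.
fn byte_range(start_row: usize, rows: usize, width: usize) -> (usize, usize) {
    (start_row * width, rows * width)
}

fn main() {
    let width = 4; // FixedSizeBinary(4), as in the reproducer

    // Assumption: two rows are sliced, which matches `offset=0 length=8`.
    let (offset, length) = byte_range(0, 2, width);
    assert_eq!((offset, length), (0, 8));

    // A value buffer that really holds two 4-byte rows passes the check.
    assert!(slice_in_bounds(offset, length, 8));

    // An empty value buffer (`selflen=0` in the message) fails it.
    assert!(!slice_in_bounds(offset, length, 0));

    println!("offset={offset} length={length}");
}
```

Read this way, the message says that an 8-byte slice was requested from a value buffer that reported a length of 0 bytes. The sketch does not explain why the buffer was empty.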
Interestingly, the crash does not occur if the None values are removed from the reproducer. For example, registering the tables as follows does not crash:
ctx.register_table("t1", fsb_table("left", vec![Some(b"0001")]))?;
ctx.register_table("t2", fsb_table("right", vec![Some(b"0001")]))?;
To Reproduce
use datafusion::arrow::array::{ArrayRef, FixedSizeBinaryArray, RecordBatch};
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::datasource::MemTable;
use datafusion::execution::context::SessionContext;
use datafusion::prelude::*;
use std::sync::Arc;
use datafusion_common::DataFusionError;

/// Build a FixedSizeBinary(4) array from byte slices.
fn fsb(values: &[Option<&[u8; 4]>]) -> ArrayRef {
    let arr = FixedSizeBinaryArray::from(
        values
            .iter()
            .map(|o| o.map(|x| x.as_slice()))
            .collect::<Vec<_>>(),
    );
    Arc::new(arr)
}

/// Create a MemTable with a single FixedSizeBinary(4) column
fn fsb_table(col_name: &str, data: Vec<Option<&[u8; 4]>>) -> Arc<MemTable> {
    let schema = Arc::new(Schema::new(vec![Field::new(
        col_name,
        DataType::FixedSizeBinary(4),
        true,
    )]));
    let batch = RecordBatch::try_new(schema.clone(), vec![fsb(&data)]).unwrap();
    Arc::new(MemTable::try_new(schema, vec![vec![batch]]).unwrap())
}

#[tokio::test]
async fn test_nested_join_fixed_size_binary() -> Result<(), DataFusionError> {
    let ctx = SessionContext::new();

    let lhs = vec![Some(b"0001"), None];
    let mut rhs = vec![Some(b"0001")];
    for _ in 0..5000 {
        rhs.push(None);
    }

    ctx.register_table("t1", fsb_table("left", lhs))?;
    ctx.register_table("t2", fsb_table("right", rhs))?;

    let df = ctx.table("t1").await?.join(
        ctx.table("t2").await?,
        JoinType::Left,
        &[],
        &[],
        Some(lit(true)),
    )?;

    assert_eq!(
        df.to_string().await.unwrap(),
        "The lhs crashes"
    );
    Ok(())
}
Expected behavior
The query should complete without panicking and produce a result.
Additional context
The Query Plan:
+---------------+-----------------------------------------------------+
| plan_type | plan |
+---------------+-----------------------------------------------------+
| logical_plan | Left Join: |
| | TableScan: t1 projection=[left] |
| | TableScan: t2 projection=[right] |
| physical_plan | NestedLoopJoinExec: join_type=Left |
| | DataSourceExec: partitions=1, partition_sizes=[1] |
| | DataSourceExec: partitions=1, partition_sizes=[1] |
| | |
+---------------+-----------------------------------------------------+