Feat/stmt2 new bind#35264
Conversation
Add columnar binding support for stmt2 to improve bulk insert performance with column-organized data. Key changes: - Add TAOS_STMT2_COLUMN_BIND and TAOS_STMT2_COLUMN_BINDV structures - Implement taos_stmt2_bind_param_column to convert columnar data to row format - Implement taos_stmt2_bind_param_column_a for async binding - Add comprehensive test coverage with 6 test cases Columnar binding advantages: - Better cache locality for column-organized data sources - SIMD optimization opportunities - Suitable for big data batch inserts Technical details: - First column must be tbname for table identification - Tag columns: num=0 (only first row valid) - Data columns: num=numRows (all rows valid) - Field info caching with (stmt, SQL string pointer) key to avoid multiple get_fields calls which have side effects Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Code Review
This pull request introduces columnar binding support for the stmt2 interface, including new data structures, API functions, and internal implementation logic to convert columnar data to row format. It also adds a macOS compilation guide, updates the build configuration to include a new test executable, and fixes a bug in the decimal string parsing logic. I have no feedback to provide as there are no review comments to evaluate.
There was a problem hiding this comment.
Pull request overview
This PR introduces a new “columnar bind” API for stmt2 to allow binding data in column-oriented buffers (instead of per-row binds), with supporting prepare-time field caching and async binding support. It also includes a small fix to decimal exponent parsing and adds a macOS build guide.
Changes:
- Add public client API/types for stmt2 columnar binding (
TAOS_STMT2_COLUMN_BIND(V)+taos_stmt2_bind_param_column(_a)), plus implementation that converts columnar input into existing row-bind flow. - Cache prepared field metadata in
stmtPrepare2()to avoid reconstructing placeholder/field info during binds; update async bind thread to support columnar mode. - Add a new client unit-test executable target for
stmt2TestColumn, and add a newBUILD_MACOS.mddocument.
Reviewed changes
Copilot reviewed 7 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
source/libs/decimal/src/decimal.c |
Updates exponent parsing for scientific-notation decimals. |
include/client/taos.h |
Adds new public columnar bind structs and APIs for stmt2. |
source/client/inc/clientStmt2.h |
Extends stmt2 internal structs with cached field metadata + async thread args union for row/columnar binds. |
source/client/src/clientStmt2.c |
Frees cached field metadata on cleanup; populates field cache at prepare; updates async bind thread to call columnar bind. |
source/client/src/clientMain.c |
Implements columnar bind API and refactors async bind entrypoint to share code between row/columnar. |
source/client/test/CMakeLists.txt |
Builds stmt2TestColumn as a unit test executable. |
BUILD_MACOS.md |
Adds macOS build instructions. |
.gitignore |
Adds BUILD_MACOS.md to ignore list. |
Comments suppressed due to low confidence (1)
source/client/src/clientMain.c:2935
stmtBindParamAImpl()leaks the allocatedThreadArgs *argson early-return paths (e.g., whenasyncBindNum > 0) and also whentaosStmt2AsyncBind()fails. Ensureargsis freed on all failure/return paths after allocation.
(void)taosThreadMutexLock(&(pStmt->asyncBindParam.mutex));
if (atomic_load_8((int8_t *)&pStmt->asyncBindParam.asyncBindNum) > 0) {
(void)taosThreadMutexUnlock(&(pStmt->asyncBindParam.mutex));
tscError("async bind param is still working, please try again later");
terrno = TSDB_CODE_TSC_STMT_API_ERROR;
return terrno;
}
(void)atomic_add_fetch_8(&pStmt->asyncBindParam.asyncBindNum, 1);
(void)taosThreadMutexUnlock(&(pStmt->asyncBindParam.mutex));
int code_s = taosStmt2AsyncBind(stmtAsyncBindThreadFunc, (void *)args);
if (code_s != TSDB_CODE_SUCCESS) {
terrno = code_s;
(void)taosThreadMutexLock(&(pStmt->asyncBindParam.mutex));
(void)taosThreadCondSignal(&(pStmt->asyncBindParam.waitCond));
(void)atomic_sub_fetch_8(&pStmt->asyncBindParam.asyncBindNum, 1);
(void)taosThreadMutexUnlock(&(pStmt->asyncBindParam.mutex));
tscError("async bind failed, code:%d , %s", code_s, tstrerror(code_s));
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ThreadArgs *args = (ThreadArgs *)taosMemoryMalloc(sizeof(ThreadArgs)); | ||
| args->stmt = stmt; | ||
| args->bindv = bindv; | ||
| args->col_idx = col_idx; | ||
| args->fp = fp; | ||
| args->param = param; | ||
| args->is_columnar = is_columnar; | ||
| if (is_columnar) { |
There was a problem hiding this comment.
stmtBindParamAImpl() allocates args but never checks for allocation failure before dereferencing, which can crash on OOM. Please add a NULL check and return TSDB_CODE_OUT_OF_MEMORY (or terrno) early.
| // Check if variable length type | ||
| bool isVarLength = IS_VAR_DATA_TYPE(colBind->buffer_type) || | ||
| colBind->buffer_type == TSDB_DATA_TYPE_DECIMAL || | ||
| colBind->buffer_type == TSDB_DATA_TYPE_DECIMAL64; | ||
|
|
||
| // Set buffer, length, and is_null with offset | ||
| if (isVarLength) { | ||
| bind->buffer = (char*)colBind->buffer + colOffsets[fieldIdx]; | ||
| bind->length = colBind->length + startRow; | ||
| bind->is_null = colBind->is_null ? (char*)colBind->is_null + startRow : NULL; |
There was a problem hiding this comment.
In processColumnsForTable(), for variable-length types you do pointer arithmetic on colBind->length and colBind->buffer without validating they are non-NULL. If a caller passes a var-length column with length==NULL (or buffer==NULL with nonzero lengths), this will crash before deeper validation runs. Add explicit checks for var-length columns (and tbname) and return an invalid-parameter error instead of dereferencing NULL.
| char *tbname = (char*)bindv->columns[tbnameColIdx].buffer + tbnameOffset; | ||
| int32_t tbname_len = bindv->columns[tbnameColIdx].length[startRow]; | ||
|
|
||
| if (tbname == NULL || tbname_len == 0) { | ||
| STMT2_ELOG("Table name at startRow %d is NULL or empty", startRow); | ||
| terrno = TSDB_CODE_INVALID_PARA; | ||
| goto _return; |
There was a problem hiding this comment.
When grouping super-table rows by tbname, the code ignores the is_null bitmap for the tbname column. If is_null[startRow] is set, the current logic still reads buffer + offset and may mis-group or crash. Consider validating is_null for tbname rows (treating NULL as invalid) before using length/memcmp.
| TARGET_LINK_LIBRARIES( | ||
| stmt2TestColumn PRIVATE | ||
| os util common transport parser catalog scheduler ${TAOS_NATIVE_LIB_STATIC} qcom executor function | ||
| ) |
There was a problem hiding this comment.
stmt2TestColumn is added as a test executable, but it is not registered with add_test() in the Linux test block below, so it won’t run under ctest/CI even though it exercises the new columnar binding API. Please add an add_test(NAME stmt2TestColumn COMMAND stmt2TestColumn) entry (guarded consistently with the other tests).
| ) | |
| ) | |
| if (NOT TD_WINDOWS) | |
| add_test(NAME stmt2TestColumn COMMAND stmt2TestColumn) | |
| endif() |
Description
Issue(s)
Checklist
Please check the items in the checklist if applicable.