[Bug]: SEGV null pointer dereference in qWorkerInit (qworker.c:1481) during vnode startup on 3.4.0.9 #35280

@Danceiny

Description

Bug Description

taosd crashes with SIGSEGV (null pointer dereference at address 0x0) in qWorkerInit (qworker.c:1481) when a vnode-query thread starts up. The crash is deterministic — all 4 consecutive restarts hit the exact same binary offset. After the 4th crash, systemd triggered start-limit-hit and prevented further restarts until manual intervention.

To Reproduce

The crash occurs during vnode startup/restart under the following conditions:

  1. 3-node TDengine cluster (community edition 3.4.0.9) running on Ubuntu 24.04
  2. After WAL replay completes on node s3, the vnode-query thread calls qWorkerInit to initialize the query worker
  3. Inside qWorkerInit, an internal resource allocation (hash table, timer, or memory) fails
  4. The error-handling path triggers a null pointer dereference, causing SEGV
  5. Process crashes, systemd auto-restarts, hits the same bug 4 times, then start-limit-hit prevents further restart

The crash happened at 09:21:59, approximately 2 seconds after the last WAL commit at 09:21:57. A manual restart at 14:15 succeeded, suggesting a timing/resource-contention issue rather than persistent data corruption.

Expected Behavior

qWorkerInit should handle resource allocation failures gracefully — return an error code without crashing. The vnode should either retry initialization or report the error to the cluster, not segfault.

Crash Analysis

System journal (journalctl -u taosd) shows 4 identical crashes:

May 05 09:21:59 s3 taosd[24999]: taosd: qworker.c:1481: qWorkerInit: Assertion `(0) >= (0)' failed.
May 05 09:21:59 s3 kernel: taosd[24999]: segfault at 0 ip 0000564f7c98eff5 sp 00007f3c68f96940 error 4 in taosd[564f7c300000+1ac0000]
May 05 09:21:59 s3 kernel: Code: 00 00 00 00 00 00 00 00 00 e8 f6 0a da 00 <c7> 45 cc 0f 07 00 80 83 7d cc 00 74 0f b8 00 00 00 00 e8 aa 52 d9 00 8b 55 cc 89 10

addr2line resolution of the crash IP 0x68eff5:

$ addr2line -e /usr/local/taos/bin/taosd -f 0x68eff5
qWorkerInit
/path/to/source/libs/qworker/src/qworker.c:1481

All 4 crashes resolve to the exact same offset 0x68eff5 within the taosd binary.

Apport captured 4 core dumps (~800MB each, ~3.2GB total) at /var/lib/apport/coredump/.

Root Cause Analysis (source code level)

Two bugs in qWorkerInit (libs/qworker/src/qworker.c) contribute to the crash:

Bug 1: Missing terrno initialization on allocation failure

When taosHashInit() or taosTmrInit() returns NULL due to an internal memory allocation failure, terrno is not set on all code paths inside those functions. The subsequent QW_ERR_JRET(terrno) then reads a stale or uninitialized error value.

More critically, QW_RET(terrno) expands to:

#define QW_RET(c)                     \
  do {                                \
    int32_t _code = (c);              \
    if (_code != TSDB_CODE_SUCCESS) { \
      terrno = _code;                 \
    }                                 \
    return _code;                     \
  } while (0)

where terrno is (*taosGetErrno()) and taosGetErrno() returns &tsErrno (a __thread TLS variable). If the vnode-query thread's TLS is not properly initialized at this early stage, writing to this address causes the SEGV.

Bug 2: Use-after-free / NULL dereference in the schHash failure path (qworker.c lines 1510-1511)

if (NULL == mgmt->schHash) {
    taosMemoryFreeClear(mgmt);                              // frees mgmt, sets to NULL
    qError("init %d scheduler hash failed", mgmt->cfg.maxSchedulerNum);  // dereferences NULL!
    QW_ERR_JRET(terrno);
}

taosMemoryFreeClear(mgmt) frees the allocation and sets mgmt to NULL; qError then reads mgmt->cfg.maxSchedulerNum, a NULL pointer dereference through a just-freed pointer.

Environment (please complete the following information):

  • OS: Ubuntu 24.04 LTS (Linux 6.8.0-107-generic, x86_64)
  • Memory: 7.8 GB RAM, 4 vCPU
  • Disk: 142 GB (46% used)
  • TDengine Version: 3.4.0.9.community (git: ed90f14, build: 2026-03-10)
  • Cluster: 3 nodes, s3 crashed while s1/s2 remained healthy

Additional Context

  • Searched existing TDengine GitHub issues: no prior report matches this specific qWorkerInit crash pattern
  • Verified source code: the bug is present and identical in all versions from 3.4.0.9 through 3.4.1.7 (latest), including main branch
  • The crash does not appear to be caused by: OOM, disk I/O errors, port conflicts, file permissions, or WAL corruption
  • 4 apport core dumps are available if the maintainers need them for further analysis
