docs: add mount-point ordering rules, FAQ entries, and repair tool #35281
guanshengliang merged 4 commits into 3.3.6
- docs/*/08-operation/12-multi.md: add Note 3 explaining that new dataDir entries must be appended at the end of their tier's list; inserting in the middle shifts did_id assignments and causes startup failures. Includes correct/incorrect config examples.
- docs/*/27-train-faq: add Q39 (how to correctly add a new mount point) and Q40 (how to recover when a mount point was added out of order using fix-tdengine-disks.sh). Both CN and EN updated.
- tools/fix-disk-mounts/fix-tdengine-disks.sh: pure-bash tool (no Python) that repairs current.json (did.id) and vnodes.json (diskPrimary) by scanning real file locations across all configured disks. Supports --apply, --vnode, and --cfg flags; backs up every file before writing.
- tools/fix-disk-mounts/README.md: usage guide covering when to use the tool, the underlying did_id mechanism, step-by-step recovery workflow, and relevant TDengine source file references.

Co-authored-by: Cursor <cursoragent@cursor.com>
Code Review
This pull request introduces documentation and a recovery tool for TDengine multi-tier storage. It updates English and Chinese documentation to warn against inserting dataDir entries in the middle of a list, which causes disk ID shifts and potential data loss. Additionally, it provides a new Bash utility, fix-tdengine-disks.sh, to repair metadata files when such configuration errors occur. Review feedback focused on improving the Bash script's robustness and performance, specifically regarding handling paths with spaces, using defined variables for consistency, and optimizing JSON processing to avoid redundant operations and excessive process spawning.
| while IFS= read -r raw; do | ||
| raw="${raw%%#*}" # strip inline comments | ||
| local p lv pr | ||
| read -r _ p lv pr _ <<< "$raw" || true |
The current read logic splits the dataDir line by whitespace, which will fail if any path contains spaces (e.g., dataDir "/mnt/space path" 0 0). Using a regular expression to capture the path (including optional quotes) is more robust.
| read -r _ p lv pr _ <<< "$raw" || true | |
| if [[ $raw =~ ^[[:space:]]*dataDir[[:space:]]+("[^"]+"|[^[:space:]]+)([[:space:]]+([0-9]+))?([[:space:]]+([0-9]+))? ]]; then | |
| local p="${BASH_REMATCH[1]//\"/}" | |
| local lv="${BASH_REMATCH[3]:-0}" | |
| local pr="${BASH_REMATCH[5]:-0}" |
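The suggestion above is only a fragment. A self-contained sketch of the same idea follows, with one deliberate difference: the pattern is stored in a variable, because literal double quotes written directly inside `[[ ... =~ ... ]]` are interpreted by the shell rather than by the regex engine. The helper name `parse_datadir_line` and the tab-separated output are assumptions for illustration, not part of the script under review.

```bash
#!/usr/bin/env bash
# Sketch: parse "dataDir <path> [level] [primary]" lines, tolerating quoted
# paths that contain spaces. parse_datadir_line is a hypothetical helper.

# Keeping the pattern in a variable avoids shell-quoting problems in [[ =~ ]].
DATADIR_RE='^[[:space:]]*dataDir[[:space:]]+("[^"]+"|[^[:space:]]+)([[:space:]]+([0-9]+))?([[:space:]]+([0-9]+))?'

parse_datadir_line() {
  local raw="${1%%#*}"                 # strip inline comments
  [[ $raw =~ $DATADIR_RE ]] || return 1
  local p="${BASH_REMATCH[1]//\"/}"    # drop the surrounding quotes, if any
  local lv="${BASH_REMATCH[3]:-0}"     # level defaults to 0
  local pr="${BASH_REMATCH[5]:-0}"     # primary flag defaults to 0
  printf '%s\t%s\t%s\n' "$p" "$lv" "$pr"
}

parse_datadir_line 'dataDir "/mnt/space path" 1 0'
parse_datadir_line 'dataDir /data01 0 1  # primary disk'
```

Lines that do not match make the function return a non-zero status, so a caller can skip them with `|| continue`.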
| # ── Extract vgId from a current.json path ───────────────────────────────────── | ||
| # e.g. /data01/vnode/vnode3/tsdb/current.json → 3 | ||
| _vg_from_path() { | ||
| sed -n 's|.*/vnode\([0-9][0-9]*\)/tsdb/current\.json|\1|p' <<< "$1" |
The regular expression for extracting the vgId hardcodes the string vnode. It should use the VNODE_DIR variable defined at the top of the script to ensure consistency if the directory name is ever changed.
| sed -n 's|.*/vnode\([0-9][0-9]*\)/tsdb/current\.json|\1|p' <<< "$1" | |
| sed -n "s|.*/${VNODE_DIR}\([0-9][0-9]*\)/tsdb/current\.json|\1|p" <<< "$1" |
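For reference, the same extraction can be done without spawning `sed`. This is a sketch, assuming `VNODE_DIR` is set to `vnode` as the example path in the comment implies; `vg_from_path` is a hypothetical name used only for illustration.

```bash
#!/usr/bin/env bash
VNODE_DIR="vnode"   # assumption: matches the /data01/vnode/vnode3/... layout

vg_from_path() {
  # Build the pattern from VNODE_DIR so the directory name lives in one place.
  local re="/${VNODE_DIR}"'([0-9]+)/tsdb/current\.json$'
  if [[ $1 =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  fi
}

vg_from_path /data01/vnode/vnode3/tsdb/current.json   # prints 3
```

Like the `sed -n ... p` form, this prints nothing and still returns success when the path does not match.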
| done < <(find "${DISK_PATHS[$i]}/$VNODE_DIR" -maxdepth 3 \ | ||
| -name "current.json" -path "*/tsdb/*" 2>/dev/null || true) |
This find command is redundant because all current.json paths are already collected into the _ALL_CJ array in the Main section. You can iterate over _ALL_CJ and check if the path starts with the current DISK_PATHS[$i] to avoid repeated disk scanning.
| done < <(find "${DISK_PATHS[$i]}/$VNODE_DIR" -maxdepth 3 \ | |
| -name "current.json" -path "*/tsdb/*" 2>/dev/null || true) | |
| for cj in "${_ALL_CJ[@]}"; do | |
| [[ "$cj" == "${DISK_PATHS[$i]}"* ]] || continue | |
| local vg; vg=$(_vg_from_path "$cj") | |
| [[ -n "$vg" ]] && _v2did["$vg"]="${DISK_IDS[$i]}" || true | |
| done |
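One caveat with a plain prefix test is that `/data1` is also a prefix of `/data10/...`. The following is a minimal sketch of a stricter match that appends a trailing slash to each disk path before comparing. The sample mount points and IDs are made up, and the inline vgId extraction merely stands in for `_vg_from_path`.

```bash
#!/usr/bin/env bash
# Requires bash 4+ for associative arrays.
declare -A v2did
DISK_PATHS=(/data10 /data1)      # made-up mount points
DISK_IDS=(0 1)
ALL_CJ=(
  /data1/vnode/vnode2/tsdb/current.json
  /data10/vnode/vnode3/tsdb/current.json
)
vg_re='/vnode([0-9]+)/tsdb/current\.json$'

for i in "${!DISK_PATHS[@]}"; do
  prefix="${DISK_PATHS[$i]%/}/"          # "/data1" becomes "/data1/"
  for cj in "${ALL_CJ[@]}"; do
    [[ "$cj" == "$prefix"* ]] || continue
    if [[ $cj =~ $vg_re ]]; then
      v2did["${BASH_REMATCH[1]}"]="${DISK_IDS[$i]}"
    fi
  done
done

for vg in "${!v2did[@]}"; do
  printf 'vnode %s -> disk id %s\n' "$vg" "${v2did[$vg]}"
done
```

Without the trailing slash, the `/data1` iteration would also claim the `current.json` under `/data10` and overwrite its disk ID.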
| changed_json=$(printf '%s' "$changed_json" | \ | ||
| jq --argjson i "$idx" --argjson dp "$new_dp" \ | ||
| '.vnodes[$i].diskPrimary = $dp') |
Calling jq inside a loop to update vnodes.json for every vnode is highly inefficient, as it spawns a new process and re-parses the entire JSON string on every iteration. It is much faster to build a single JSON map of all changes and apply them in one jq call.
| changed_json=$(printf '%s' "$changed_json" | \ | |
| jq --argjson i "$idx" --argjson dp "$new_dp" \ | |
| '.vnodes[$i].diskPrimary = $dp') | |
| local vmap_json | |
| vmap_json=$(jq -n '$ARGS.positional | reduce .[] as $item ({}; .[$item[0]] = ($item[1]|tonumber))' --args "${vmap_args[@]}") | |
| changed_json=$(printf '%s' "$changed_json" | jq --argjson map "$vmap_json" '.vnodes |= map(if .vgId | tostring | . as $id | $map | has($id) then .diskPrimary = $map[$id] else . end)') | |
| changed=true |
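A sketch of the single-pass update the comment describes. It assumes `vnodes.json` has the shape implied by the snippets above (a `vnodes` array whose elements carry `vgId` and `diskPrimary`) and that `jq` 1.6 or later is installed; `update_disk_primary` is a hypothetical helper. The flat argument list is paired up explicitly, since `$ARGS.positional` is an array of plain strings.

```bash
#!/usr/bin/env bash
# Sketch: apply every diskPrimary change to vnodes.json content in one jq call.
# Reads JSON on stdin; arguments are "vgId diskPrimary" pairs, e.g. 3 1 5 0.
update_disk_primary() {
  jq '
    # Turn [vgId, dp, vgId, dp, ...] into an object {"<vgId>": dp, ...}.
    ($ARGS.positional
      | [range(0; length; 2) as $i
          | {key: .[$i], value: (.[$i + 1] | tonumber)}]
      | from_entries) as $map
    # Rewrite only the vnodes whose vgId appears in the map.
    | .vnodes |= map(
        (.vgId | tostring) as $id
        | if $map | has($id) then .diskPrimary = $map[$id] else . end)
  ' --args "$@"
}
```

For example, `update_disk_primary 3 1 5 0 < vnodes.json` would set `diskPrimary` to 1 for vgId 3 and to 0 for vgId 5 with a single `jq` process.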
Pull request overview
This PR documents a critical operational constraint in TDengine multi-tier storage (mount-point ordering affects did_id persistence) and adds a recovery tool to repair current.json / vnodes.json after mount ordering mistakes.
Changes:
- Documented “append-only” rules for adding `dataDir` entries within a tier to avoid `did_id` remapping and startup failures.
- Added CN/EN FAQ entries describing the correct procedure and a recovery workflow.
- Introduced a Bash + `jq` repair utility plus a README explaining the underlying mechanism and usage.
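The append-only rule can be checked mechanically before a new configuration is deployed. The following is a rough sketch, not part of the PR: `tier_paths` and `is_append_only` are hypothetical helpers, tiers 0 to 2 are assumed, and paths containing spaces are not handled.

```bash
#!/usr/bin/env bash
# Sketch: verify that a new taos.cfg only appends dataDir entries within each tier.

# Print the dataDir paths of one tier, in file order.
tier_paths() {   # usage: tier_paths <cfg> <level>
  awk -v lv="$2" '
    { sub(/#.*/, "") }                       # drop comments
    tolower($1) == "datadir" {
      level = ($3 == "" ? 0 : $3)            # a missing level is treated as tier 0
      if (level == lv) print $2
    }' "$1"
}

# Succeed only if, for every tier, the old path list is a prefix of the new one.
is_append_only() {   # usage: is_append_only <old_cfg> <new_cfg>
  local lv old new
  for lv in 0 1 2; do
    old=$(tier_paths "$1" "$lv")
    new=$(tier_paths "$2" "$lv")
    [[ -z "$old" ]] && continue
    [[ "$new" == "$old" || "$new" == "$old"$'\n'* ]] || return 1
  done
  return 0
}
```

A call such as `is_append_only taos.cfg.bak taos.cfg || echo "dataDir order changed"` would flag an entry inserted in the middle of a tier before taosd is restarted.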
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tools/fix-disk-mounts/fix-tdengine-disks.sh | New repair script to rebuild disk→ID mapping and patch current.json / vnodes.json. |
| tools/fix-disk-mounts/README.md | Usage + conceptual documentation for the repair tool and the did_id mechanism. |
| docs/zh/08-operation/12-multi.md | Adds a new note explaining why dataDir lines must be appended within a tier. |
| docs/en/08-operation/12-multi.md | English equivalent of the new append-only note and explanation. |
| docs/zh/27-train-faq/01-faq.md | Adds Q39/Q40 covering correct mount addition and recovery steps. |
| docs/en/27-train-faq/index.md | English equivalent Q39/Q40 entries. |
| read -r _ p lv pr _ <<< "$raw" || true | ||
| [[ -z "$p" ]] && continue | ||
| [[ "$lv" =~ ^[0-9]+$ ]] || lv=0 | ||
| [[ "$pr" =~ ^[0-9]+$ ]] || pr=0 |
| # Fallback: if no explicit primary flag, first disk is primary | ||
| if [[ -z "$PRIMARY_PATH" && ${#DISK_PATHS[@]} -gt 0 ]]; then | ||
| PRIMARY_PATH="${DISK_PATHS[0]}" | ||
| DISK_IS_PRIMARY[0]=1 | ||
| fi |
| --vnode=*) VNODE_FILTER="${1#--vnode=}" ;; | ||
| --cfg) shift; TAOS_CFG="$1" ;; | ||
| --cfg=*) TAOS_CFG="${1#--cfg=}" ;; | ||
| -h|--help) grep '^#' "$0" | head -16 | sed 's/^# \?//'; exit 0 ;; |
> **Critical warning:** Do **not** start taosd with a mismatched configuration before repairing the metadata. TDengine's startup scan (`tsdbFSDoSanAndFix`) actively **deletes** any on-disk file that is not referenced in `current.json`. Starting taosd in this state may result in **permanent data loss**.

Use the `fix-tdengine-disks.sh` tool (located in `community/tools/fix-disk-mounts/`) to repair `current.json` and `vnodes.json` without moving any data files.

> **Important warning:** Do **not start taosd** before the metadata has been repaired. TDengine's startup scan (`tsdbFSDoSanAndFix`) actively **deletes** every file on disk that is not referenced by `current.json`. Starting taosd with an incorrect configuration may lead to **permanent data loss**.

Use the `fix-tdengine-disks.sh` tool (located in `community/tools/fix-disk-mounts/`) to repair the metadata in `current.json` and `vnodes.json` **without moving any data files**.
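The order of operations implied by the warning can be sketched as a small wrapper. Everything environment-specific here is an assumption: the tool path, the `/etc/taos/taos.cfg` location, the `taosd` systemd unit name, and the idea that running the tool without `--apply` only previews changes. By default the sketch prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch of the recovery order; adjust the assumed paths to the installation.
TOOL="${TOOL:-./fix-tdengine-disks.sh}"
CFG="${CFG:-/etc/taos/taos.cfg}"

# Print each command by default; set EXECUTE=1 to actually run them.
run() {
  if [[ "${EXECUTE:-0}" == 1 ]]; then
    "$@"
  else
    printf 'would run: %s\n' "$*"
  fi
}

recover() {
  run systemctl stop taosd || return 1           # keep taosd down until metadata is fixed
  run "$TOOL" --cfg "$CFG" || return 1           # assumed to be a preview without --apply
  run "$TOOL" --cfg "$CFG" --apply || return 1   # write the repaired metadata
  run systemctl start taosd
}

recover
```

Stopping at the first failing step matters here, because starting taosd after a failed repair is exactly the situation the warning describes.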
…phasis
- Replace standalone **Note** with **Note:** so MD036's punctuation ignore rule applies (emphasis ending with : is not treated as heading)
- Move trailing : and : outside ** markers in all bold labels across 12-multi.md (both EN/ZH) and the two FAQ files to satisfy the space-after-punctuation-in-emphasis lint rule

Co-authored-by: Cursor <cursoragent@cursor.com>
…tion errors
- Move colon outside bold markers: **Note:** → **Note**: and **注意:** → **注意**: to avoid space-after-punctuation-in-emphasis lint rule
- Move colon outside bold in README: **Typical scenarios...:** → **Typical scenarios**:
- Add `text` language specifier to bare fenced code blocks in:
  - tools/fix-disk-mounts/README.md (lines 25, 135, 147)
  - docs/en/27-train-faq/index.md (line 386)
  - docs/zh/27-train-faq/01-faq.md (line 399)

Co-authored-by: Cursor <cursoragent@cursor.com>
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.
| while IFS=$'\t' read -r fid cid sfx olv oid; do | ||
| [[ -z "$fid" ]] && continue | ||
| local fn="v${vg}f${fid}ver${cid}.${sfx}" | ||
| if _find_file "$vg" "$fn"; then | ||
| if [[ "$_FL" == "$olv" && "$_FI" == "$oid" ]]; then | ||
| printf ' UNCHANGED %s\n' "$fn" |
| while IFS= read -r raw; do | ||
| raw="${raw%%#*}" # strip inline comments | ||
| local p lv pr | ||
| read -r _ p lv pr _ <<< "$raw" || true | ||
| [[ -z "$p" ]] && continue | ||
| [[ "$lv" =~ ^[0-9]+$ ]] || lv=0 | ||
| [[ "$pr" =~ ^[0-9]+$ ]] || pr=0 | ||
|
|
| DISK_PATHS+=("$p") | ||
| DISK_LEVELS+=("$lv") | ||
| DISK_IDS+=("$did") | ||
| DISK_IS_PRIMARY+=("$pr") | ||
| (( pr == 1 )) && PRIMARY_PATH="$p" || true | ||
| done < <(grep -iE '^\s*dataDir\s' "$TAOS_CFG") |
| --vnode) shift; VNODE_FILTER="$1" ;; | ||
| --vnode=*) VNODE_FILTER="${1#--vnode=}" ;; | ||
| --cfg) shift; TAOS_CFG="$1" ;; |
entries), each described by `fid`, `cid`, and file suffix.
2. Reconstructs the expected filename (`v{vgId}f{fid}ver{cid}.{suffix}`) and searches all configured disks for its physical location.
3. Updates `did.level` and `did.id` in the JSON to match the disk where the file actually lives.
4. Removes entries for files that cannot be found on any configured disk.
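Steps 2 and 3 can be sketched as follows. This is an illustration rather than the script's actual code: `tsdb_filename` and `locate_tsdb_file` are hypothetical names, and the `<disk>/vnode/vnode<vgId>/tsdb/` layout is inferred from the `current.json` path shown earlier.

```bash
#!/usr/bin/env bash
# Sketch: rebuild a tsdb filename and find which configured disk holds it.
DISK_PATHS=()    # mount points, in taos.cfg order
DISK_LEVELS=()   # tier of each mount point
DISK_IDS=()      # did.id of each mount point within its tier

tsdb_filename() {   # usage: tsdb_filename <vgId> <fid> <cid> <suffix>
  printf 'v%sf%sver%s.%s\n' "$1" "$2" "$3" "$4"
}

# On success, sets FOUND_LEVEL and FOUND_ID to the disk that holds the file.
locate_tsdb_file() {   # usage: locate_tsdb_file <vgId> <filename>
  local i
  for i in "${!DISK_PATHS[@]}"; do
    if [[ -f "${DISK_PATHS[$i]}/vnode/vnode$1/tsdb/$2" ]]; then
      FOUND_LEVEL="${DISK_LEVELS[$i]}"
      FOUND_ID="${DISK_IDS[$i]}"
      return 0
    fi
  done
  return 1   # not found on any configured disk
}
```

A caller would compare `FOUND_LEVEL` and `FOUND_ID` with the `did.level` and `did.id` recorded in `current.json`, and treat a non-zero return as the "file not found" case from step 4.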
Co-authored-by: Cursor <cursoragent@cursor.com>
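All storage media involved in multi-tier storage are local storage devices. In addition to local storage devices, TDengine Enterprise also supports object storage (S3), which keeps the coldest data on the cheapest medium to further reduce storage costs while still allowing queries when necessary; where the data is stored is also transparent to SQL. Object storage support was first released in version 3.3.0.0, and using the latest version is recommended.

## Multi-tier storage

The PR adds the fix-tdengine-disks.sh script, but the "Multi-tier storage" section does not mention it. Also, should the FAQ include a description of what to do after a mount point has been changed?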