docs: add mount-point ordering rules, FAQ entries, and repair tool #35281

Merged: guanshengliang merged 4 commits into 3.3.6 from feat/336/add-disk-fix-tools on May 7, 2026

@xiao-77 commented May 6, 2026

  • docs/*/08-operation/12-multi.md: add Note 3 explaining that new
    dataDir entries must be appended at the end of their tier's list;
    inserting in the middle shifts did_id assignments and causes startup
    failures. Includes correct/incorrect config examples.

  • docs/*/27-train-faq: add Q39 (how to correctly add a new mount point)
    and Q40 (how to recover when a mount point was added out of order
    using fix-tdengine-disks.sh). Both CN and EN updated.

  • tools/fix-disk-mounts/fix-tdengine-disks.sh: pure-bash tool (no
    Python) that repairs current.json (did.id) and vnodes.json
    (diskPrimary) by scanning real file locations across all configured
    disks. Supports --apply, --vnode, and --cfg flags; backs up every
    file before writing.

  • tools/fix-disk-mounts/README.md: usage guide covering when to use
    the tool, the underlying did_id mechanism, step-by-step recovery
    workflow, and relevant TDengine source file references.

Co-authored-by: Cursor <cursoragent@cursor.com>
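
The description says the tool backs up every file before writing and only writes when `--apply` is given. A minimal sketch of that pattern, not the script's actual code: the helper name `write_with_backup`, the `APPLY` variable, and the `.bak.<timestamp>` suffix are all assumptions made for illustration.

```bash
# Hypothetical helper: update a metadata file only when APPLY=1, keeping a
# timestamped copy of the original next to it first.
APPLY=0

write_with_backup() {   # args: target-file new-content
  local target="$1" content="$2"
  if (( APPLY != 1 )); then
    echo "DRY-RUN would update $target"
    return 0
  fi
  local backup tmp
  backup="${target}.bak.$(date +%Y%m%d%H%M%S)"
  cp -p -- "$target" "$backup" || return 1
  # Write to a temp file in the same directory, then rename it over the target
  # so a crash never leaves a half-written file behind.
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  printf '%s\n' "$content" > "$tmp" && mv -- "$tmp" "$target"
}
```

With `APPLY=0` the function only reports what it would do, which matches the dry-run-by-default behavior implied by the `--apply` flag.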

Description

Issue(s)

  • Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

Copilot AI review requested due to automatic review settings May 6, 2026 02:16
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces documentation and a recovery tool for TDengine multi-tier storage. It updates English and Chinese documentation to warn against inserting dataDir entries in the middle of a list, which causes disk ID shifts and potential data loss. Additionally, it provides a new Bash utility, fix-tdengine-disks.sh, to repair metadata files when such configuration errors occur. Review feedback focused on improving the Bash script's robustness and performance, specifically regarding handling paths with spaces, using defined variables for consistency, and optimizing JSON processing to avoid redundant operations and excessive process spawning.

while IFS= read -r raw; do
raw="${raw%%#*}" # strip inline comments
local p lv pr
read -r _ p lv pr _ <<< "$raw" || true

high

The current read logic splits the dataDir line by whitespace, which will fail if any path contains spaces (e.g., dataDir "/mnt/space path" 0 0). Using a regular expression to capture the path (including optional quotes) is more robust.

Suggested change
- read -r _ p lv pr _ <<< "$raw" || true
+ if [[ $raw =~ ^[[:space:]]*dataDir[[:space:]]+("[^"]+"|[^[:space:]]+)([[:space:]]+([0-9]+))?([[:space:]]+([0-9]+))? ]]; then
+   local p="${BASH_REMATCH[1]//\"/}"
+   local lv="${BASH_REMATCH[3]:-0}"
+   local pr="${BASH_REMATCH[5]:-0}"
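
The suggestion above relies on a regex that contains literal double quotes. A minimal sketch of the same idea follows, assuming bash and a hypothetical helper name `parse_datadir_line`. The pattern is kept in a variable so that the quotes are read as regex characters rather than as shell quoting, which is the usual way to write such patterns with `=~`.

```bash
# Hypothetical helper: parse one "dataDir <path> [level] [primary]" line.
# Prints "path<TAB>level<TAB>primary"; returns 1 for any other line.
parse_datadir_line() {
  local raw="$1"
  # Group 1: quoted or unquoted path. Group 3: level. Group 5: primary flag.
  local re='^[[:space:]]*dataDir[[:space:]]+("[^"]+"|[^[:space:]]+)([[:space:]]+([0-9]+))?([[:space:]]+([0-9]+))?'
  [[ $raw =~ $re ]] || return 1
  local p="${BASH_REMATCH[1]//\"/}"   # drop the surrounding quotes, if any
  local lv="${BASH_REMATCH[3]:-0}"    # missing level defaults to 0
  local pr="${BASH_REMATCH[5]:-0}"    # missing primary flag defaults to 0
  printf '%s\t%s\t%s\n' "$p" "$lv" "$pr"
}

parse_datadir_line 'dataDir "/mnt/space path" 0 1'
```

The sketch does not handle paths that themselves contain a double quote; whether taos.cfg allows those is not stated in this thread.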

# ── Extract vgId from a current.json path ─────────────────────────────────────
# e.g. /data01/vnode/vnode3/tsdb/current.json → 3
_vg_from_path() {
sed -n 's|.*/vnode\([0-9][0-9]*\)/tsdb/current\.json|\1|p' <<< "$1"

medium

The regular expression for extracting the vgId hardcodes the string vnode. It should use the VNODE_DIR variable defined at the top of the script to ensure consistency if the directory name is ever changed.

Suggested change
- sed -n 's|.*/vnode\([0-9][0-9]*\)/tsdb/current\.json|\1|p' <<< "$1"
+ sed -n "s|.*/${VNODE_DIR}\([0-9][0-9]*\)/tsdb/current\.json|\1|p" <<< "$1"
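
For comparison, the same extraction can be done without spawning `sed`. This is only a sketch: the function name `vg_from_path` is hypothetical, and `VNODE_DIR="vnode"` is assumed to match the variable the comment refers to.

```bash
VNODE_DIR="vnode"   # assumed value of the script's directory-name variable

# Extract the vgId from a current.json path, e.g.
#   /data01/vnode/vnode3/tsdb/current.json -> 3
# Returns 1 and prints nothing when the path does not match.
vg_from_path() {
  local re="/${VNODE_DIR}([0-9]+)/tsdb/current\.json$"
  [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

vg_from_path /data01/vnode/vnode3/tsdb/current.json
```

If `VNODE_DIR` could ever contain regex metacharacters, it would need escaping before being spliced into the pattern; the same caveat applies to the `sed` form.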

Comment on lines +309 to +310
done < <(find "${DISK_PATHS[$i]}/$VNODE_DIR" -maxdepth 3 \
-name "current.json" -path "*/tsdb/*" 2>/dev/null || true)

medium

This find command is redundant because all current.json paths are already collected into the _ALL_CJ array in the Main section. You can iterate over _ALL_CJ and check if the path starts with the current DISK_PATHS[$i] to avoid repeated disk scanning.

Suggested change
- done < <(find "${DISK_PATHS[$i]}/$VNODE_DIR" -maxdepth 3 \
-   -name "current.json" -path "*/tsdb/*" 2>/dev/null || true)
+ for cj in "${_ALL_CJ[@]}"; do
+   [[ "$cj" == "${DISK_PATHS[$i]}"* ]] || continue
+   local vg; vg=$(_vg_from_path "$cj")
+   [[ -n "$vg" ]] && _v2did["$vg"]="${DISK_IDS[$i]}" || true
+ done
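
One detail worth checking in a prefix test like the one suggested: a bare string prefix also matches sibling mount points, so `/data1` would match `/data10/...`. A small sketch of a stricter check follows; the helper name `path_is_under` is hypothetical.

```bash
# Hypothetical helper: succeed only if path $1 lies inside directory $2.
path_is_under() {
  local path="$1" root="${2%/}"   # drop one trailing slash from the root
  # The quoted part is matched literally; the unquoted /* requires a path
  # separator right after the root, so /data1 does not match /data10/...
  [[ "$path" == "$root"/* ]]
}

path_is_under /data1/vnode/vnode3/tsdb/current.json /data1 && echo "on /data1"
```

In the suggested loop, this would replace the `[[ "$cj" == "${DISK_PATHS[$i]}"* ]]` test.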

Comment on lines +336 to +338
changed_json=$(printf '%s' "$changed_json" | \
jq --argjson i "$idx" --argjson dp "$new_dp" \
'.vnodes[$i].diskPrimary = $dp')

medium

Calling jq inside a loop to update vnodes.json for every vnode is highly inefficient, as it spawns a new process and re-parses the entire JSON string on every iteration. It is much faster to build a single JSON map of all changes and apply them in one jq call.

Suggested change
- changed_json=$(printf '%s' "$changed_json" | \
-   jq --argjson i "$idx" --argjson dp "$new_dp" \
-   '.vnodes[$i].diskPrimary = $dp')
+ local vmap_json
+ vmap_json=$(jq -n '$ARGS.positional | reduce .[] as $item ({}; .[$item[0]] = ($item[1]|tonumber))' --args "${vmap_args[@]}")
+ changed_json=$(printf '%s' "$changed_json" | jq --argjson map "$vmap_json" '.vnodes |= map(if .vgId | tostring | . as $id | $map | has($id) then .diskPrimary = $map[$id] else . end)')
+ changed=true
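
A minimal sketch of the batching idea, assuming jq 1.5 or later and a `vnodes.json` shaped like `{"vnodes":[{"vgId":…,"diskPrimary":…}]}` as the snippets above imply. The helper name `apply_disk_primary` and the tab-separated input format are assumptions chosen for illustration; the map is built from lines on stdin rather than from `--args`, because positional `--args` values arrive in jq as plain strings.

```bash
# Hypothetical helper: read "vgId<TAB>diskPrimary" lines on stdin and apply
# them to the vnodes.json document passed as $1, using two jq calls in total.
apply_disk_primary() {
  local doc="$1" map
  # Build {"<vgId>": <diskPrimary>, ...}; an empty input yields {}.
  map=$(jq -Rn '[inputs | split("\t") | {(.[0]): (.[1] | tonumber)}] | add // {}') || return 1
  jq -c --argjson map "$map" '
    .vnodes |= map(
      (.vgId | tostring) as $id
      | if $map | has($id) then .diskPrimary = $map[$id] else . end
    )' <<< "$doc"
}

printf '3\t1\n' | apply_disk_primary '{"vnodes":[{"vgId":3,"diskPrimary":0}]}'
```

Vnodes that are absent from the map pass through unchanged, so the function can be fed only the entries that actually need repair.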

Copilot AI left a comment

Pull request overview

This PR documents a critical operational constraint in TDengine multi-tier storage (mount-point ordering affects did_id persistence) and adds a recovery tool to repair current.json / vnodes.json after mount ordering mistakes.

Changes:

  • Documented “append-only” rules for adding dataDir entries within a tier to avoid did_id remapping and startup failures.
  • Added CN/EN FAQ entries describing the correct procedure and a recovery workflow.
  • Introduced a Bash + jq repair utility plus a README explaining the underlying mechanism and usage.
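
The append-only rule in the first bullet can be sketched as a pre-flight check on a single tier's `dataDir` list. This assumes, as the PR description suggests, that an entry's did_id follows from its position within its tier, so every existing path must keep its index. The function name `is_append_only` is hypothetical, and the sketch requires bash 4+ for `mapfile`.

```bash
# Hypothetical check: succeed if the new dataDir list for one tier keeps every
# old path at the same position, i.e. new entries were only appended.
# Usage: is_append_only "<old paths, one per line>" "<new paths, one per line>"
is_append_only() {
  local -a old new
  local i
  mapfile -t old <<< "$1"
  mapfile -t new <<< "$2"
  (( ${#new[@]} >= ${#old[@]} )) || return 1   # removing entries also shifts ids
  for i in "${!old[@]}"; do
    [[ "${old[$i]}" == "${new[$i]}" ]] || return 1
  done
  return 0
}

is_append_only $'/data1\n/data2' $'/data1\n/data2\n/data3' && echo "safe to apply"
```

Running such a check against the previous taos.cfg before restarting would flag an insertion in the middle of the list before any metadata is touched.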

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tools/fix-disk-mounts/fix-tdengine-disks.sh | New repair script to rebuild disk→ID mapping and patch current.json / vnodes.json. |
| tools/fix-disk-mounts/README.md | Usage + conceptual documentation for the repair tool and the did_id mechanism. |
| docs/zh/08-operation/12-multi.md | Adds a new note explaining why dataDir lines must be appended within a tier. |
| docs/en/08-operation/12-multi.md | English equivalent of the new append-only note and explanation. |
| docs/zh/27-train-faq/01-faq.md | Adds Q39/Q40 covering correct mount addition and recovery steps. |
| docs/en/27-train-faq/index.md | English equivalent Q39/Q40 entries. |

read -r _ p lv pr _ <<< "$raw" || true
[[ -z "$p" ]] && continue
[[ "$lv" =~ ^[0-9]+$ ]] || lv=0
[[ "$pr" =~ ^[0-9]+$ ]] || pr=0
Comment on lines +89 to +93
# Fallback: if no explicit primary flag, first disk is primary
if [[ -z "$PRIMARY_PATH" && ${#DISK_PATHS[@]} -gt 0 ]]; then
PRIMARY_PATH="${DISK_PATHS[0]}"
DISK_IS_PRIMARY[0]=1
fi
--vnode=*) VNODE_FILTER="${1#--vnode=}" ;;
--cfg) shift; TAOS_CFG="$1" ;;
--cfg=*) TAOS_CFG="${1#--cfg=}" ;;
-h|--help) grep '^#' "$0" | head -16 | sed 's/^# \?//'; exit 0 ;;
Comment thread docs/en/27-train-faq/index.md Outdated

> **Critical warning:** Do **not** start taosd with a mismatched configuration before repairing the metadata. TDengine's startup scan (`tsdbFSDoSanAndFix`) actively **deletes** any on-disk file that is not referenced in `current.json`. Starting taosd in this state may result in **permanent data loss**.

Use the `fix-tdengine-disks.sh` tool (located in `community/tools/fix-disk-mounts/`) to repair `current.json` and `vnodes.json` without moving any data files.
Comment thread docs/zh/27-train-faq/01-faq.md Outdated

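> **Important warning:** Do **not start taosd** before repairing the metadata. TDengine's startup scan (`tsdbFSDoSanAndFix`) actively **deletes** every on-disk file that is not referenced in `current.json`. Starting taosd with an incorrect configuration may cause **permanent data loss**.

Use the `fix-tdengine-disks.sh` tool (located in `community/tools/fix-disk-mounts/`) to repair the metadata in `current.json` and `vnodes.json` **without moving any data files**.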
xiao-77 and others added 2 commits May 6, 2026 10:23
…phasis

- Replace standalone **Note** with **Note:** so MD036's punctuation
  ignore rule applies (emphasis ending with : is not treated as heading)
- Move trailing : and : outside ** markers in all bold labels across
  12-multi.md (both EN/ZH) and the two FAQ files to satisfy the
  space-after-punctuation-in-emphasis lint rule

Co-authored-by: Cursor <cursoragent@cursor.com>
…tion errors

- Move colon outside bold markers: **Note:** → **Note**: and **注意:** → **注意**:
  to avoid space-after-punctuation-in-emphasis lint rule
- Move colon outside bold in README: **Typical scenarios...:** → **Typical scenarios**:
- Add `text` language specifier to bare fenced code blocks in:
  - tools/fix-disk-mounts/README.md (lines 25, 135, 147)
  - docs/en/27-train-faq/index.md (line 386)
  - docs/zh/27-train-faq/01-faq.md (line 399)

Co-authored-by: Cursor <cursoragent@cursor.com>
Copilot AI review requested due to automatic review settings May 6, 2026 02:47
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Comment on lines +195 to +200
while IFS=$'\t' read -r fid cid sfx olv oid; do
[[ -z "$fid" ]] && continue
local fn="v${vg}f${fid}ver${cid}.${sfx}"
if _find_file "$vg" "$fn"; then
if [[ "$_FL" == "$olv" && "$_FI" == "$oid" ]]; then
printf ' UNCHANGED %s\n' "$fn"
Comment on lines +56 to +63
while IFS= read -r raw; do
raw="${raw%%#*}" # strip inline comments
local p lv pr
read -r _ p lv pr _ <<< "$raw" || true
[[ -z "$p" ]] && continue
[[ "$lv" =~ ^[0-9]+$ ]] || lv=0
[[ "$pr" =~ ^[0-9]+$ ]] || pr=0

Comment on lines +82 to +87
DISK_PATHS+=("$p")
DISK_LEVELS+=("$lv")
DISK_IDS+=("$did")
DISK_IS_PRIMARY+=("$pr")
(( pr == 1 )) && PRIMARY_PATH="$p" || true
done < <(grep -iE '^\s*dataDir\s' "$TAOS_CFG")
Comment on lines +30 to +32
--vnode) shift; VNODE_FILTER="$1" ;;
--vnode=*) VNODE_FILTER="${1#--vnode=}" ;;
--cfg) shift; TAOS_CFG="$1" ;;
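
One hardening these option lines might benefit from is rejecting a flag whose value is missing, since `shift; VAR="$1"` reads an unset `$1` when `--vnode` or `--cfg` is the last argument. The sketch below is an illustration under stated assumptions, not the script's code: the `parse_args` wrapper, the `APPLY` variable, and the default config path are hypothetical.

```bash
# Hypothetical option loop that reports a missing value instead of reading an
# unset "$1" after shift.
VNODE_FILTER=""
TAOS_CFG="/etc/taos/taos.cfg"   # assumed default location
APPLY=0

parse_args() {
  while (( $# > 0 )); do
    case "$1" in
      --apply) APPLY=1 ;;
      --vnode|--cfg)
        (( $# >= 2 )) || { echo "error: $1 requires a value" >&2; return 2; }
        if [[ $1 == --vnode ]]; then VNODE_FILTER="$2"; else TAOS_CFG="$2"; fi
        shift ;;                # consume the value; the flag is shifted below
      --vnode=*) VNODE_FILTER="${1#--vnode=}" ;;
      --cfg=*)   TAOS_CFG="${1#--cfg=}" ;;
      *) echo "error: unknown option: $1" >&2; return 2 ;;
    esac
    shift
  done
}

parse_args --vnode 3 --cfg=/tmp/taos.cfg
```

Both the `--flag value` and `--flag=value` spellings from the original snippet are kept.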
Comment thread docs/en/27-train-faq/index.md Outdated

> **Critical warning**: Do **not** start taosd with a mismatched configuration before repairing the metadata. TDengine's startup scan (`tsdbFSDoSanAndFix`) actively **deletes** any on-disk file that is not referenced in `current.json`. Starting taosd in this state may result in **permanent data loss**.

Use the `fix-tdengine-disks.sh` tool (located in `community/tools/fix-disk-mounts/`) to repair `current.json` and `vnodes.json` without moving any data files.
Comment thread docs/zh/27-train-faq/01-faq.md Outdated

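> **Important warning**: Do **not start taosd** before repairing the metadata. TDengine's startup scan (`tsdbFSDoSanAndFix`) actively **deletes** every on-disk file that is not referenced in `current.json`. Starting taosd with an incorrect configuration may cause **permanent data loss**.

Use the `fix-tdengine-disks.sh` tool (located in `community/tools/fix-disk-mounts/`) to repair the metadata in `current.json` and `vnodes.json` **without moving any data files**.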
Comment on lines +66 to +71
entries), each described by `fid`, `cid`, and file suffix.
2. Reconstructs the expected filename (`v{vgId}f{fid}ver{cid}.{suffix}`)
and searches all configured disks for its physical location.
3. Updates `did.level` and `did.id` in the JSON to match the disk where
the file actually lives.
4. Removes entries for files that cannot be found on any configured disk.
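
Steps 2 and 3 above can be sketched as follows. The filename pattern comes from the README excerpt; the on-disk layout `<disk>/vnode/vnode<vgId>/tsdb/<name>` is an assumption inferred from the `current.json` paths quoted in this thread, and the function names and the `data` suffix in the example are hypothetical.

```bash
# Hypothetical sketch: rebuild a tsdb filename and find which configured disk
# actually holds it.
DISK_PATHS=()   # filled by the caller, in taos.cfg order

tsdb_file_name() {   # args: vgId fid cid suffix
  printf 'v%sf%sver%s.%s\n' "$1" "$2" "$3" "$4"
}

find_file_disk() {   # args: vgId filename; prints the index into DISK_PATHS
  local vg="$1" fn="$2" i
  for i in "${!DISK_PATHS[@]}"; do
    # Assumed layout: <disk>/vnode/vnode<vgId>/tsdb/<filename>
    if [[ -f "${DISK_PATHS[$i]}/vnode/vnode${vg}/tsdb/${fn}" ]]; then
      printf '%s\n' "$i"
      return 0
    fi
  done
  return 1   # not found on any disk (step 4 would drop the entry)
}

tsdb_file_name 3 1820 7 data
```

The returned index would then be translated into the `did.level` and `did.id` of that disk before `current.json` is rewritten.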
Co-authored-by: Cursor <cursoragent@cursor.com>

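All storage media involved in multi-tier storage are local storage devices. In addition to local storage devices, TDengine Enterprise also supports object storage (S3), which keeps the coldest data on the cheapest media to further reduce storage costs while still allowing queries when needed; where the data is stored is transparent to SQL. Object storage support was first released in version 3.3.0.0, and using the latest version is recommended.

## Multi-tier storage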

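This PR adds the fix-tdengine-disks.sh script, but the "Multi-tier storage" section does not mention it. Also, should the FAQ include a description of what to do after a mount point has been changed?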

@guanshengliang guanshengliang merged commit 8cf0ccb into 3.3.6 May 7, 2026
16 checks passed
3 participants