dmchen/try-copilot-self-audit by cadem · Pull Request #35244 · taosdata/TDengine

cadem · 2026-04-28T02:39:18Z

Description

Issue(s)

Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

Is the user manual updated?
Are the test cases passed and automated?
Is there no significant decrease in test coverage?

gemini-code-assist

Code Review

This pull request introduces the auditLocalWrite configuration option, allowing audit logs to be written directly to the local cluster. The changes include adding the global configuration variable, updating the SAudit structure to manage local connections, and implementing background pre-connection logic during initialization. Additionally, a new shell script for stopping dnodes and a Python system test for the local write feature were added. Review feedback suggests improving error handling in auditInit by nullifying pointers after destruction and ensuring mutexes are initialized before use in auditCleanup. There is also a recommendation to use SIGTERM instead of SIGKILL in the new shell script to allow for graceful process shutdowns.

gemini-code-assist · 2026-04-28T02:40:43Z

+  if (taosThreadMutexInit(&tsAudit.connLock, NULL) != 0) {
+    (void)taosThreadMutexDestroy(&tsAudit.recordLock);
+    (void)taosThreadRwlockDestroy(&tsAudit.infoLock);
+    taosArrayDestroyP(tsAudit.records, (FDelete)auditDeleteRecord);
+    return -1;


In the error path of auditInit, after destroying the records array, it is important to set the pointer to NULL. This prevents potential double-free or use-after-free issues if auditCleanup is called later (which is common during dnode shutdown even if initialization partially failed).

if (taosThreadMutexInit(&tsAudit.connLock, NULL) != 0) { (void)taosThreadMutexDestroy(&tsAudit.recordLock); (void)taosThreadRwlockDestroy(&tsAudit.infoLock); taosArrayDestroyP(tsAudit.records, (FDelete)auditDeleteRecord); tsAudit.records = NULL; return -1;

gemini-code-assist · 2026-04-28T02:40:43Z

+  auditStopPreconnectLocal();
+
+  // Close local connection
+  (void)taosThreadMutexLock(&tsAudit.connLock);


Potential undefined behavior: if auditInit fails before tsAudit.connLock is initialized (e.g., if taosArrayInit or taosThreadRwlockInit fails), tsAudit.connLock remains zero-initialized. Calling taosThreadMutexLock on an uninitialized mutex is unsafe. Consider adding a check to verify that the audit system was successfully initialized before attempting to lock and destroy its mutexes.

gemini-code-assist · 2026-04-28T02:40:43Z

+while [ -n "$PID" ]; do
+  echo kill -9 $PID
+  #pkill -9 taosd
+  kill -9 $PID


Using kill -9 (SIGKILL) as the primary method to stop processes is generally discouraged as it prevents graceful shutdown (e.g., flushing WAL, closing files). It is better to send a SIGTERM first, wait for a few seconds, and then use SIGKILL only if the process is still alive. This applies to lines 28, 42, and 56.

Copilot

Pull request overview

This PR introduces an auditLocalWrite configuration knob intended to support writing audit events directly to the local cluster, adds initial local-connection lifecycle hooks in the audit library, and adds supporting test/scripts.

Changes:

Add tsAuditLocalWrite global config + server config wiring (auditLocalWrite).
Extend SAudit with local-connection and preconnect thread state, and add init/cleanup hooks in auditMain.c.
Add a new system test (audit_local_write.py) and a new helper script (stop_dnodes.sh).

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
tests/system-test/0-others/audit_local_write.py	New system test intended to validate audit local-write behavior.
tests/script/sh/stop_dnodes.sh	New process cleanup script for taosd/taos/tmq_sim.
source/libs/audit/src/auditMain.c	Adds connLock init/cleanup and calls into preconnect/stop hooks.
source/libs/audit/inc/auditInt.h	Extends `SAudit` with local connection + preconnect thread fields.
source/common/src/tglobal.c	Adds `tsAuditLocalWrite` defaults + config registration + dynamic option mapping.
include/common/tglobal.h	Declares `tsAuditLocalWrite`.
TD-xxx-try-copilot-self-audit	Empty marker file added at repo root.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-28T02:46:51Z

+  // Close local connection
+  (void)taosThreadMutexLock(&tsAudit.connLock);
+  if (tsAudit.pLocalConn != NULL) {
+    taos_close(tsAudit.pLocalConn);
+    tsAudit.pLocalConn = NULL;
+  }
+  (void)taosThreadMutexUnlock(&tsAudit.connLock);
+  (void)taosThreadMutexDestroy(&tsAudit.connLock);


auditCleanup() unconditionally locks/destroys tsAudit.connLock. If auditInit() returns early before connLock is initialized (e.g., records/locks init failure), cleanup can still be invoked during env teardown and this becomes undefined behavior. Track whether connLock was successfully initialized (and whether pLocalConn was ever created) and only lock/destroy/close when initialized; alternatively initialize connLock in a way that makes cleanup always safe.

Copilot · 2026-04-28T02:46:52Z

+  // Start non-blocking preconnect in background so startup and RPC threads never wait on taos_connect.
+  if (auditPreconnectLocal() != 0) {


The new auditLocalWrite config is added, but auditInit() always starts the local preconnect logic regardless of whether local write is enabled. This can start an unnecessary background thread/work on deployments that have auditLocalWrite=0. Gate the auditPreconnectLocal() call behind the runtime config (e.g., tsAuditLocalWrite) so the feature is truly opt-in at runtime.

Suggested change

// Start non-blocking preconnect in background so startup and RPC threads never wait on taos_connect.

if (auditPreconnectLocal() != 0) {

// Start non-blocking preconnect in background only when local audit write is enabled,

// so startup and RPC threads never wait on taos_connect for an opt-in feature.

if (tsAuditLocalWrite && auditPreconnectLocal() != 0) {

Copilot · 2026-04-28T02:46:52Z

+
+import time
+
+from ..common.basic import BasicFun
+
+
+class TDTestCase:
+    def init(self, conn, logSql, replicaVar=1):
+        tdLog.debug(f"start to execute {__file__}")
+        self.replicaVar = int(replicaVar)
+        self.Fun = BasicFun()


This test uses a relative import (from ..common.basic import BasicFun), but tests/system-test is not a Python package (no __init__.py files) and there is no tests/system-test/common/ module in the repo, so the test will fail to import/run. Use the existing system-test utilities via absolute imports (e.g., util.dnodes, tdSql.init(conn.cursor(), ...)) or add the missing package/modules + __init__.py files if relative imports are required.

Suggested change

import time

from ..common.basic import BasicFun

class TDTestCase:

def init(self, conn, logSql, replicaVar=1):

tdLog.debug(f"start to execute {__file__}")

self.replicaVar = int(replicaVar)

self.Fun = BasicFun()

from util.dnodes import *

import time

class BasicFunCompat:

def __init__(self, conn, logSql):

self.conn = conn

self.logSql = logSql

self.TDDnodes = tdDnodes

self.dnodes_count = 1

def config_cluster(self, dnodes_count):

self.dnodes_count = int(dnodes_count)

if hasattr(self.TDDnodes, "init"):

self.TDDnodes.init(self.dnodes_count)

def deploy_start_cluster(self):

if hasattr(self.TDDnodes, "deploy"):

self.TDDnodes.deploy(self.dnodes_count)

if hasattr(self.TDDnodes, "start"):

self.TDDnodes.start(self.dnodes_count)

def connect(self):

tdSql.init(self.conn.cursor(), self.logSql)

class TDTestCase:

def init(self, conn, logSql, replicaVar=1):

tdLog.debug(f"start to execute {__file__}")

self.replicaVar = int(replicaVar)

self.conn = conn

self.logSql = logSql

tdSql.init(conn.cursor(), logSql)

self.Fun = BasicFunCompat(conn, logSql)

Copilot · 2026-04-28T02:46:53Z

+PID=`ps -efww | grep /usr/bin/taosd | grep -v grep | awk '{print $2}'`
+if [ -n "$PID" ]; then
+  echo systemctl stop taosd
+  systemctl stop taosd
+fi
+
+PID=`psby taosd | grep -w "[t]aosd" | awk '{print $1}' | head -n 1`
+while [ -n "$PID" ]; do
+  echo kill -9 $PID
+  #pkill -9 taosd
+  kill -9 $PID
+  echo "Killing taosd processes"
+  if [ "$OS_TYPE" != "Darwin" ]; then
+    fuser -k -n tcp 6030
+  else
+    lsof -nti:6030 | xargs kill -9
+  fi
+  PID=`psby taosd | grep -w "[t]aosd" | awk '{print $1}' | head -n 1`
+done
+
+PID=`psby taos | grep -w "[t]aos" | awk '{print $1}' | head -n 1`
+while [ -n "$PID" ]; do
+  echo kill -9 $PID
+  #pkill -9 taos
+  kill -9 $PID
+  echo "Killing taos processes"
+  if [ "$OS_TYPE" != "Darwin" ]; then
+    fuser -k -n tcp 6030
+  else
+    lsof -nti:6030 | xargs kill -9
+  fi
+  PID=`psby taos | grep -w "[t]aos" | awk '{print $1}' | head -n 1`
+done
+
+PID=`psby tmq_sim | grep -w "[t]mq_sim" | awk '{print $1}' | head -n 1`
+while [ -n "$PID" ]; do
+  echo kill -9 $PID
+  #pkill -9 tmq_sim
+  kill -9 $PID
+  echo "Killing tmq_sim processes"
+  if [ "$OS_TYPE" != "Darwin" ]; then
+    fuser -k -n tcp 6030
+  else
+    lsof -nti:6030 | xargs kill -9
+  fi
+  PID=`psby tmq_sim | grep -w "[t]mq_sim" | awk '{print $1}' | head -n 1`
+done


These while [ -n "$PID" ] loops can spin forever if kill -9 $PID fails (e.g., insufficient permissions or PID reused between iterations). Add a bounded retry counter / timeout, and consider using pkill with process name plus a final check, so the script exits deterministically even when processes can’t be killed.

Suggested change

PID=`ps -efww | grep /usr/bin/taosd | grep -v grep | awk '{print $2}'`

if [ -n "$PID" ]; then

echo systemctl stop taosd

systemctl stop taosd

fi

PID=`psby taosd | grep -w "[t]aosd" | awk '{print $1}' | head -n 1`

while [ -n "$PID" ]; do

echo kill -9 $PID

#pkill -9 taosd

kill -9 $PID

echo "Killing taosd processes"

if [ "$OS_TYPE" != "Darwin" ]; then

fuser -k -n tcp 6030

else

lsof -nti:6030 | xargs kill -9

fi

PID=`psby taosd | grep -w "[t]aosd" | awk '{print $1}' | head -n 1`

done

PID=`psby taos | grep -w "[t]aos" | awk '{print $1}' | head -n 1`

while [ -n "$PID" ]; do

echo kill -9 $PID

#pkill -9 taos

kill -9 $PID

echo "Killing taos processes"

if [ "$OS_TYPE" != "Darwin" ]; then

fuser -k -n tcp 6030

else

lsof -nti:6030 | xargs kill -9

fi

PID=`psby taos | grep -w "[t]aos" | awk '{print $1}' | head -n 1`

done

PID=`psby tmq_sim | grep -w "[t]mq_sim" | awk '{print $1}' | head -n 1`

while [ -n "$PID" ]; do

echo kill -9 $PID

#pkill -9 tmq_sim

kill -9 $PID

echo "Killing tmq_sim processes"

if [ "$OS_TYPE" != "Darwin" ]; then

fuser -k -n tcp 6030

else

lsof -nti:6030 | xargs kill -9

fi

PID=`psby tmq_sim | grep -w "[t]mq_sim" | awk '{print $1}' | head -n 1`

done

kill_processes_with_retry() {

PROC_NAME="$1"

GREP_PATTERN="$2"

MAX_RETRIES=10

RETRY_COUNT=0

PID=`psby "$PROC_NAME" | grep -w "$GREP_PATTERN" | awk '{print $1}' | head -n 1`

while [ -n "$PID" ] && [ "$RETRY_COUNT" -lt "$MAX_RETRIES" ]; do

echo kill -9 "$PID"

kill -9 "$PID"

if command -v pkill >/dev/null 2>&1; then

pkill -9 "$PROC_NAME"

fi

echo "Killing $PROC_NAME processes"

if [ "$OS_TYPE" != "Darwin" ]; then

fuser -k -n tcp 6030

else

lsof -nti:6030 | xargs kill -9

fi

RETRY_COUNT=`expr "$RETRY_COUNT" + 1`

sleep 1

PID=`psby "$PROC_NAME" | grep -w "$GREP_PATTERN" | awk '{print $1}' | head -n 1`

done

if [ -n "$PID" ]; then

echo "Failed to kill $PROC_NAME processes after $MAX_RETRIES attempts; last PID: $PID"

fi

}

PID=`ps -efww | grep /usr/bin/taosd | grep -v grep | awk '{print $2}'`

if [ -n "$PID" ]; then

echo systemctl stop taosd

systemctl stop taosd

fi

kill_processes_with_retry taosd "[t]aosd"

kill_processes_with_retry taos "[t]aos"

kill_processes_with_retry tmq_sim "[t]mq_sim"

Copilot · 2026-04-28T02:46:53Z

+    fuser -k -n tcp 6030
+  else
+    lsof -nti:6030 | xargs kill -9
+  fi


On macOS, lsof -nti:6030 | xargs kill -9 may invoke kill with no arguments when nothing is listening on the port, producing errors/noise. Guard the pipeline (e.g., only run kill when PIDs are non-empty) to keep the stop script quiet and predictable.

dmchen/try-copilot-self-audit

2ad47a5

Copilot AI review requested due to automatic review settings April 28, 2026 02:39

cadem requested review from a team, dapan1121, guanshengliang and zitsen as code owners April 28, 2026 02:39

taosdata-bot Bot added the team engine label Apr 28, 2026

Copilot started reviewing on behalf of cadem April 28, 2026 02:39 View session

gemini-code-assist Bot reviewed Apr 28, 2026

View reviewed changes

Copilot AI reviewed Apr 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dmchen/try-copilot-self-audit#35244

dmchen/try-copilot-self-audit#35244
cadem wants to merge 1 commit into3.0from
dmchen/try-copilot-self-audit

cadem commented Apr 28, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 28, 2026

Uh oh!

gemini-code-assist Bot Apr 28, 2026

Uh oh!

gemini-code-assist Bot Apr 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		// Start non-blocking preconnect in background so startup and RPC threads never wait on taos_connect.
		if (auditPreconnectLocal() != 0) {

-  // Start non-blocking preconnect in background so startup and RPC threads never wait on taos_connect.
-  if (auditPreconnectLocal() != 0) {
+  // Start non-blocking preconnect in background only when local audit write is enabled,
+  // so startup and RPC threads never wait on taos_connect for an opt-in feature.
+  if (tsAuditLocalWrite && auditPreconnectLocal() != 0) {

-import time
-from ..common.basic import BasicFun
-class TDTestCase:
-    def init(self, conn, logSql, replicaVar=1):
-        tdLog.debug(f"start to execute {__file__}")
-        self.replicaVar = int(replicaVar)
-        self.Fun = BasicFun()
+from util.dnodes import *
+import time
+class BasicFunCompat:
+    def __init__(self, conn, logSql):
+        self.conn = conn
+        self.logSql = logSql
+        self.TDDnodes = tdDnodes
+        self.dnodes_count = 1
+    def config_cluster(self, dnodes_count):
+        self.dnodes_count = int(dnodes_count)
+        if hasattr(self.TDDnodes, "init"):
+            self.TDDnodes.init(self.dnodes_count)
+    def deploy_start_cluster(self):
+        if hasattr(self.TDDnodes, "deploy"):
+            self.TDDnodes.deploy(self.dnodes_count)
+        if hasattr(self.TDDnodes, "start"):
+            self.TDDnodes.start(self.dnodes_count)
+    def connect(self):
+        tdSql.init(self.conn.cursor(), self.logSql)
+class TDTestCase:
+    def init(self, conn, logSql, replicaVar=1):
+        tdLog.debug(f"start to execute {__file__}")
+        self.replicaVar = int(replicaVar)
+        self.conn = conn
+        self.logSql = logSql
+        tdSql.init(conn.cursor(), logSql)
+        self.Fun = BasicFunCompat(conn, logSql)

Conversation

cadem commented Apr 28, 2026

Description

Issue(s)

Checklist

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants