diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index 80c76a6efa20e..80fd339a7d923 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -30,7 +30,7 @@ The `IMPORT INTO` statement lets you import data to TiDB via the [Physical Impor - For TiDB Self-Managed, the TiDB [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630) is expected to have at least 90 GiB of available space. It is recommended to allocate storage space that is equal to or greater than the volume of data to be imported. - One import job supports importing data into one target table only. - `IMPORT INTO` is not supported during TiDB cluster upgrades. -- Ensure that the data to be imported does not contain any records with primary key or non-null unique index conflicts. Otherwise, the conflicts can result in import task failures. +- When not using [Global Sort](/tidb-global-sort.md) or when using a version earlier than v8.5.5, you must ensure that the data to be imported does not contain any records with primary key or non-null unique index conflicts, because such conflicts will cause the import task to fail. Starting from v8.5.5, when you use [Global Sort](/tidb-global-sort.md), `IMPORT INTO` automatically resolves these conflicts by removing all conflicting rows. - Known issue: the `IMPORT INTO` task might fail if the PD address in the TiDB node configuration file is inconsistent with the current PD topology of the cluster. This inconsistency can arise in situations such as that PD was scaled in previously, but the TiDB configuration file was not updated accordingly or the TiDB node was not restarted after the configuration file update. ### `IMPORT INTO ... FROM FILE` restrictions @@ -227,6 +227,53 @@ SET GLOBAL tidb_server_memory_limit='75%'; > - If the KV range overlap in a source data file is low, enabling Global Sort might decrease import performance. This is because when Global Sort is enabled, TiDB needs to wait for the completion of local sorting in all sub-jobs before proceeding with the Global Sort operations and subsequent import. > - After an import job using Global Sort completes, the files stored in the cloud storage for Global Sort are cleaned up asynchronously in a background thread. +#### Conflict resolution + +Starting from v8.5.5, when you use [Global Sort](/tidb-global-sort.md), `IMPORT INTO` automatically resolves primary key or unique index conflicts by removing all conflicting rows. + +For example, when Global Sort is enabled and you import data into a table created with `CREATE TABLE t(id INT PRIMARY KEY, v INT);` from the following data file `conflicts.csv`: + +```csv +id,v +1,2 +1,3 +2,2 +3,3 +3,3 +4,4 +``` + +```sql +IMPORT INTO t FROM 's3://mybucket/conflicts.csv' WITH THREAD=8, SKIP_ROWS=1; +``` + +After the import, the table `t` contains only the non-conflicting rows. + +```sql +SHOW IMPORT JOBS; +``` + +``` ++--------+-----------------------------+--------------+----------+-------+----------+------------------+---------------+--------------------+----------------------------+----------------------------+----------------------------+------------+ +| Job_ID | Data_Source | Target_Table | Table_ID | Phase | Status | Source_File_Size | Imported_Rows | Result_Message | Create_Time | Start_Time | End_Time | Created_By | ++--------+-----------------------------+--------------+----------+-------+----------+------------------+---------------+--------------------+----------------------------+----------------------------+----------------------------+------------+ +| 30001 | s3://mybucket/conflicts.csv | `test`.`t` | 114 | | finished | 24B | 2 | 4 conflicted rows. | 2025-11-28 17:21:40.591023 | 2025-11-28 17:21:41.109977 | 2025-11-28 17:21:44.112506 | root@% | ++--------+-----------------------------+--------------+----------+-------+----------+------------------+---------------+--------------------+----------------------------+----------------------------+----------------------------+------------+ +``` + +```sql +SELECT * FROM t; ++----+------+ +| id | v | ++----+------+ +| 2 | 2 | +| 4 | 4 | ++----+------+ +2 rows in set (0.01 sec) +``` + +Details of the conflicted rows are stored in the cloud storage URI. For more information, see [Conflicted rows information when using Global Sort](#conflicted-rows-information-when-using-global-sort). + ### Output When `IMPORT INTO ... FROM FILE` completes the import or when the `DETACHED` mode is enabled, TiDB returns the current job information in the output, as shown in the following examples. For the description of each field, see [`SHOW IMPORT JOB(s)`](/sql-statements/sql-statement-show-import-job.md). @@ -253,6 +300,36 @@ IMPORT INTO t FROM '/path/to/small.csv' WITH DETACHED; +--------+--------------------+--------------+----------+-------+---------+------------------+---------------+----------------+----------------------------+------------+----------+------------+ ``` +#### Conflicted rows information when using Global Sort + +When you use [Global Sort](/tidb-global-sort.md) and an import job has conflicted rows, the number of conflicted rows is displayed in the `Result_Message` column of [`SHOW IMPORT`](/sql-statements/sql-statement-show-import-job.md), as shown in the following example: + +```sql +IMPORT INTO t FROM 's3://mybucket/conflicts.csv' WITH THREAD=8, SKIP_ROWS=1; +SHOW IMPORT JOBS; +``` + +``` ++--------+-----------------------------+--------------+----------+-------+----------+------------------+---------------+--------------------+----------------------------+----------------------------+----------------------------+------------+ +| Job_ID | Data_Source | Target_Table | Table_ID | Phase | Status | Source_File_Size | Imported_Rows | Result_Message | Create_Time | Start_Time | End_Time | Created_By | ++--------+-----------------------------+--------------+----------+-------+----------+------------------+---------------+--------------------+----------------------------+----------------------------+----------------------------+------------+ +| 30001 | s3://mybucket/conflicts.csv | `test`.`t` | 114 | | finished | 24B | 2 | 4 conflicted rows. | 2025-11-28 17:21:40.591023 | 2025-11-28 17:21:41.109977 | 2025-11-28 17:21:44.112506 | root@% | ++--------+-----------------------------+--------------+----------+-------+----------+------------------+---------------+--------------------+----------------------------+----------------------------+----------------------------+------------+ +``` + +You can view details of the conflicted rows in the `conflicted-rows/` folder of the cloud storage URI. For example: + +``` +s3://mybucket/sorted-dir/conflicted-rows/1/1-28f0e03a-27c3-4283-a523-418859bb7a2c/data-1.txt +s3://mybucket/sorted-dir/conflicted-rows/1/1-298a387a-065c-4ea9-bc31-949e89ca186f/data-1.txt +s3://mybucket/sorted-dir/conflicted-rows/1/1-2e306ff5-40eb-4045-9992-2ccc04ab7bef/data-1.txt +s3://mybucket/sorted-dir/conflicted-rows/1/1-382d8886-f6e9-4c5e-891c-69d54694a175/data-1.txt +s3://mybucket/sorted-dir/conflicted-rows/1/1-3fd1b7e7-70d4-4734-a12f-ec4f70c3a9af/data-1.txt +s3://mybucket/sorted-dir/conflicted-rows/1/1-46320371-436f-46fa-a8ab-b0f2e2eecf75/data-1.txt +s3://mybucket/sorted-dir/conflicted-rows/1/1-631f1c02-4a25-4c8f-9e8d-6ae61a801f5c/data-1.txt +s3://mybucket/sorted-dir/conflicted-rows/1/1-68f84255-4965-4d9e-9add-934ba07b5ef1/data-1.txt +``` + ### View and manage import jobs For an import job with the `DETACHED` mode enabled, you can use [`SHOW IMPORT`](/sql-statements/sql-statement-show-import-job.md) to view its current job progress.