MDEV-28730 Remove internal parser usage from InnoDB fulltext #4036

Thirunarayanan · 2025-05-06T09:57:52Z

The Jira issue number for this PR is: MDEV-28730

Description

Introduce a class FTSQueryExecutor to handle the queries for FTS internal tables. Basically it creates a query, prepares the table for read or write process. Build a tuple based on the given table and assign the FTS_CONFIG, FTS_COMMON_TABLES and FTS_AUX_TABLE fields based on the given value. This also handles insert, delete, search, read all entries and delete all entries as of now.

How can this PR be tested?

./mtr --suite=innodb_fts

If the changes are not amenable to automated testing, please explain why not and carefully describe how to test manually.

Basing the PR against the correct MariaDB version

This is a new feature or a refactoring, and the PR is based against the main branch.
This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

CLAassistant · 2025-05-06T09:58:00Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

dr-m

Here are some initial comments. I did not review all of this yet.

I welcome the reduction of dependencies on the internal SQL parser, but we must do it carefully and add debug assertions to document the assumptions.

storage/innobase/include/fts0priv.h

storage/innobase/include/fts0types.h

storage/innobase/fts/fts0fts.cc

dr-m · 2025-05-07T12:08:21Z

storage/innobase/fts/fts0fts.cc

+dberr_t FTSQueryExecutor::read_all(dict_table_t *table,
+                                   read_record* func, void *arg)


The C++ way would be to pass a functor object that would also be compatible with a lambda expression. Can you replace `func,arg´ with that?

dr-m · 2025-05-07T12:09:45Z

storage/innobase/fts/fts0opt.cc

+        ib::error() << "(" << err << ") during optimize, while adding a"
+		    << " word to the FTS index.";


Can we please use sql_print_error in new code?

The last operator<< invocation is redundant here; it would concatenating two compile-time constant strings at runtime.

Could we write something like the following here?

sql_print_error("(%s) during optimize for table %.*sQ.%sQ", ut_strerr(err), int(table->name.dblen()), table->name.m_name, table->name.basename());

mysql-test/suite/innodb_fts/r/sync.result

storage/innobase/include/fts0priv.h

storage/innobase/fts/fts0fts.cc

mysql-test/suite/innodb_fts/r/sync.result

dr-m · 2025-05-26T09:16:46Z

storage/innobase/fts/fts0opt.cc

+dberr_t fts_index_fetch_nodes_low(
+  dict_table_t *aux_table, const fts_string_t *word, fts_fetch_t *fetch,
+  FTSQueryRunner *sqlRunner)


Why is this not a member function of FTSQueryRunner?

Why is there no noexcept?

Alternatively, why is sqlRunner the first parameter, so that we can avoid shuffling registers at least for the first statement, which is invoking a member function on sqlRunner?

dr-m · 2025-05-26T09:17:48Z

storage/innobase/fts/fts0opt.cc

+    err= sqlRunner->record_executor(index, READ, match_op, PAGE_CUR_GE,
+                                    fetch->callback, fetch);


Could the callee determine fetch->callback by itself? We don’t need to pass that as a parameter, right?

dr-m · 2025-05-26T09:21:24Z

storage/innobase/fts/fts0opt.cc

+  FTSQueryRunner sqlRunner(trx);
+  fetch->heap= sqlRunner.heap();


We seem to be assigning fetch->heap to something that is allocated from our local stack. Where do we reset fetch->heap before returning from this function? We surely wouldn’t want to leave a dangling pointer to something that should have been destroyed, right?

Do we need to add the data member fts_fetch_t::heap at all? How was the memory managed before these changes?

storage/innobase/fts/fts0opt.cc

- Introduce a class FTSQueryRunner to handle queries for fulltext internal tables. Basically it creates a query, prepares the fulltext internal table for read or write process. Build a tuple based on the given table and assign the FTS_CONFIG, FTS_COMMON_TABLES and FTS_AUX_TABLE fields based on the given value. This class also handles INSERT, DELETE, REPLACE, UPDATE and execute the function for each record (record_executor). FTSQueryRunner::create_query_thread(): Create a query thread to execute the statement on internal FULLTEXT tables FTSQueryRunner::build_tuple(): Build a tuple for the operation FTSQueryRunner::build_clust_ref(): Build a clustered index reference for clustered index lookup for the secondary index record FTSQueryRunner::assign_config_fields(): Assign the tuple for the FTS CONFIG internal table FTSQueryRunner::assign_common_table_fields(): Assign the tuple for FTS_DELETED, FTS_DELETED_CACHE, FTS_BEGIN_DELETED, FTS_BEGIN_DELETED_CACHE common tables FTSQueryRunner::assign_aux_table_fields(): Assign the tuple for FTS_PREFIX_INDEX tables. FTSQueryRunner::handle_error(): Handling error for DB_LOCK_WAIT, retry the operation FTSQueryRunner::open_table(): Open the table based on the fulltext auxiliary table name and FTS common table name FTSQueryRunner::prepare_for_write(): Lock the table for write process by taking Intention Exclusive lock FTSQueryRunner::prepare_for_read(): Lock the table for read process by taking Intention Shared lock FTSQueryRunner::write_record(): Insert the tuple into the given table FTSQueryRunner::lock_or_sees_rec(): Lock the record in case of DELETE, SELECT_UPDATE operation. Fetch the correct version of record in case of READ operation. It also does clustered index lookup in case of search is on secondary index fts_cmp_rec_dtuple_prefix(): Compare the record with given tuple field for tuple field length FTSQueryRunner::record_executor(): Read the record of the given index and do call the callback function for each record FTSQueryRunner::build_update_config(): Build the update vector for FULLTEXT CONFIG table FTSQueryRunner::update_record(): Update the record with update vector exist in FTSQueryRunner Removed the fts_parse_sql(), fts_eval_sql(), fts_get_select_columns_str() and fts_get_docs_clear(). Moved fts_get_table_id() & fts_get_table_name() from fts0sql.cc to fts0fts.cc and deleted the file fts0sql.cc Removed ins_graph, sel_graph from fts_index_cache_t Changed the callback function default read function parameter for each clustered index record to bool fts_sql_callback(dict_index_t*, const rec_t *, const rec_offs*, void *); Following parameters are changed to default read function parameter: fts_read_stopword() fts_fetch_store_doc_id() fts_query_expansion_fetch_doc() fts_read_count() fts_get_rows_count() fts_init_doc_id() fts_init_recover_doc() read_fts_config() fts_optimize_read_node() fts_optimize_index_fetch_node() fts_index_fetch_nodes() fts_fetch_index_words() fts_index_fetch_words() fts_fetch_doc_ids() fts_table_fetch_doc_ids() fts_read_ulint() fts_copy_doc_ids() fts_optimize_create_deleted_doc_id_snapshot() fts_query_index_fetch_nodes() fts_query_fetch_document() fts_query_index_fetch_nodes() row_upd_clust_rec_low(): Function does updates a clustered index record of a row when ordering fields don't change. Function doesn't have dependency on row_prebuilt_t. This can be used by fulltext internal table update operation Row_sel_get_clust_rec_for_mysql::operator(): Removed the parameter row_prebuilt_t and caller does pass the prebuilt related variables Removed the parser usage and execute the query directly on fulltext internal tables in the following function: fts_read_stopword() fts_fetch_store_doc_id() fts_query_expansion_fetch_doc() fts_read_count() fts_get_rows_count() fts_init_doc_id() fts_init_recover_doc() read_fts_config() fts_optimize_read_node() fts_optimize_index_fetch_node() fts_index_fetch_nodes() fts_fetch_index_words() fts_index_fetch_words() fts_fetch_doc_ids() fts_table_fetch_doc_ids() fts_read_ulint() fts_copy_doc_ids() fts_optimize_create_deleted_doc_id_snapshot() fts_query_index_fetch_nodes() fts_query_fetch_document() fts_query_index_fetch_nodes() i_s_fts_deleted_generic_fill() i_s_fts_index_table_fill_selected()

dr-m · 2025-05-27T11:54:27Z

storage/innobase/fts/fts0fts.cc

+fts_write_node(dict_table_t *table,
+               fts_string_t *word, fts_node_t* node,
+               FTSQueryRunner *sqlRunner)


Similar to my comment on fts_index_fetch_nodes_low() yesterday: Why is this not a noexcept member function of FTSQueryRunner? That would allow us to avoid some register shuffling around the first call.

While we need to declare all member functions in one place, we can define them in different compilation units. We already do this for member functions of mtr_t, log_t or buf_pool_t. This practice could be followed also for FTSQueryRunner.

dr-m · 2025-05-27T11:57:41Z

storage/innobase/fts/fts0opt.cc

-
+        std::map<ulint, dict_table_t*> aux_tables;
+        trx_t *trx = optim->trx;
+        dict_table_t *aux_table= nullptr;
+        FTSQueryRunner sqlRunner(trx);
 	/* Setup the callback to use for fetching the word ilist etc. */
 	fetch.read_arg = optim->words;
-	fetch.read_record = fts_optimize_index_fetch_node;
+        fetch.callback = &fts_optimize_index_fetch_node;
+        fetch.heap= sqlRunner.heap();


The added code needs to be indented with TAB, just like the existing code. It could also make sense to reformat this entire function to use indentation by two spaces instead of TAB.

The formatting around = is inconsisetnt here. In the old InnoDB formatting style, there would always be a space before and after =.

dr-m · 2025-05-27T12:02:21Z

storage/innobase/fts/fts0fts.cc

+      ib::error() << "(" << err << ") while updating last doc id for table"
+                  << table->name;


Can we please avoid ib::logger in new code?

sql_print_error("(%s) while updating last doc id for table %.*sQ.%sQ", ut_strerr(err), int(table->name.dblen()), table->name.m_name, table->name.basename());

dr-m · 2025-05-27T12:03:59Z

storage/innobase/fts/fts0fts.cc

-		trx->free();
-	}
+func_exit:
+  if (local_trx)


When does !local_trx hold? In that case, which error would we report and where, in case err != DB_SUCCESS?

dr-m · 2025-05-27T12:05:56Z

storage/innobase/fts/fts0opt.cc

+        ib::error() << "(" << err << ") during optimize, while adding a"
+		    << " word to the FTS index.";


Could we write something like the following here?

sql_print_error("(%s) during optimize for table %.*sQ.%sQ", ut_strerr(err), int(table->name.dblen()), table->name.m_name, table->name.basename());

dr-m · 2025-05-27T12:13:37Z

storage/innobase/fts/fts0opt.cc

+static
+dberr_t fts_copy_doc_ids(dict_table_t *from, dict_table_t *to,
+                         FTSQueryRunner *sqlRunner)
+{
+  std::vector<doc_id_t> doc_ids;
+  dberr_t error= sqlRunner->prepare_for_read(from);
+  if (error == DB_SUCCESS)


Could this be a member function of FTSQueryRunner, or could that be the first parameter of the function? I’m not sure if we’d want ´noexcepthere because thestd::vector` without a custom allocator could throw an exception. In any case, I would rewrite this a bit to simplify the code that is generated for the first call:

static dberr_t fts_copy_doc_ids(FTSQueryRunner *sqlRunner, dict_table_t *from, dict_table_t *to) { if (dberr_t error= sqlRunner->prepare_for_read(from)) return error; std::vector<doc_id_t> doc_ids;

Do we actually need to buffer everything at once, or could we handle each doc_id one by one? Could we use multiple batches over a fixed-size stack-allocated array here to avoid the heap memory allocation? For example, see how btr_search_build_page_hash_index() can process multiple batches of fr[] that are allocated from the stack.

dr-m · 2025-05-27T12:14:18Z

storage/innobase/fts/fts0opt.cc

+  dberr_t error= sqlRunner->prepare_for_read(from);
+  if (error == DB_SUCCESS)
+  {
+    ut_ad(UT_LIST_GET_LEN(from->indexes) == 1);


Could this assertion be at the start of this function? Can we assert the same about to->indexes? Can we assert anything about the table names?

dr-m

It would be good to refactor the fts_sql_callback interface to use a function object.

storage/innobase/fts/fts0config.cc

dr-m · 2025-05-27T12:23:47Z

storage/innobase/fts/fts0config.cc

+  dict_table_t *table= sqlRunner.open_table(fts_table, &err);
+  if (table)
+  {
+    ut_ad(UT_LIST_GET_LEN(table->indexes) == 1);


Could FTSQueryRunner::open_table() include this assertion, as well as assert something on the table name?

dr-m · 2025-05-27T12:25:54Z

storage/innobase/fts/fts0config.cc

+      /* In case of empty table, error can be DB_END_OF_INDEX
+      and key doesn't exist, error can be DB_RECORD_NOT_FOUND */
+      if (err == DB_END_OF_INDEX || err == DB_RECORD_NOT_FOUND)
+        err= DB_SUCCESS;
+    }
+    table->release();


The err adjustment could be done after table->release(). That could lead to simpler code, depending on how smart the optimizer in the compiler is.

dr-m · 2025-05-27T12:30:43Z

storage/innobase/fts/fts0config.cc

+      *value->f_str = '\0';
+      ut_a(value->f_len > 0);


Can the assertion be moved to the start of the function, like they were before this refactoring? Here it is inside a conditional execution path.

Some comment about value->f_len would be useful here. Apparently, it initially specifies the allocated size of the buffer. In read_fts_config() it would be transformed to the actual length. Shouldn’t we also set value->f_len=0 in case no record was fetched (err is one of the two values that we map to DB_SUCCESS)?

dr-m · 2025-05-27T12:34:36Z

storage/innobase/fts/fts0config.cc

+      err= sqlRunner.record_executor(clust_index, READ, MATCH_UNIQUE,
+                                     PAGE_CUR_GE, &read_fts_config, value);


Can we pass a functor object that would in this case wrap both read_fts_config and value? Currently we are using a type-unsafe method of passing void* to read_fts_config().

storage/innobase/fts/fts0fts.cc

dr-m · 2025-05-27T12:39:21Z

storage/innobase/fts/fts0fts.cc

+  FTSQueryRunner sqlRunner(trx);
+  dict_table_t *fts_config= sqlRunner.open_table(&fts_table, &err);
+  if (fts_config)
+  {


Can this whole section of code be moved to a new function in fts0config.cc? In that way, read_fts_config could be defined as static in that file.

Thirunarayanan requested a review from dr-m May 6, 2025 09:57

svoj added the MariaDB Corporation label May 6, 2025

Thirunarayanan force-pushed the 11.8-MDEV-28730 branch from 7e56f12 to aadf724 Compare May 7, 2025 07:06

dr-m reviewed May 7, 2025

View reviewed changes

Thirunarayanan force-pushed the 11.8-MDEV-28730 branch 8 times, most recently from 3c73bf2 to 247b016 Compare May 22, 2025 11:18

dr-m reviewed May 22, 2025

View reviewed changes

Thirunarayanan force-pushed the 11.8-MDEV-28730 branch from 247b016 to dc23f97 Compare May 22, 2025 16:48

Thirunarayanan marked this pull request as ready for review May 22, 2025 16:50

Thirunarayanan force-pushed the 11.8-MDEV-28730 branch from dc23f97 to a9c6fc6 Compare May 26, 2025 07:43

dr-m reviewed May 26, 2025

View reviewed changes

Thirunarayanan force-pushed the 11.8-MDEV-28730 branch from a9c6fc6 to dfa7fe3 Compare May 26, 2025 11:06

dr-m reviewed May 27, 2025

View reviewed changes

		dberr_t FTSQueryExecutor::read_all(dict_table_t *table,
		read_record* func, void *arg)

		ib::error() << "(" << err << ") during optimize, while adding a"
		<< " word to the FTS index.";

		err= sqlRunner->record_executor(index, READ, match_op, PAGE_CUR_GE,
		fetch->callback, fetch);

		FTSQueryRunner sqlRunner(trx);
		fetch->heap= sqlRunner.heap();

		ib::error() << "(" << err << ") while updating last doc id for table"
		<< table->name;

		err= sqlRunner.record_executor(clust_index, READ, MATCH_UNIQUE,
		PAGE_CUR_GE, &read_fts_config, value);

Uh oh!

MDEV-28730 Remove internal parser usage from InnoDB fulltext #4036

Are you sure you want to change the base?

MDEV-28730 Remove internal parser usage from InnoDB fulltext #4036

Uh oh!

Conversation

Thirunarayanan commented May 6, 2025

Description

How can this PR be tested?

Basing the PR against the correct MariaDB version

PR quality check

Uh oh!

CLAassistant commented May 6, 2025

Uh oh!

dr-m left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dr-m left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!