Skip to content

MDEV-28730 Remove internal parser usage from InnoDB fulltext #4036

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: 11.8
Choose a base branch
from

Conversation

Thirunarayanan
Copy link
Member

  • The Jira issue number for this PR is: MDEV-28730

Description

  • Introduce a class FTSQueryExecutor to handle the queries for FTS internal tables. Basically it creates a query, prepares the table for read or write process. Build a tuple based on the given table and assign the FTS_CONFIG, FTS_COMMON_TABLES and FTS_AUX_TABLE fields based on the given value. This also handles insert, delete, search, read all entries and delete all entries as of now.

How can this PR be tested?

./mtr --suite=innodb_fts

If the changes are not amenable to automated testing, please explain why not and carefully describe how to test manually.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@Thirunarayanan Thirunarayanan requested a review from dr-m May 6, 2025 09:57
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some initial comments. I did not review all of this yet.

I welcome the reduction of dependencies on the internal SQL parser, but we must do it carefully and add debug assertions to document the assumptions.

Comment on lines 6389 to 6390
dberr_t FTSQueryExecutor::read_all(dict_table_t *table,
read_record* func, void *arg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C++ way would be to pass a functor object that would also be compatible with a lambda expression. Can you replace `func,arg´ with that?

Comment on lines +1452 to +1305
ib::error() << "(" << err << ") during optimize, while adding a"
<< " word to the FTS index.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please use sql_print_error in new code?

The last operator<< invocation is redundant here; it would concatenating two compile-time constant strings at runtime.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we write something like the following here?

      sql_print_error("(%s) during optimize for table %.*sQ.%sQ",
                      ut_strerr(err),
                      int(table->name.dblen()), table->name.m_name,
                      table->name.basename());

@Thirunarayanan Thirunarayanan force-pushed the 11.8-MDEV-28730 branch 8 times, most recently from 3c73bf2 to 247b016 Compare May 22, 2025 11:18
@Thirunarayanan Thirunarayanan marked this pull request as ready for review May 22, 2025 16:50
Comment on lines +434 to +436
dberr_t fts_index_fetch_nodes_low(
dict_table_t *aux_table, const fts_string_t *word, fts_fetch_t *fetch,
FTSQueryRunner *sqlRunner)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a member function of FTSQueryRunner?

Why is there no noexcept?

Alternatively, why is sqlRunner the first parameter, so that we can avoid shuffling registers at least for the first statement, which is invoking a member function on sqlRunner?

Comment on lines +456 to +457
err= sqlRunner->record_executor(index, READ, match_op, PAGE_CUR_GE,
fetch->callback, fetch);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the callee determine fetch->callback by itself? We don’t need to pass that as a parameter, right?

Comment on lines +472 to +473
FTSQueryRunner sqlRunner(trx);
fetch->heap= sqlRunner.heap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be assigning fetch->heap to something that is allocated from our local stack. Where do we reset fetch->heap before returning from this function? We surely wouldn’t want to leave a dangling pointer to something that should have been destroyed, right?

Do we need to add the data member fts_fetch_t::heap at all? How was the memory managed before these changes?

- Introduce a class FTSQueryRunner to handle queries for fulltext
internal tables. Basically it creates a query, prepares the
fulltext internal table for read or write process. Build a tuple
based on the given table and assign the FTS_CONFIG, FTS_COMMON_TABLES
and FTS_AUX_TABLE fields based on the given value. This class
also handles INSERT, DELETE, REPLACE, UPDATE and
execute the function for each record (record_executor).

FTSQueryRunner::create_query_thread(): Create a query thread to execute
the statement on internal FULLTEXT tables

FTSQueryRunner::build_tuple(): Build a tuple for the operation

FTSQueryRunner::build_clust_ref(): Build a clustered index reference
for clustered index lookup for the secondary index record

FTSQueryRunner::assign_config_fields(): Assign the tuple for the
FTS CONFIG internal table

FTSQueryRunner::assign_common_table_fields(): Assign the tuple for
FTS_DELETED, FTS_DELETED_CACHE, FTS_BEGIN_DELETED,
FTS_BEGIN_DELETED_CACHE common tables

FTSQueryRunner::assign_aux_table_fields(): Assign the tuple for
FTS_PREFIX_INDEX tables.

FTSQueryRunner::handle_error(): Handling error for DB_LOCK_WAIT,
retry the operation

FTSQueryRunner::open_table(): Open the table based on the fulltext
auxiliary table name and FTS common table name

FTSQueryRunner::prepare_for_write(): Lock the table for write
process by taking Intention Exclusive lock

FTSQueryRunner::prepare_for_read(): Lock the table for read
process by taking Intention Shared lock

FTSQueryRunner::write_record(): Insert the tuple into the given table

FTSQueryRunner::lock_or_sees_rec(): Lock the record in case of
DELETE, SELECT_UPDATE operation. Fetch the correct version of record in
case of READ operation. It also does clustered index lookup in case
of search is on secondary index

fts_cmp_rec_dtuple_prefix(): Compare the record with given tuple field
for tuple field length

FTSQueryRunner::record_executor(): Read the record of the given index and
do call the callback function for each record

FTSQueryRunner::build_update_config(): Build the update vector for
FULLTEXT CONFIG table

FTSQueryRunner::update_record(): Update the record with update vector
exist in FTSQueryRunner

Removed the fts_parse_sql(), fts_eval_sql(), fts_get_select_columns_str()
and fts_get_docs_clear().

Moved fts_get_table_id() & fts_get_table_name() from fts0sql.cc to
fts0fts.cc and deleted the file fts0sql.cc

Removed ins_graph, sel_graph from fts_index_cache_t

Changed the callback function default read function parameter for
each clustered index record to

bool fts_sql_callback(dict_index_t*, const rec_t *, const rec_offs*,
                      void *);

Following parameters are changed to default read function parameter:

fts_read_stopword()
fts_fetch_store_doc_id()
fts_query_expansion_fetch_doc()
fts_read_count()
fts_get_rows_count()
fts_init_doc_id()
fts_init_recover_doc()
read_fts_config()
fts_optimize_read_node()
fts_optimize_index_fetch_node()
fts_index_fetch_nodes()
fts_fetch_index_words()
fts_index_fetch_words()
fts_fetch_doc_ids()
fts_table_fetch_doc_ids()
fts_read_ulint()
fts_copy_doc_ids()
fts_optimize_create_deleted_doc_id_snapshot()
fts_query_index_fetch_nodes()
fts_query_fetch_document()
fts_query_index_fetch_nodes()

row_upd_clust_rec_low(): Function does updates a clustered
index record of a row when ordering fields don't change.
Function doesn't have dependency on row_prebuilt_t. This can be
used by fulltext internal table update operation

Row_sel_get_clust_rec_for_mysql::operator(): Removed the
parameter row_prebuilt_t and caller does pass the prebuilt
related variables

Removed the parser usage and execute the query directly on
fulltext internal tables in the following function:

fts_read_stopword()
fts_fetch_store_doc_id()
fts_query_expansion_fetch_doc()
fts_read_count()
fts_get_rows_count()
fts_init_doc_id()
fts_init_recover_doc()
read_fts_config()
fts_optimize_read_node()
fts_optimize_index_fetch_node()
fts_index_fetch_nodes()
fts_fetch_index_words()
fts_index_fetch_words()
fts_fetch_doc_ids()
fts_table_fetch_doc_ids()
fts_read_ulint()
fts_copy_doc_ids()
fts_optimize_create_deleted_doc_id_snapshot()
fts_query_index_fetch_nodes()
fts_query_fetch_document()
fts_query_index_fetch_nodes()
i_s_fts_deleted_generic_fill()
i_s_fts_index_table_fill_selected()
Comment on lines +3725 to +3727
fts_write_node(dict_table_t *table,
fts_string_t *word, fts_node_t* node,
FTSQueryRunner *sqlRunner)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my comment on fts_index_fetch_nodes_low() yesterday: Why is this not a noexcept member function of FTSQueryRunner? That would allow us to avoid some register shuffling around the first call.

While we need to declare all member functions in one place, we can define them in different compilation units. We already do this for member functions of mtr_t, log_t or buf_pool_t. This practice could be followed also for FTSQueryRunner.

Comment on lines -1794 to +1606

std::map<ulint, dict_table_t*> aux_tables;
trx_t *trx = optim->trx;
dict_table_t *aux_table= nullptr;
FTSQueryRunner sqlRunner(trx);
/* Setup the callback to use for fetching the word ilist etc. */
fetch.read_arg = optim->words;
fetch.read_record = fts_optimize_index_fetch_node;
fetch.callback = &fts_optimize_index_fetch_node;
fetch.heap= sqlRunner.heap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added code needs to be indented with TAB, just like the existing code. It could also make sense to reformat this entire function to use indentation by two spaces instead of TAB.

The formatting around = is inconsisetnt here. In the old InnoDB formatting style, there would always be a space before and after =.

Comment on lines +2763 to +2764
ib::error() << "(" << err << ") while updating last doc id for table"
<< table->name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please avoid ib::logger in new code?

      sql_print_error("(%s) while updating last doc id for table %.*sQ.%sQ",
                      ut_strerr(err),
                      int(table->name.dblen()), table->name.m_name,
                      table->name.basename());

trx->free();
}
func_exit:
if (local_trx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does !local_trx hold? In that case, which error would we report and where, in case err != DB_SUCCESS?

Comment on lines +1452 to +1305
ib::error() << "(" << err << ") during optimize, while adding a"
<< " word to the FTS index.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we write something like the following here?

      sql_print_error("(%s) during optimize for table %.*sQ.%sQ",
                      ut_strerr(err),
                      int(table->name.dblen()), table->name.m_name,
                      table->name.basename());

Comment on lines +1995 to +2001
static
dberr_t fts_copy_doc_ids(dict_table_t *from, dict_table_t *to,
FTSQueryRunner *sqlRunner)
{
std::vector<doc_id_t> doc_ids;
dberr_t error= sqlRunner->prepare_for_read(from);
if (error == DB_SUCCESS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a member function of FTSQueryRunner, or could that be the first parameter of the function? I’m not sure if we’d want ´noexcepthere because thestd::vector` without a custom allocator could throw an exception. In any case, I would rewrite this a bit to simplify the code that is generated for the first call:

static
dberr_t fts_copy_doc_ids(FTSQueryRunner *sqlRunner,
                         dict_table_t *from, dict_table_t *to)
{
  if (dberr_t error= sqlRunner->prepare_for_read(from))
    return error;
  std::vector<doc_id_t> doc_ids;

Do we actually need to buffer everything at once, or could we handle each doc_id one by one? Could we use multiple batches over a fixed-size stack-allocated array here to avoid the heap memory allocation? For example, see how btr_search_build_page_hash_index() can process multiple batches of fr[] that are allocated from the stack.

dberr_t error= sqlRunner->prepare_for_read(from);
if (error == DB_SUCCESS)
{
ut_ad(UT_LIST_GET_LEN(from->indexes) == 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this assertion be at the start of this function? Can we assert the same about to->indexes? Can we assert anything about the table names?

Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to refactor the fts_sql_callback interface to use a function object.

dict_table_t *table= sqlRunner.open_table(fts_table, &err);
if (table)
{
ut_ad(UT_LIST_GET_LEN(table->indexes) == 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could FTSQueryRunner::open_table() include this assertion, as well as assert something on the table name?

Comment on lines +64 to +69
/* In case of empty table, error can be DB_END_OF_INDEX
and key doesn't exist, error can be DB_RECORD_NOT_FOUND */
if (err == DB_END_OF_INDEX || err == DB_RECORD_NOT_FOUND)
err= DB_SUCCESS;
}
table->release();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The err adjustment could be done after table->release(). That could lead to simpler code, depending on how smart the optimizer in the compiler is.

Comment on lines +56 to +57
*value->f_str = '\0';
ut_a(value->f_len > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the assertion be moved to the start of the function, like they were before this refactoring? Here it is inside a conditional execution path.

Some comment about value->f_len would be useful here. Apparently, it initially specifies the allocated size of the buffer. In read_fts_config() it would be transformed to the actual length. Shouldn’t we also set value->f_len=0 in case no record was fetched (err is one of the two values that we map to DB_SUCCESS)?

Comment on lines +62 to +63
err= sqlRunner.record_executor(clust_index, READ, MATCH_UNIQUE,
PAGE_CUR_GE, &read_fts_config, value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pass a functor object that would in this case wrap both read_fts_config and value? Currently we are using a type-unsafe method of passing void* to read_fts_config().

Comment on lines +2699 to +2702
FTSQueryRunner sqlRunner(trx);
dict_table_t *fts_config= sqlRunner.open_table(&fts_table, &err);
if (fts_config)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this whole section of code be moved to a new function in fts0config.cc? In that way, read_fts_config could be defined as static in that file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants