-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
MDEV-28730 Remove internal parser usage from InnoDB fulltext #4036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 11.8
Are you sure you want to change the base?
Conversation
|
7e56f12
to
aadf724
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some initial comments. I did not review all of this yet.
I welcome the reduction of dependencies on the internal SQL parser, but we must do it carefully and add debug assertions to document the assumptions.
storage/innobase/fts/fts0fts.cc
Outdated
dberr_t FTSQueryExecutor::read_all(dict_table_t *table, | ||
read_record* func, void *arg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C++ way would be to pass a functor object that would also be compatible with a lambda expression. Can you replace `func,arg´ with that?
ib::error() << "(" << err << ") during optimize, while adding a" | ||
<< " word to the FTS index."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please use sql_print_error
in new code?
The last operator<<
invocation is redundant here; it would concatenating two compile-time constant strings at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we write something like the following here?
sql_print_error("(%s) during optimize for table %.*sQ.%sQ",
ut_strerr(err),
int(table->name.dblen()), table->name.m_name,
table->name.basename());
3c73bf2
to
247b016
Compare
247b016
to
dc23f97
Compare
dc23f97
to
a9c6fc6
Compare
dberr_t fts_index_fetch_nodes_low( | ||
dict_table_t *aux_table, const fts_string_t *word, fts_fetch_t *fetch, | ||
FTSQueryRunner *sqlRunner) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not a member function of FTSQueryRunner
?
Why is there no noexcept
?
Alternatively, why is sqlRunner
the first parameter, so that we can avoid shuffling registers at least for the first statement, which is invoking a member function on sqlRunner
?
err= sqlRunner->record_executor(index, READ, match_op, PAGE_CUR_GE, | ||
fetch->callback, fetch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the callee determine fetch->callback
by itself? We don’t need to pass that as a parameter, right?
FTSQueryRunner sqlRunner(trx); | ||
fetch->heap= sqlRunner.heap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to be assigning fetch->heap
to something that is allocated from our local stack. Where do we reset fetch->heap
before returning from this function? We surely wouldn’t want to leave a dangling pointer to something that should have been destroyed, right?
Do we need to add the data member fts_fetch_t::heap
at all? How was the memory managed before these changes?
- Introduce a class FTSQueryRunner to handle queries for fulltext internal tables. Basically it creates a query, prepares the fulltext internal table for read or write process. Build a tuple based on the given table and assign the FTS_CONFIG, FTS_COMMON_TABLES and FTS_AUX_TABLE fields based on the given value. This class also handles INSERT, DELETE, REPLACE, UPDATE and execute the function for each record (record_executor). FTSQueryRunner::create_query_thread(): Create a query thread to execute the statement on internal FULLTEXT tables FTSQueryRunner::build_tuple(): Build a tuple for the operation FTSQueryRunner::build_clust_ref(): Build a clustered index reference for clustered index lookup for the secondary index record FTSQueryRunner::assign_config_fields(): Assign the tuple for the FTS CONFIG internal table FTSQueryRunner::assign_common_table_fields(): Assign the tuple for FTS_DELETED, FTS_DELETED_CACHE, FTS_BEGIN_DELETED, FTS_BEGIN_DELETED_CACHE common tables FTSQueryRunner::assign_aux_table_fields(): Assign the tuple for FTS_PREFIX_INDEX tables. FTSQueryRunner::handle_error(): Handling error for DB_LOCK_WAIT, retry the operation FTSQueryRunner::open_table(): Open the table based on the fulltext auxiliary table name and FTS common table name FTSQueryRunner::prepare_for_write(): Lock the table for write process by taking Intention Exclusive lock FTSQueryRunner::prepare_for_read(): Lock the table for read process by taking Intention Shared lock FTSQueryRunner::write_record(): Insert the tuple into the given table FTSQueryRunner::lock_or_sees_rec(): Lock the record in case of DELETE, SELECT_UPDATE operation. Fetch the correct version of record in case of READ operation. It also does clustered index lookup in case of search is on secondary index fts_cmp_rec_dtuple_prefix(): Compare the record with given tuple field for tuple field length FTSQueryRunner::record_executor(): Read the record of the given index and do call the callback function for each record FTSQueryRunner::build_update_config(): Build the update vector for FULLTEXT CONFIG table FTSQueryRunner::update_record(): Update the record with update vector exist in FTSQueryRunner Removed the fts_parse_sql(), fts_eval_sql(), fts_get_select_columns_str() and fts_get_docs_clear(). Moved fts_get_table_id() & fts_get_table_name() from fts0sql.cc to fts0fts.cc and deleted the file fts0sql.cc Removed ins_graph, sel_graph from fts_index_cache_t Changed the callback function default read function parameter for each clustered index record to bool fts_sql_callback(dict_index_t*, const rec_t *, const rec_offs*, void *); Following parameters are changed to default read function parameter: fts_read_stopword() fts_fetch_store_doc_id() fts_query_expansion_fetch_doc() fts_read_count() fts_get_rows_count() fts_init_doc_id() fts_init_recover_doc() read_fts_config() fts_optimize_read_node() fts_optimize_index_fetch_node() fts_index_fetch_nodes() fts_fetch_index_words() fts_index_fetch_words() fts_fetch_doc_ids() fts_table_fetch_doc_ids() fts_read_ulint() fts_copy_doc_ids() fts_optimize_create_deleted_doc_id_snapshot() fts_query_index_fetch_nodes() fts_query_fetch_document() fts_query_index_fetch_nodes() row_upd_clust_rec_low(): Function does updates a clustered index record of a row when ordering fields don't change. Function doesn't have dependency on row_prebuilt_t. This can be used by fulltext internal table update operation Row_sel_get_clust_rec_for_mysql::operator(): Removed the parameter row_prebuilt_t and caller does pass the prebuilt related variables Removed the parser usage and execute the query directly on fulltext internal tables in the following function: fts_read_stopword() fts_fetch_store_doc_id() fts_query_expansion_fetch_doc() fts_read_count() fts_get_rows_count() fts_init_doc_id() fts_init_recover_doc() read_fts_config() fts_optimize_read_node() fts_optimize_index_fetch_node() fts_index_fetch_nodes() fts_fetch_index_words() fts_index_fetch_words() fts_fetch_doc_ids() fts_table_fetch_doc_ids() fts_read_ulint() fts_copy_doc_ids() fts_optimize_create_deleted_doc_id_snapshot() fts_query_index_fetch_nodes() fts_query_fetch_document() fts_query_index_fetch_nodes() i_s_fts_deleted_generic_fill() i_s_fts_index_table_fill_selected()
a9c6fc6
to
dfa7fe3
Compare
fts_write_node(dict_table_t *table, | ||
fts_string_t *word, fts_node_t* node, | ||
FTSQueryRunner *sqlRunner) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to my comment on fts_index_fetch_nodes_low()
yesterday: Why is this not a noexcept
member function of FTSQueryRunner
? That would allow us to avoid some register shuffling around the first call.
While we need to declare all member functions in one place, we can define them in different compilation units. We already do this for member functions of mtr_t
, log_t
or buf_pool_t
. This practice could be followed also for FTSQueryRunner
.
|
||
std::map<ulint, dict_table_t*> aux_tables; | ||
trx_t *trx = optim->trx; | ||
dict_table_t *aux_table= nullptr; | ||
FTSQueryRunner sqlRunner(trx); | ||
/* Setup the callback to use for fetching the word ilist etc. */ | ||
fetch.read_arg = optim->words; | ||
fetch.read_record = fts_optimize_index_fetch_node; | ||
fetch.callback = &fts_optimize_index_fetch_node; | ||
fetch.heap= sqlRunner.heap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added code needs to be indented with TAB, just like the existing code. It could also make sense to reformat this entire function to use indentation by two spaces instead of TAB.
The formatting around =
is inconsisetnt here. In the old InnoDB formatting style, there would always be a space before and after =
.
ib::error() << "(" << err << ") while updating last doc id for table" | ||
<< table->name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please avoid ib::logger
in new code?
sql_print_error("(%s) while updating last doc id for table %.*sQ.%sQ",
ut_strerr(err),
int(table->name.dblen()), table->name.m_name,
table->name.basename());
trx->free(); | ||
} | ||
func_exit: | ||
if (local_trx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does !local_trx
hold? In that case, which error would we report and where, in case err != DB_SUCCESS
?
ib::error() << "(" << err << ") during optimize, while adding a" | ||
<< " word to the FTS index."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we write something like the following here?
sql_print_error("(%s) during optimize for table %.*sQ.%sQ",
ut_strerr(err),
int(table->name.dblen()), table->name.m_name,
table->name.basename());
static | ||
dberr_t fts_copy_doc_ids(dict_table_t *from, dict_table_t *to, | ||
FTSQueryRunner *sqlRunner) | ||
{ | ||
std::vector<doc_id_t> doc_ids; | ||
dberr_t error= sqlRunner->prepare_for_read(from); | ||
if (error == DB_SUCCESS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be a member function of FTSQueryRunner
, or could that be the first parameter of the function? I’m not sure if we’d want ´noexcepthere because the
std::vector` without a custom allocator could throw an exception. In any case, I would rewrite this a bit to simplify the code that is generated for the first call:
static
dberr_t fts_copy_doc_ids(FTSQueryRunner *sqlRunner,
dict_table_t *from, dict_table_t *to)
{
if (dberr_t error= sqlRunner->prepare_for_read(from))
return error;
std::vector<doc_id_t> doc_ids;
Do we actually need to buffer everything at once, or could we handle each doc_id one by one? Could we use multiple batches over a fixed-size stack-allocated array here to avoid the heap memory allocation? For example, see how btr_search_build_page_hash_index()
can process multiple batches of fr[]
that are allocated from the stack.
dberr_t error= sqlRunner->prepare_for_read(from); | ||
if (error == DB_SUCCESS) | ||
{ | ||
ut_ad(UT_LIST_GET_LEN(from->indexes) == 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this assertion be at the start of this function? Can we assert the same about to->indexes
? Can we assert anything about the table names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to refactor the fts_sql_callback
interface to use a function object.
dict_table_t *table= sqlRunner.open_table(fts_table, &err); | ||
if (table) | ||
{ | ||
ut_ad(UT_LIST_GET_LEN(table->indexes) == 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could FTSQueryRunner::open_table()
include this assertion, as well as assert something on the table name?
/* In case of empty table, error can be DB_END_OF_INDEX | ||
and key doesn't exist, error can be DB_RECORD_NOT_FOUND */ | ||
if (err == DB_END_OF_INDEX || err == DB_RECORD_NOT_FOUND) | ||
err= DB_SUCCESS; | ||
} | ||
table->release(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The err
adjustment could be done after table->release()
. That could lead to simpler code, depending on how smart the optimizer in the compiler is.
*value->f_str = '\0'; | ||
ut_a(value->f_len > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the assertion be moved to the start of the function, like they were before this refactoring? Here it is inside a conditional execution path.
Some comment about value->f_len
would be useful here. Apparently, it initially specifies the allocated size of the buffer. In read_fts_config()
it would be transformed to the actual length. Shouldn’t we also set value->f_len=0
in case no record was fetched (err
is one of the two values that we map to DB_SUCCESS
)?
err= sqlRunner.record_executor(clust_index, READ, MATCH_UNIQUE, | ||
PAGE_CUR_GE, &read_fts_config, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we pass a functor object that would in this case wrap both read_fts_config
and value
? Currently we are using a type-unsafe method of passing void*
to read_fts_config()
.
FTSQueryRunner sqlRunner(trx); | ||
dict_table_t *fts_config= sqlRunner.open_table(&fts_table, &err); | ||
if (fts_config) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this whole section of code be moved to a new function in fts0config.cc
? In that way, read_fts_config
could be defined as static
in that file.
Description
How can this PR be tested?
./mtr --suite=innodb_fts
If the changes are not amenable to automated testing, please explain why not and carefully describe how to test manually.
Basing the PR against the correct MariaDB version
main
branch.PR quality check