-
Notifications
You must be signed in to change notification settings - Fork 13k
Description
About the Bug...
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest version of MyBatis.
-
I have confirmed this bug reproduces without 3rd party extensions (e.g. mybatis-plus).
Database Version
SQLITE3
JDBC Driver Version
sqlite-jdbc:3.42.0.0
Issue Description
datasource init
Properties properties = new Properties();
properties.put("driver", JDBC.class.getName());
properties.setProperty("poolMaximumActiveConnections", "1");
DataSourceFactory dataSourceFactory = new PooledDataSourceFactory();
dataSourceFactory.setProperties(properties);
DataSource dataSource = dataSourceFactory.getDataSource();
Issue: SQLite Database Locking and Invalid Connections Despite Connection Pool Limit Set to 1
We are using SQLite as the underlying database, which does not support concurrent writes. In our task execution, numerous read and write threads access the database concurrently. Even though we have set poolMaximumActiveConnections=1
, we are encountering MyBatis errors such as:
connection is invalid
sqlite_error: SQL error or missing database (cannot rollback - no transaction is active)
sqlite_error: database is locked
These symptoms suggest concurrent writes are occurring on SQLite, and the MyBatis connection pool is creating more than one active connection, despite our configuration.
By enabling MyBatis debug logging, we observed that more than one active connection exists even when properties.setProperty("poolMaximumActiveConnections", "1")
is set. After analyzing the MyBatis source code, we identified a race condition that can cause the actual number of active connections to exceed the configured limit.
Root Cause Analysis: Race Condition in PooledDataSource
Under High Contention
Assume three threads: thread1
, thread2
, and thread3
, concurrently requesting database connections.
Step 1.1: Initial Connection Acquisition
thread1
andthread2
simultaneously attempt to acquire a connection.thread1
acquires the lock, callspopConnection()
, gets a connection, adds it tothis.state.activeConnections
, releases the lock, and executes a long-running SQL query (over 20 seconds).thread2
waits for the lock.
Step 1.2: Timeout and Connection Revalidation
- After
thread1
releases the lock,thread2
enterspopConnection()
. - It finds
idleConnections
empty and the active connection count at the maximum (1
). thread2
waits for up topoolTimeToWait
(default 20,000ms).- After ~20 seconds, if
thread1
hasn't returned the connection,thread2
checks the checkout time and finds it exceeds the threshold. thread2
marks thePooledConnection
as invalid, removes it fromactiveConnections
(using therealConnection
'shashCode
), and then wraps the samerealConnection
in a newPooledConnection
object, which is added back toactiveConnections
.
At this point, the underlying
realConnection
remains the same, but it is now associated with a newPooledConnection
instance.
Step 1.3: Connection Return and Incorrect Removal
- When
thread1
finishes its SQL and callspushConnection()
to return the connection:- It calls
this.state.activeConnections.remove(conn)
. - The removal uses the
realConnection
'shashCode
, so it successfully removes the entry — even though it was replaced bythread2
. - As a result, the connection currently used by
thread2
is removed fromactiveConnections
.
- It calls
- Now both
activeConnections
andidleConnections
are empty — butthread2
still holds a validrealConnection
.
Step 1.4: New Connection Created
thread3
callspopConnection()
.- It finds both
activeConnections
andidleConnections
empty. - Since no connections are available, it creates a new physical connection to SQLite.
- Now, both
thread2
andthread3
hold valid connections to the database, with differentrealConnection
hash codes.
This results in two concurrent connections accessing SQLite, violating its single-writer constraint and causing:
database is locked
cannot rollback - no transaction is active
connection is invalid
Once these extra connections are created and later returned, the pool may permanently exceed the configured poolMaximumActiveConnections=1
, especially under sustained load.
Why Are There Long-Running SQL Queries?
Although our queries are paginated (max 5,000 rows per page, up to 200,000 rows, ~200MB database file), we suspect deep pagination or I/O bottlenecks contribute to long execution times. Disk I/O monitoring shows we are near the disk's read/write limit. Additionally, the node shares disk resources with other services, further degrading performance.
We have already implemented application-level serialization in Java to prevent concurrent database access, which mitigates the issue. However, we believe this race condition in MyBatis’s PooledDataSource
should be addressed at the framework level.
Conclusion and Request
While this race condition may be tolerable for databases that support concurrent connections (where briefly exceeding the pool size is harmless), it is catastrophic for SQLite, which cannot handle concurrent writes.
We kindly request the MyBatis team to review this scenario and consider:
- Making
poolMaximumActiveConnections
strictly enforced, even under connection timeout and invalidation. - Potentially adding a "strict single connection" mode for embedded databases like SQLite.
Thank you for your time and consideration.
中文版本:
我们底层使用sqlite数据库,他不支持并发写入,在我们的任务执行中,有许多读写线程并发操作数据库(即使我们设定了poolMaximumActiveConnections为1),会出现mybatis报错
connection is invaild 和 sqlite_error SQL_error or missing database(cannot rollback - no transaction is active) 和 sqlite-error database is locked.
看样子猜测是sqlite出现了并发写入,以及mybatis连接池中出现了多个连接导致该问题的产生。并且我通过打开mybatis的debug日志,发现在properties.setProperty("poolMaximumActiveConnections", "1");的情况下,在并发情况下确实出现了大于一个的活跃连接数。通过分析mybatis的源码发现可能存在这样一种情况会导致poolMaximumActiveConnections不严格小于或等于设置的值。
假定同时有三个线程thread1 to 3,并发要获取sqlite的连接操作数据库。
1.1 刚开始 thread1 与thread2同时来获取连接,thread1抢到了锁,调用popConnection方法获取连接,并放入this.state.activeConnections列表中,释放锁,拿着连接去执行了一个时间大于20s的sql。
1.2 thread2在thread1释放锁之后进来继续popConnection,发现此时idleConnections是empty,且已达到最大活跃连接数,该线程就去等待poolTimeToWait等待释放thread1释放连接,20秒的等待之后,若thread1还未释放连接,那么
thread2会检查该连接的检出时间是否超过阈值(20s),此时发现该连接确实超过了阈值,就会将该连接(PooledConnetion)设置为失效,移出activeConnections队列(这里的remove却使用了底层realConnection的hashCode),并将真实的连接(realConnetion)再次取出给new PooledConnetion对象进行包装,放到activeConnetions队列中去,此时1.1和1.2步骤,在activeConnetions的视角表现就是真实的realConnetion还是同一个,但是换了一个PooledConnetion对象。
1.3 thread1 sql执行完归还连接(已在1.2中被thread2设置PooledDatasource为失效,不会放入idleConnetions中)调用pushConnection方法执行 this.state.activeConnections.remove(conn); 如1.2提到remove其实是按照realConnetion的hashcode进行移出的,所以会导致thread2检出的连接又被移出activeConnections队列,此时activeConnections长度为0,idleConnections长度为0。但是实际上thread2还持有一个真实的连接对象。
1.4 thread3进来popConnetion准备获取连接,由于activeConnections长度为0,idleConnections长度为0,则会创建一个新连接去使用。此时就出现同时有thread2 与 thread3持有数据库的连接进行操作数据库,并且两个连接readConnetion的hashCode也是不一样的,当使用完后,thread2/3 归还连接到idleConnetions中。后续如果有并发SQL操作,就永远会出现活跃连接数超过1的情况。
如上所示,这种情况针对其他支持并发的数据库来说并没有什么大碍,多出的连接会快会被回收掉,即使出现了大于设定值的也不影响其正常使用,但是在sqlite数据库中这种情况就非常致命。
对于为什么会出现大于20s的SQL,我们其实对sqlite的查询是单页最大5000条,数据库中最大条数为200000条,sqlite的db文件为200M 左右,我们怀疑出现了深分页的情况(但是我们用更小的数据量也复现了这种情况),同时观看了磁盘的IO,发现也确实达到了我们磁盘的读写瓶颈,怀疑除了我们自身sql的运行外,该node上和其他服务进程共用的磁盘IO也影响了我们的整体运行情况。这一点我们已经在java代码层面做好了串行化的处理,规避了这个问题,希望官方这边能分析下是否这种情况有优化空间。
感谢你的时间。
About your report...
-
I did not use images 🖼️ for showing text information (code, error, etc.).
-
I checked the Preview and my report looks awesome! 👍