mysql-connect_retries_on_failure not effective to failures like connection refused #4597

xykhappy · 2024-07-31T10:45:16Z

For the hostgroup with a single backend, if the backend server goes down, the ProxySQL will return the error "Max connect timeout reached while reaching hostgroup" after the "mysql-connect_timeout_server_max" duration elapses if the backend is marked as "SHUNNED". In contrast, if establishing a direct connection without ProxySQL, the client will get a "connection refused" error or error code 111 immediately, which makes more sense.

After debugging the latest binary, it appears that ProxySQL keeps attempting to get a good connection at intervals controlled by "mysql-connect_retries_delay" until "mysql-connect_timeout_server_max" is reached according to the method MySQL_Session::handler_again___status_CONNECTING_SERVER.

proxysql/lib/MySQL_Session.cpp

Line 3177 in 27e71d2

bool MySQL_Session::handler_again___status_CONNECTING_SERVER(int *_rc) {

And the retry is not even controlled by "mysql-connect_retries_on_failure" (as it should be) because the process always falls into this condition in the scenario I mentioned above.

proxysql/lib/MySQL_Session.cpp

Line 3244 in 27e71d2

if (mybe->server_myds->myconn==NULL) {

If debug further, we will find the method MyHGC::get_random_MySrvC returns NULL all the time during the retry for this single "SHUNNED" backend.

proxysql/lib/MyHGC.cpp

Line 55 in 27e71d2

    
           MySrvC *MyHGC::get_random_MySrvC(char * gtid_uuid, uint64_t gtid_trxid, int max_lag_ms, MySQL_Session *sess) {

According to this method, the ProxySQL should attempt to bring the "SHUNNED" backend online but it fails all the time because the "mysrvc->time_last_detected_error" is always in the future related to "mysql-monitor_ping_interval".

					// if Monitor is enabled and mysql-monitor_ping_interval is
					// set too high, ProxySQL will unshun hosts that are not
					// available. For this reason time_last_detected_error will
					// be tuned in the future
					if (mysql_thread___monitor_enabled) {
						int a = mysql_thread___shun_recovery_time_sec;
						int b = mysql_thread___monitor_ping_interval;
						b = b/1000;
						if (b > a) {
							t = t + (b - a);
						}
					}
					mysrvc->time_last_detected_error = t;

I'm not sure whether the logic is intentionally designed this way. However, the retry time should be controlled by 'mysql-connect_retries_on_failure' rather than its current implementation. Additionally, I believe it would be more effective to return the exact failure error to the client.

The text was updated successfully, but these errors were encountered:

renecannao · 2024-07-31T11:53:24Z

Hi @xykhappy .

A premise of this issue:
if you get error 111 when trying to connect, that means that there is no process listening on that port.
111 is OS error code 111: Connection refused , that means that the 3 way TCP handshake couldn't happen .

The issue you are facing is actually a feature.
In fact, one of the main feature of ProxySQL is to handle backend failures and failover in a way that is completely transparent to the clients.

if the backend server goes down, the ProxySQL will return the error "Max connect timeout reached while reaching hostgroup" [...] In contrast, if establishing a direct connection without ProxySQL, the client will get a "connection refused" error or error code 111 immediately, which makes more sense

It doesn't really make sense once you understand the difference between the two setups.

If you are attempting directly to a backend, you get error 111 because there is no service.
That means that the connection was not established.

If the client connects to ProxySQL , the connection to the client's backend (ProxySQL) was successful.
Then, when the client tries to execute a query and there are no backend avaiable, ProxySQL is not able to execute the query.
Should you expect the client to see its connection (that was successfully created) being dropped?
For sure it can't get an error 111 , because 111 means Connection refused , and it is not possible to return to that stage.
Maybe (I would argue no) the client could receive a Lost connection .

If a client connects to ProxySQL , it runs a query successfully, and the backend is restarted, would you expect ProxySQL to drop the client's connection?
The way ProxySQL operates would allow you to restart a backend without any disruption to the client connections.

In a nutshell, you are trying to compare a direct connection to a backend to running a query through proxysql .
They are two completely different entities (connection vs query) in two completely different setups.

Furthermore, please remember that ProxySQL is a reverse proxy and not a forward proxy.
The client sends the query to the proxy , and the proxy will returns a result to the client: the client is completely unaware of the backends' involvement and status.

xykhappy · 2024-07-31T13:17:01Z

Hi @renecannao,

Thanks for the quick response!
I completely agree with you regarding the handling of backend failures. And I have updated the issue title to prevent any confusion.
To make my point more clear, I think the ProxySQL should offer users the same level of control to the retry logic of the "connection refused" error as it does to other failure types. In general, the ProxySQL should honor not only the "mysql-connect_retries_delay" but also "mysql-connect_retries_on_failure" when processing such error. In current implementation, it keeps retrying until "mysql-connect_timeout_server_max" duration elapses, which causes a slow failure.
In our scenario, during an Aurora failover, we expect the process to fail rapidly to prevent connection creation requests from accumulating on the client side and subsequently overwhelming the backend Aurora instance. Conversely, we require a relatively large "mysql-connect_timeout_server_max" and "mysql-connect_timeout_server" (around 5 seconds) to ensure the client can endure a certain level of performance degradation in Aurora. This presents a contradiction since both situations depend on the "mysql-connect_timeout_server_max" setting.

Hope it makes more sense this time.

xykhappy · 2024-09-23T03:42:05Z

Created the pull request for this.

xykhappy changed the title ~~Get "Max connect timeout reached" rather than "connection refused" for single broken backend in a hostgroup~~ mysql-connect_retries_on_failure not effective to failures like connection refused Jul 31, 2024

xykhappy mentioned this issue Sep 23, 2024

Use connect_retries_on_failure to control the number of retries when no backend server is available #4665

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mysql-connect_retries_on_failure not effective to failures like connection refused #4597

mysql-connect_retries_on_failure not effective to failures like connection refused #4597

xykhappy commented Jul 31, 2024

renecannao commented Jul 31, 2024

xykhappy commented Jul 31, 2024 •

edited

Loading

xykhappy commented Sep 23, 2024

mysql-connect_retries_on_failure not effective to failures like connection refused #4597

mysql-connect_retries_on_failure not effective to failures like connection refused #4597

Comments

xykhappy commented Jul 31, 2024

renecannao commented Jul 31, 2024

xykhappy commented Jul 31, 2024 • edited Loading

xykhappy commented Sep 23, 2024

xykhappy commented Jul 31, 2024 •

edited

Loading