Occasionally (fortunately only rarely) the LLM experiment fails with the error below. We need to debug it before deciding how to handle it. One option is to retry the connection and the failed request until they succeed. As things stand, when this error occurs we are likely losing data.
[flowcept][ERROR][frontier06306.frontier.olcf.ornl.gov][pid=61095][thread=140733193385728][function=_start][Connection closed by server.]
Traceback (most recent call last):
File "/lustre/orion/stf219/scratch/souzar/flowcept/flowcept/flowceptor/consumers/document_inserter.py", line 199, in _start
for message in pubsub.listen():
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1653, in listen
response = self.handle_message(self.parse_response(block=True))
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1531, in parse_response
response = self._execute(conn, try_read)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1507, in _execute
return conn.retry.call_with_retry(
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/retry.py", line 49, in call_with_retry
fail(error)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1509, in <lambda>
lambda error: self._disconnect_raise_connect(conn, error),
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1496, in _disconnect_raise_connect
raise error
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/retry.py", line 46, in call_with_retry
return do()
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1508, in <lambda>
lambda: command(*args, **kwargs),
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/client.py", line 1529, in try_read
return conn.read_response()
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 848, in read_response
response = self._parser.read_response(disable_decoding=disable_decoding)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 335, in read_response
result = self._read_response(disable_decoding=disable_decoding)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 383, in _read_response
response = [
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 384, in <listcomp>
self._read_response(disable_decoding=disable_decoding)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 377, in _read_response
response = self._buffer.read(length)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 230, in read
self._read_from_socket(length - self.length)
File "/lustre/orion/stf219/scratch/souzar/miniconda/envs/llm3/lib/python3.8/site-packages/redis/connection.py", line 195, in _read_from_socket
raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.
I found that this is an intermittent error on Frontier, likely caused by network issues. Either way, we should consider handling this failure more gracefully than simply losing the data.
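As a starting point for discussion, here is a minimal sketch of the retry idea: wrap the `pubsub.listen()` loop so that a dropped connection triggers a resubscribe with exponential backoff instead of ending the consumer. The names `listen_with_reconnect`, `subscribe`, and `handle` are hypothetical and not part of flowcept or redis-py. Note that Redis pub/sub does not buffer messages for disconnected subscribers, so this would likely only shorten the window in which data is lost, not close it.

```python
import time


def listen_with_reconnect(subscribe, handle, retry_on=(ConnectionError,),
                          max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Consume messages, resubscribing when the connection drops.

    subscribe: hypothetical zero-argument factory that returns a fresh
               object exposing listen() (for example, a new pubsub).
    handle:    callback invoked once per received message.
    retry_on:  exception types treated as a recoverable disconnect.
    """
    failures = 0
    while True:
        try:
            pubsub = subscribe()
            for message in pubsub.listen():
                failures = 0  # a successful read resets the backoff
                handle(message)
            return  # listen() ended without error
        except retry_on:
            failures += 1
            if failures > max_retries:
                raise  # give up and surface the original error
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (failures - 1))
```

With redis-py, `retry_on` would need to be `(redis.exceptions.ConnectionError,)`, since that class (the one raised in the traceback above) does not derive from Python's built-in `ConnectionError`. The `subscribe` factory would create the pubsub object and subscribe to the channel again on every call.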