[Important] Fix the body duplication problem when a retry happens after a connection error #1895

@AbanoubGhadban

Cause of the error

The error is caused by the following sequence:

  • A streaming page starts streaming and sends some HTML chunks to the Rails side.
  • In the middle of the streaming process, a connection error occurs, such as a ConnectionTimeout or a "descriptor closed" error.
  • The HTTPx retries plugin retries the request.
  • By the time the retry happens, some of the HTML chunks have already been processed and sent, or are on their way, to the client.
  • When the request is made again, the chunks it returns are appended to the HTML chunks already returned by the first, failed request.
  • The HTML sent to the client therefore contains the partial output of the failed request followed by the output of the retried request, which causes DOM errors and hydration errors.
  • The same problem occurs if our own retry logic in the request.rb file runs instead of the HTTPx retry logic.
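
The sequence above can be sketched without HTTPx or Rails. This is a minimal illustration only: `FlakyStream`, `stream_with_naive_retry`, and `stream_with_guarded_retry` are hypothetical names, not code from request.rb or HTTPx. The guarded variant shows one possible mitigation (refuse to retry once any chunk has been forwarded) and is not necessarily the fix that should be adopted.

```ruby
# Hypothetical stand-in for a streaming render request (not the real request.rb).
class FlakyStream
  CHUNKS = ["<html>", "<body>", "<p>post</p>", "</body></html>"].freeze

  # fail_at: index of the chunk at which the first attempt raises.
  def initialize(fail_at:)
    @fail_at = fail_at
    @attempts = 0
  end

  def each_chunk
    @attempts += 1
    CHUNKS.each_with_index do |chunk, index|
      # Simulate the connection dropping mid-stream, on the first attempt only.
      raise IOError, "descriptor closed" if @attempts == 1 && index == @fail_at
      yield chunk
    end
  end
end

# Naive retry: the request is replayed from the start, even though chunks
# from the failed attempt were already forwarded to the output.
def stream_with_naive_retry(stream, output, max_retries: 1)
  retries = 0
  begin
    stream.each_chunk { |chunk| output << chunk }
  rescue IOError
    raise if retries >= max_retries
    retries += 1
    retry
  end
  output
end

# Guarded retry (assumed mitigation): retry only while nothing has been forwarded.
def stream_with_guarded_retry(stream, output, max_retries: 1)
  retries = 0
  begin
    stream.each_chunk { |chunk| output << chunk }
  rescue IOError
    raise if retries >= max_retries || !output.empty?
    retries += 1
    retry
  end
  output
end

naive_output = stream_with_naive_retry(FlakyStream.new(fail_at: 2), [])
puts naive_output.join
# => <html><body><html><body><p>post</p></body></html>
```

With the naive retry, the first two chunks appear twice in the output, which matches the duplicated body described above. With the guarded variant, a failure at the same point re-raises the `IOError` instead of replaying the stream.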

This exact scenario occurs in the failing test.

To find out which error causes the HTTPx retries plugin to retry the request, I added the following to the code that initializes the HTTPX object:

HTTPX
# For persistent connections we want retries,
# so the requests don't just fail if the other side closes the connection
# https://honeyryderchuck.gitlab.io/httpx/wiki/Persistent
.plugin(
  :retries,
  max_retries: 1,
  retry_change_requests: true,
  retry_after: ->(req, res) do
    Rails.logger.error("An error occurred and a retry is going to be made")
    Rails.logger.error("Error: #{res.error}")
    Rails.logger.error("Request Body: #{req.body&.to_s&.first(1000)}")
    nil
  end,
)

This logs the error that triggers the retry. By checking the test log file here, I found that the error is:

Error: descriptor closed

This error occurs for no apparent reason. I had already increased all timeout values on the HTTPx side with the following configuration:

timeout: {
  connect_timeout: 100,
  read_timeout: 100,
  write_timeout: 100,
  request_timeout: 100,
  operation_timeout: 100,
  keep_alive_timeout: 100,
}

On the Fastify side, I added the following:

const app = fastify({
  http2: useHttp2 as true,
  bodyLimit: 104857600, // 100 MB
  logger:
    logHttpLevel !== 'silent' ? { name: 'RORP HTTP', level: logHttpLevel, ...sharedLoggerOptions } : false,
  ...fastifyServerOptions,
  pluginTimeout: 1_000_1000,
  requestTimeout: 1_000_1000,
  keepAliveTimeout: 1_000_1000,
  connectionTimeout: 1_000_1000,
  http2SessionTimeout: 1_000_1000,
});

The same error still occurs. The plan is to reproduce the error outside React on Rails, determine whether the problem is in HTTPx or in Fastify, and then report the issue. However, when the HTTPx retries plugin is removed, the error disappears, which makes us suspect that the problem is in HTTPx.

Reproduce the problem locally

To reproduce the problem locally, open a streaming page that takes too long to render and causes a Timeout error. For example, open the http://localhost:3000/rsc_posts_page_over_redis?artificial_delay=3000 page: the page content is repeated twice before you are redirected to the 500 error page. In the CI test, the page is not redirected to the 500 error page, because no Timeout error occurs on the second attempt to SSR the page.
To reproduce the CI behavior locally, where the second attempt succeeds, add code like this to the RSCPageComponent:

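import * as fs from 'fs'; // at the top of the file

// Alternate between renders using a marker file: the first render keeps the
// artificial delay (and times out), the next one skips it (and succeeds).
const newProps = { ...props };
if (fs.existsSync('./skip_js_delay')) {
  // Marker present: this is the retry, so render without the delay.
  newProps.artificialDelay = 0;
  fs.rmSync('./skip_js_delay');
} else {
  // Marker absent: keep the delay and leave a marker for the next render.
  fs.writeFileSync('./skip_js_delay', '');
}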
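
Since the visible symptom is that the page content is repeated twice, a test could check the streamed body for duplicated document markers. The helper below is a rough sketch under that assumption; `marker_count` and `duplicated_body?` are hypothetical names, and the sample strings are made up for illustration rather than taken from a real response.

```ruby
# Hypothetical test helper: counts how often a pattern occurs in a response body.
def marker_count(html, pattern)
  html.scan(pattern).size
end

# A well-formed streamed page should open and close <html> exactly once.
def duplicated_body?(html)
  marker_count(html, /<html[\s>]/i) > 1 || marker_count(html, %r{</html>}i) > 1
end

single  = "<html><body><p>post</p></body></html>"
doubled = "<html><body><html><body><p>post</p></body></html>"

puts duplicated_body?(single)  # => false
puts duplicated_body?(doubled) # => true
```

Counting the opening tag as well as the closing tag matters here, because a failed first attempt usually never reaches its closing `</html>`.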
