Skip to content

Fix ES delay_on_retry behavior + clarify config example messaging#435

Merged
mattnowzari merged 3 commits into
mainfrom
fix_es_retry_delay_behavior
May 7, 2026
Merged

Fix ES delay_on_retry behavior + clarify config example messaging#435
mattnowzari merged 3 commits into
mainfrom
fix_es_retry_delay_behavior

Conversation

@mattnowzari

@mattnowzari mattnowzari commented May 5, 2026

Copy link
Copy Markdown
Contributor

Closes #380

This PR fixes the delay_on_retry behavior within the Crawler's Elasticsearch client.

Previously, we were calculating

retry_delay ** try_count

where ** in Ruby is an exponent operation, not multiplication. So, a retry delay of 10s goes

10s^1 = 10s
10s^2 = 100s
10x^3 = 1000s

which is too aggressive.

The new calculation correctly doubles the retry backoff time:

retry_delay * (2^(try_count-1))

Thus,

10s * (2^0) = 10s # try_count = 1
10s * (2^1) = 20s # try_count = 2, etc
10s * (2^2) = 40s
10s * (2^3) = 80s

The description of this config setting has been updated in elasticsearch.yaml.example as well.

Checklists

Pre-Review Checklist

  • This PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check crawler.yml.example and elasticsearch.yml.example)
  • This PR has a meaningful title
  • This PR links to all relevant GitHub issues that it fixes or partially addresses
    • If there is no GitHub issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v0.1.0)
  • Considered corresponding documentation changes
  • Contributed any configuration settings changes to the configuration reference
  • Ran make notice if any dependencies have been added

Related Pull Requests

Release Note

Fix elasticsearch.delay_on_retry config setting to correctly double the retry backoff with every failure instead of exponentiation of the specified retry backoff time.

@mattnowzari mattnowzari requested a review from a team as a code owner May 5, 2026 15:29
#
## The amount of time to wait between requests after a failure occurs.
## Default: 2000
#elasticsearch.delay_on_retry: 10000

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So default was 10000 seconds?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was meant to be ms, but all other "time" config settings we have are in s, so getting it lined up with how other configs work was worthwhile

Comment thread lib/es/client.rb
try += 1
if try < max_tries
wait_time = @retry_delay**try
# Exponential backoff: the first retry waits @retry_delay seconds, and each

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May want an upper limit here. What if max retries is 20? 2^20 is something close to 2 years

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_tries is set by default to 3, and is settable as a config value: elasticsearch.retry_on_failure: 3.

I am good to let this one be because it's adjustable anyways, and the delay on retry behavior is documented as doubling each time

@mattnowzari mattnowzari merged commit 72997d6 into main May 7, 2026
2 checks passed
@mattnowzari mattnowzari deleted the fix_es_retry_delay_behavior branch May 7, 2026 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

elasticsearch.delay_on_retry bulk index attempt time unit mismatch (miliseconds vs seconds)

3 participants