Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maybe add retry logic for failed remote data access #1219

Open
jpivarski opened this issue May 14, 2024 · 0 comments
Open

Maybe add retry logic for failed remote data access #1219

jpivarski opened this issue May 14, 2024 · 0 comments
Labels
feature New feature or request

Comments

@jpivarski
Copy link
Member

jpivarski commented May 14, 2024

It wouldn't work for some failures, and that's the hard part: deciding when retrying is likely to work and when it's doomed and you're just extending the time needed before getting a useful error message to the user.

Here's the original #1217 (comment), which has some notes about how it could be implemented.


If this were to be done in Uproot, we'd have to dynamically adjust the request size and resubmit when we see such an error, and have some reasonable give-up policy after a specified number of retries to keep it from spiraling out of control. And if the OSError came from a local file, rather than XRootD, it's not retryable (if an open local file suddenly can't be read, someone must have unplugged the disk or something).

Possibly the smallest sections that would need to be retried are:

in HasBranches.arrays:

arrays, expression_context, branchid_interpretation = _regularize_expressions(
self,
expressions,
cut,
filter_name,
filter_typename,
filter_branch,
keys,
aliases,
language,
get_from_cache,
)
ranges_or_baskets = []
checked = set()
for _, context in expression_context:
for branch in context["branches"]:
if branch.cache_key not in checked:
checked.add(branch.cache_key)
for (
basket_num,
range_or_basket,
) in branch.entries_to_ranges_or_baskets(entry_start, entry_stop):
ranges_or_baskets.append((branch, basket_num, range_or_basket))
interp_options = {"ak_add_doc": ak_add_doc}
_ranges_or_baskets_to_arrays(
self,
ranges_or_baskets,
branchid_interpretation,
entry_start,
entry_stop,
decompression_executor,
interpretation_executor,
library,
arrays,
False,
interp_options,
)

in HasBranches.iterate:

ranges_or_baskets = []
checked = set()
for _, context in expression_context:
for branch in context["branches"]:
if branch.cache_key not in checked:
checked.add(branch.cache_key)
for (
basket_num,
range_or_basket,
) in branch.entries_to_ranges_or_baskets(
sub_entry_start, sub_entry_stop
):
previous_basket = previous_baskets.get(
(branch.cache_key, basket_num)
)
if previous_basket is None:
ranges_or_baskets.append(
(branch, basket_num, range_or_basket)
)
else:
ranges_or_baskets.append(
(branch, basket_num, previous_basket)
)
arrays = {}
interp_options = {"ak_add_doc": ak_add_doc}
_ranges_or_baskets_to_arrays(
self,
ranges_or_baskets,
branchid_interpretation,
sub_entry_start,
sub_entry_stop,
decompression_executor,
interpretation_executor,
library,
arrays,
True,
interp_options,
)

in Branch.array:

arrays = {}
expression_context = []
branchid_interpretation = {}
_regularize_branchname(
self,
self.name,
self,
interpretation,
get_from_cache,
arrays,
expression_context,
branchid_interpretation,
True,
False,
)
ranges_or_baskets = []
checked = set()
for _, context in expression_context:
for branch in context["branches"]:
if branch.cache_key not in checked and not isinstance(
branchid_interpretation[branch.cache_key],
uproot.interpretation.grouped.AsGrouped,
):
checked.add(branch.cache_key)
for (
basket_num,
range_or_basket,
) in branch.entries_to_ranges_or_baskets(entry_start, entry_stop):
ranges_or_baskets.append((branch, basket_num, range_or_basket))
interp_options = {"ak_add_doc": ak_add_doc}
_ranges_or_baskets_to_arrays(
self,
ranges_or_baskets,
branchid_interpretation,
entry_start,
entry_stop,
decompression_executor,
interpretation_executor,
library,
arrays,
False,
interp_options,
)

Maybe it's possible to retry only the _ranges_or_baskets_to_arrays call, but I'm not sure if there would be a counting problem if the arrays dict and ranges_or_baskets list are not reinitialized after a partial failure. (These are notes for implementation.) If the number of items in the arrays dict is wrong, the code will hang in this loop, rather than raise an error!

There would also need to be a way to change the granularity of the XRootD request while still requesting all the data the user wants. The Source.chunks method doesn't have a way to express that, but maybe the coalesce algorithm does? Unfortunately, coalesce arguments are specified per-FSSpecSource object, but maybe each retry could be a new, more finely granular FSSpecSource? (We don't know if another thread is using the same FSSpecSource object; we can't change it in place.)

Originally posted by @jpivarski in #1217 (comment)

@jpivarski jpivarski added the feature New feature or request label May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant