Conversation

@Wauplin
Contributor

@Wauplin Wauplin commented Oct 30, 2025

Related to an internal issue (discussed on internal Slack). It seems that forking an httpx.Client can have some weird side effects (causing a '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2580)' error). The solution is to register a callback that closes the global HTTP session in the child process as soon as it is forked.
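For readers unfamiliar with the pattern, here is a minimal sketch of closing a shared session in a forked child. It is not the code from this PR: `_Session`, `get_session` and `_close_session_in_child` are hypothetical names, and `_Session` is a stand-in for a real client such as httpx.Client so the example has no third-party dependency. The only real API used is the standard library's `os.register_at_fork` (Python 3.7+, on platforms that support fork).

```python
import os


class _Session:
    """Hypothetical stand-in for an HTTP client such as httpx.Client."""

    def __init__(self):
        self.closed = False
        self.owner_pid = os.getpid()  # process that created this session

    def close(self):
        self.closed = True


_session = None  # global session, created lazily


def get_session():
    """Return the process-wide session, creating it on first use."""
    global _session
    if _session is None:
        _session = _Session()
    return _session


def _close_session_in_child():
    """Drop the session inherited from the parent.

    Runs in the child right after fork, so the child never reuses
    connection state that it shares with the parent.
    """
    global _session
    if _session is not None:
        _session.close()
        _session = None


# Hook is called in every child created through os.fork(), which includes
# multiprocessing workers started with the "fork" start method.
os.register_at_fork(after_in_child=_close_session_in_child)
```

With this in place, the first `get_session()` call in a forked worker builds a fresh session owned by that worker, while the parent keeps using its own.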

cc @lhoestq @andimarafioti

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@lhoestq
Member

lhoestq commented Oct 30, 2025

This fixes the errors in the following script, which replicates @andimarafioti's training setup:

from datasets import load_dataset
from torch.utils.data import DataLoader
import multiprocessing


multiprocessing.set_start_method("fork")

dataset = load_dataset(
    "HuggingFaceM4/FineVisionMax",
    split="train",
    streaming=True,
    filters=[("relevance_min", "<", -9999)]  # skip all for the test
)

if __name__ == "__main__":
    dataloader = DataLoader(dataset, num_workers=6)
    for x in dataloader:
        pass
    print("done")

Without this fix, any of the following errors could occur:

'[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2580)'
huggingface_hub.errors.HfHubHTTPError: Client error '416 Requested Range Not Satisfiable'
pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer.

In the last two cases, the request for the Parquet footer could somehow hit a different file than the requested one.
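To see why a misrouted response would surface this way, recall that a Parquet file ends with a 4-byte little-endian metadata length followed by the 4-byte magic `PAR1`. A reader fetches the tail of the file with a range request computed from the expected file size, so bytes from a different file will usually fail the magic check, and a range beyond a shorter file's size can be rejected with a 416. The sketch below only illustrates that tail check; `footer_length` is a hypothetical helper, not pyarrow's implementation.

```python
import struct

PARQUET_MAGIC = b"PAR1"  # trailing magic of an unencrypted Parquet file


def footer_length(tail: bytes) -> int:
    """Return the metadata length encoded in the last 8 bytes of a file.

    `tail` is assumed to hold at least the final 8 bytes of the file:
    a 4-byte little-endian length followed by the magic bytes.
    """
    if len(tail) < 8:
        raise ValueError("need at least the last 8 bytes of the file")
    if tail[-4:] != PARQUET_MAGIC:
        # What a reader sees when the bytes came from the wrong file.
        raise ValueError("Parquet magic bytes not found in footer")
    (length,) = struct.unpack("<I", tail[-8:-4])
    return length
```

Passing the tail of a valid file returns the size of the metadata block to read next; passing bytes from an unrelated response raises the error instead.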

@Wauplin Wauplin marked this pull request as ready for review October 30, 2025 15:08
@Wauplin
Contributor Author

Wauplin commented Oct 31, 2025

Great to know! We can merge it now then 😃 Failing CI is unrelated.

@Wauplin Wauplin merged commit aac0c12 into main Oct 31, 2025
9 of 22 checks passed
@Wauplin Wauplin deleted the close-session-on-fork branch October 31, 2025 09:28