
Bug: YouTube transcript ingestion hits 429 (Too Many Requests) #206

@juananpe


Summary

YouTube timedtext (subtitles) requests during transcript ingestion sometimes return 429 Too Many Requests because the ingest plugin performs rapid sequential fetches with no retry/backoff and no configurable inter-request delay.
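Here is a minimal sketch of retry with exponential backoff for the subtitle fetch. It assumes the plugin can wrap its requests.get(subtitle_url) call in a zero-argument callable. fetch_with_backoff, retry_after_seconds and their parameters are hypothetical names, not existing plugin code. The helper honors a numeric Retry-After header if YouTube sends one (whether timedtext 429 responses include it is not confirmed here). Otherwise it doubles the wait on each attempt, up to a cap, with random jitter.

```python
import random
import time
from typing import Callable, Mapping, Optional, Protocol


class ResponseLike(Protocol):
    """The two attributes of a requests.Response that this helper reads."""
    status_code: int
    headers: Mapping[str, str]


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Parse a numeric Retry-After header; HTTP-date values are not handled here."""
    value = headers.get("Retry-After")
    if value is None:
        return None
    try:
        return max(0.0, float(value))
    except ValueError:
        return None


def fetch_with_backoff(
    fetch: Callable[[], ResponseLike],
    max_retries: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> ResponseLike:
    """Call fetch() and retry on HTTP 429; returns the last response either way."""
    response = fetch()
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        delay = retry_after_seconds(response.headers)
        if delay is None:
            # Exponential backoff capped at max_delay, with "equal jitter":
            # wait a random time between half and all of the computed backoff.
            backoff = min(max_delay, base_delay * 2 ** attempt)
            delay = random.uniform(backoff / 2, backoff)
        sleep(delay)
        response = fetch()
    return response
```

In the plugin this might look like response = fetch_with_backoff(lambda: requests.get(subtitle_url, timeout=30)) followed by response.raise_for_status(). A request that is still rate-limited after the last retry then raises the same 429 HTTPError as it does today.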

Steps to reproduce

  1. Submit multiple videos for transcript ingestion in rapid succession (e.g., 10–20 videos with ~5–15 seconds between submissions).
  2. Observe ingestion jobs failing with 429 Client Error: Too Many Requests for url: https://www.youtube.com/api/timedtext?....
  3. Confirm that the subtitle download path uses yt-dlp to locate subtitles/automatic_captions and then calls requests.get(subtitle_url) to fetch the VTT file.
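To make step 3 concrete, here is a sketch of how a subtitle URL is typically taken from yt-dlp's metadata. It assumes the plugin calls yt_dlp.YoutubeDL(...).extract_info(video_url, download=False) and wants English VTT; pick_vtt_url is a hypothetical helper. In the info dict, subtitles and automatic_captions each map a language code to a list of track dicts with ext and url keys. yt-dlp only supplies the URL. The separate requests.get() on that timedtext URL is the call that receives the 429, so a delay or retry belongs there.

```python
from typing import Any, Mapping, Optional


def pick_vtt_url(info: Mapping[str, Any], lang: str = "en") -> Optional[str]:
    """Return a VTT subtitle URL from a yt-dlp info dict, preferring manual subtitles."""
    # Manual subtitles first, then YouTube's auto-generated captions.
    for key in ("subtitles", "automatic_captions"):
        tracks = (info.get(key) or {}).get(lang) or []
        for track in tracks:
            if track.get("ext") == "vtt" and track.get("url"):
                return track["url"]
    return None
```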

I think we need to ingest videos through a queue, one at a time, with a backoff mechanism; otherwise we will keep triggering the 429 error.
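Below is a minimal sketch of such a queue, assuming a single process handles all ingestion. One worker thread takes jobs in submission order and enforces a configurable minimum gap between consecutive job starts. SequentialIngestQueue, handler and min_interval are hypothetical names, not part of the ingest plugin. Retry/backoff on a 429 would live inside handler.

```python
import logging
import queue
import threading
import time
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class SequentialIngestQueue:
    """Hypothetical helper: run ingestion jobs one at a time, spaced by min_interval."""

    def __init__(
        self,
        handler: Callable[[str], None],
        min_interval: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._handler = handler
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._jobs: "queue.Queue[Optional[str]]" = queue.Queue()
        self._last_start: Optional[float] = None
        # A single worker thread guarantees jobs never overlap.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, video_id: str) -> None:
        """Enqueue a video; returns immediately, even during a burst of submissions."""
        self._jobs.put(video_id)

    def close(self) -> None:
        """Wait for already-queued jobs to finish, then stop the worker."""
        self._jobs.put(None)  # sentinel: processed after every earlier job
        self._worker.join()

    def _run(self) -> None:
        while True:
            video_id = self._jobs.get()
            if video_id is None:
                return
            # Enforce at least min_interval seconds between consecutive job starts.
            if self._last_start is not None:
                wait = self._min_interval - (self._clock() - self._last_start)
                if wait > 0:
                    self._sleep(wait)
            self._last_start = self._clock()
            try:
                self._handler(video_id)
            except Exception:
                # One failed video should not stall the rest of the queue.
                logger.exception("ingestion failed for %s", video_id)
```

Usage would be q = SequentialIngestQueue(ingest_one, min_interval=10.0), then one q.submit(video_id) per request, where ingest_one stands in for the plugin's per-video function (hypothetical name). Making the gap a constructor argument gives the configurable inter-request delay the summary says is missing. It only holds within one process, though. Several workers or processes would need a shared rate limiter.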


Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), help wanted (Extra attention is needed)
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests
