Description
Is your feature request related to a problem? Please describe.
There are a few cases where the indexer cannot correctly create a CDX file from a WARC file. There are, for example, #44 and #168, reported here, which have valid workarounds.
The problem I have here is that I would very much like to fix the problem, but it occurred only after indexing many WARC files. I added about two dozen of those to the collection, and now it's giving this error, without any more information:
Invalid WARC record, first line:
Describe the solution you'd like
The indexer should catch that error and report which file triggered it so it can be fixed correctly.
Describe alternatives you've considered
I've considered creating a new collection and running `wb-manager add` on each WARC file one by one so I could tell which one is triggering the problem. But `add` is designed to support adding multiple files at once, so it should also report errors accordingly.
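For anyone hitting the same thing, the one-by-one workaround could be scripted roughly as follows. This is only a sketch, not how pywb works internally: `add_one` and `find_failing_files` are hypothetical helper names, and it assumes a failed add shows up either as a non-zero exit status or as the error text quoted above in the command's output, which I have not verified.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple

# Error text quoted in this issue; used to spot a failed add in the output.
ERROR_MARKER = "Invalid WARC record"


def add_one(collection: str, warc_path: str) -> None:
    """Add a single WARC file to a collection via `wb-manager add`.

    Assumption: a failure is visible as a non-zero exit status or as
    ERROR_MARKER somewhere in stdout/stderr.
    """
    result = subprocess.run(
        ["wb-manager", "add", collection, warc_path],
        capture_output=True,
        text=True,
    )
    output = result.stdout + result.stderr
    if result.returncode != 0 or ERROR_MARKER in output:
        raise RuntimeError(output.strip() or f"exit status {result.returncode}")


def find_failing_files(
    paths: Iterable[str],
    index_one: Callable[[str], None],
) -> List[Tuple[str, str]]:
    """Run index_one on each path and collect (path, error message) pairs.

    Processing continues after a failure, so every bad file is reported.
    """
    failures = []
    for path in paths:
        try:
            index_one(path)
        except Exception as exc:
            failures.append((path, str(exc)))
    return failures
```

Calling `find_failing_files(warc_paths, lambda p: add_one("scratch", p))` against a throwaway collection (here named `scratch`, also an assumption) would then return the offending file names together with whatever message was captured.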
Additional context
This is one of many issues found when using pywb with larger collections; see #408 and #410.