Large collection of pdf files #85

@Elektrik00713

How can I feed a large collection of PDF files (about 1000) to Semantra? I'm running it on WSL2/Ubuntu from a bash script. When I process the entire collection in one go, the embedding step completes without errors, but when I launch the local web server and start typing a query, all I see is "Loading" and nothing else happens. The following two lines appear in the terminal:
RuntimeWarning: Mean of an empty slice.
RuntimeWarning: Invalid value found during scalar divide
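
These two warnings look like what NumPy emits when a mean is taken over an empty array. The sketch below reproduces that in isolation. Whether this is what happens inside Semantra (for example, a file that yields no text) is only an assumption and not confirmed here. The `mean_of_empty` helper is hypothetical, and the exact wording of the second warning may differ between NumPy versions.

```python
import math
import warnings

import numpy as np


def mean_of_empty():
    """Take the mean of an empty array; return the result and any warning texts."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of printing it or deduplicating it.
        warnings.simplefilter("always")
        result = np.mean(np.array([], dtype=np.float32))
    return result, [str(w.message) for w in caught]


result, messages = mean_of_empty()
print("result is NaN:", math.isnan(result))
for message in messages:
    print("RuntimeWarning:", message)
```

If this is the cause, the resulting NaN could keep a query from ever returning, which would fit the endless "Loading" state. That is a guess, not a diagnosis.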

If I instead use the bash script to process the files in batches of 100, I have no problems and can query the collection, but only those 100 files at a time.
Has anyone come across this before? How can I feed the whole collection in at once?
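
A minimal sketch of the batching idea, written in Python rather than the bash script mentioned above. The `batched` helper and the `pdfs` folder name are hypothetical. The code only groups the paths into batches of 100 and does not call Semantra itself.

```python
from pathlib import Path
from typing import Iterator, List, Sequence


def batched(paths: Sequence[Path], size: int = 100) -> Iterator[List[Path]]:
    """Yield consecutive groups of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


if __name__ == "__main__":
    # "pdfs" is a placeholder folder name, not taken from the original post.
    pdf_paths = sorted(Path("pdfs").glob("*.pdf"))
    for number, batch in enumerate(batched(pdf_paths), start=1):
        print(f"batch {number}: {len(batch)} files")
```

Processing batch by batch like this can also help narrow down which group of files triggers the warnings once the batches are combined.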
