One sentence: This repo provides a Node/TypeScript CLI for arXiv search, metadata fetch, downloads, category browsing, and URL output.
Last updated: 2026-01-07
- Doc requirements
- Prerequisites
- Quickstart
- Common tasks
- Risks and assumptions
- Troubleshooting
- Reference
- Acceptance criteria
- Evidence bundle
- Audience: Developers and researchers using the CLI to search, fetch, and download arXiv papers.
- Scope: Installation, core commands, verification steps, and usage constraints.
- Non-scope: Contribution workflow, security reporting, and internal architecture (see `CONTRIBUTING.md`, `SECURITY.md`, `docs/ADR-001-architecture.md`).
- Doc owner: jscraik.
- Review cadence: Each release.
- Required approvals: 1 maintainer.
- Required: Node.js 20+, npm
- Optional: Git, a POSIX shell
npm install
npm run build
node dist/cli.js search "cat:cs.AI" --max-results 5

Expected output:
- A list of results with IDs and titles.
Pro tip: install `@brainwav/rsearch` and you can run `rsearch search "cat:cs.AI" --max-results 5` instead of `node dist/cli.js`.
- What you get: titles and IDs (plus URLs in JSON output).
- Steps:
node dist/cli.js search "cat:cs.LG" --max-results 10
- Verify: output shows `Total results` and a list of entries.
- What you get: only results that include license metadata in arXiv records.
- Steps:
node dist/cli.js search "cat:cs.AI" --require-license --max-results 10
- Verify: summary mentions filtered results when license metadata is missing.
- What you get: full metadata including abstract, authors, and PDF URL.
- Steps:
node dist/cli.js fetch 2002.00762 --json
- Verify: JSON includes `absUrl` and `pdfUrl`.
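The verification step above can be automated. The following is a minimal sketch, assuming only that the JSON output contains `absUrl` and `pdfUrl` as string fields somewhere in the document; the exact envelope is defined by `schemas/cli-output.schema.json` and is not reproduced here. The helper names `hasStringKey` and `missingUrlKeys` are hypothetical and not part of the CLI.

```typescript
// Hypothetical helper (not part of the CLI): walks any parsed JSON value and
// reports whether `key` holds a non-empty string at any nesting depth.
function hasStringKey(value: unknown, key: string): boolean {
  if (Array.isArray(value)) {
    return value.some((item) => hasStringKey(item, key));
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const direct = record[key];
    if (typeof direct === "string" && direct.length > 0) {
      return true;
    }
    // Not found at this level: descend into nested objects and arrays.
    return Object.values(record).some((child) => hasStringKey(child, key));
  }
  return false;
}

// Returns the required URL keys that are absent from the given JSON text.
// An empty array means the check passed.
function missingUrlKeys(jsonText: string): string[] {
  const parsed: unknown = JSON.parse(jsonText);
  return ["absUrl", "pdfUrl"].filter((key) => !hasStringKey(parsed, key));
}
```

Feeding the captured output of the `fetch ... --json` command to `missingUrlKeys` should return an empty array when both fields are present. Searching at any depth is a deliberate choice here, so the sketch does not depend on a guessed output shape.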
- What you get: a PDF per ID in the output directory.
- Steps:
node dist/cli.js download 2002.00762 --out-dir ./papers
- Verify: `./papers/2002.00762.pdf` exists.
- What you get: Markdown or JSON output with text extracted from the PDF.
- Steps:
node dist/cli.js download 2002.00762 --format md --out-dir ./papers
node dist/cli.js download 2002.00762 --format json --out-dir ./papers
- Verify: `./papers/2002.00762.md` or `./papers/2002.00762.json` exists.
- What you get: downloads fail if arXiv does not provide license metadata.
- Steps:
node dist/cli.js download 2002.00762 --format json --require-license --out-dir ./papers
- Verify: failures are reported with `License metadata missing` when the metadata is unavailable.
- What you get: both the text export and the PDF.
- Steps:
node dist/cli.js download 2002.00762 --format md --keep-pdf --out-dir ./papers
- Verify: both `2002.00762.md` and `2002.00762.pdf` exist.
- What you get: abstract and PDF URLs per result.
- Steps:
node dist/cli.js urls "cat:cs.AI"
node dist/cli.js urls --ids 2002.00762 2101.00001
node dist/cli.js urls "cat:cs.AI" --require-license
- Verify: each line includes an abs URL and PDF URL.
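For readers who want to cross-check the URLs, the sketch below derives the abstract and PDF URLs from an ID using the public `https://arxiv.org/abs/<id>` and `https://arxiv.org/pdf/<id>` patterns. It is an illustration, not the CLI's implementation: the `urlsForId` function and `PaperUrls` type are hypothetical, and the ID check covers only new-style identifiers such as `2002.00762` (an assumption made to keep the example short).

```typescript
// Hypothetical shape for illustration; field names mirror the `absUrl` and
// `pdfUrl` fields mentioned for the CLI's JSON output.
interface PaperUrls {
  id: string;
  absUrl: string;
  pdfUrl: string;
}

// Assumption: only new-style IDs (YYMM.NNNN or YYMM.NNNNN, optional version
// suffix such as v2) are accepted. Old-style IDs like hep-th/9901001 are
// rejected by this sketch.
const NEW_STYLE_ID = /^\d{4}\.\d{4,5}(v\d+)?$/;

function urlsForId(id: string): PaperUrls {
  if (!NEW_STYLE_ID.test(id)) {
    throw new Error(`Unsupported arXiv ID format: ${id}`);
  }
  return {
    id,
    absUrl: `https://arxiv.org/abs/${id}`,
    pdfUrl: `https://arxiv.org/pdf/${id}`,
  };
}
```

Calling `urlsForId("2002.00762")` yields the pair of URLs that the corresponding `urls --ids` output line would be expected to contain.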
- What you get: the arXiv category taxonomy.
- Steps:
node dist/cli.js categories tree
node dist/cli.js categories list --group "Computer Science"
- Verify: group names and category IDs are listed.
- Assumes arXiv API availability and that users respect rate limits.
- Assumes users review license metadata before reuse; the CLI does not grant rights.
- PDF text extraction may fail on scanned or complex layouts.
- Existing output files are overwritten only when `--overwrite` is passed; verify output paths before running batch downloads.
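One way to verify output paths before a batch download is a small preflight check. This is a sketch under stated assumptions: it relies on the `<out-dir>/<id>.<extension>` naming seen in the download examples, and the names `expectedPath` and `findCollisions` are hypothetical helpers, not CLI features. The existence check is passed in as a function so the logic stays independent of any particular file system API.

```typescript
// File extensions that appear in this document's download examples.
type OutputExtension = "pdf" | "md" | "json";

// Builds the path a download is expected to write, assuming the
// <out-dir>/<id>.<extension> naming shown in the examples.
function expectedPath(outDir: string, id: string, ext: OutputExtension): string {
  const dir = outDir.replace(/\/+$/, ""); // drop trailing slashes
  return `${dir}/${id}.${ext}`;
}

// Returns the expected output paths that already exist. `exists` is supplied
// by the caller (for example, Node's fs.existsSync).
function findCollisions(
  ids: string[],
  ext: OutputExtension,
  outDir: string,
  exists: (path: string) => boolean,
): string[] {
  return ids.map((id) => expectedPath(outDir, id, ext)).filter(exists);
}
```

If `findCollisions` returns a non-empty list, those files would be left untouched without `--overwrite` and replaced with it, so the list is worth reviewing before a batch run.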
Cause:
- Missing positional argument or stdin input.
Fix:
node dist/cli.js search "cat:cs.AI"
node dist/cli.js fetch 2002.00762

Cause:
- Rate limiting or transient server errors.
Fix:
- Re-run; the CLI already retries with backoff. Lower `--max-results` if needed.
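To get a feel for how long the built-in retries can take, the sketch below computes a delay schedule from the documented defaults (max-retries=3, retry-base-delay=500ms). It assumes a plain exponential backoff that doubles the delay on each retry with no jitter; the CLI's actual formula is not stated in this document, so treat the numbers as an estimate. The function name `backoffDelaysMs` is hypothetical.

```typescript
// Assumption: delay for retry n (0-based) is baseDelayMs * 2^n, no jitter.
// The defaults come from the documented retry settings; the doubling rule
// is an illustrative guess, not confirmed CLI behavior.
function backoffDelaysMs(maxRetries: number = 3, baseDelayMs: number = 500): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(baseDelayMs * 2 ** attempt);
  }
  return delays;
}

// Total time spent waiting if every retry is used.
function totalBackoffMs(delays: number[]): number {
  return delays.reduce((sum, delay) => sum + delay, 0);
}
```

Under these assumptions the default schedule is 500 ms, 1000 ms, and 2000 ms, or 3.5 seconds of waiting in total before the command gives up.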
Cause:
- arXiv taxonomy endpoint unavailable or network blocked.
Fix:
- Re-run later or use `--refresh` once connectivity is restored.
- Repo: https://github.com/jscraik/rSearch.git
- Commands: `search`, `fetch`, `download`, `urls`, `categories`, `config`, `help`
- Constraints:
- Default API delay: 3s
- Retry defaults: max-retries=3, retry-base-delay=500ms (`--no-retry` to disable)
- `page-size` <= 2000
- `max-results` <= 30000
- Output schema: `schemas/cli-output.schema.json`, `schemas/cli-error.schema.json`
- License use:
- arXiv content is licensed by the authors. The CLI may expose a license URL when provided, but it does not grant rights. Always verify permitted use on the arXiv abstract page.
- Usage policy:
- Be courteous to arXiv: include contact info (`--contact`) and keep rate limits conservative (`--rate-limit`).
- Docs: `docs/index.md`, `CHANGELOG.md`, `SECURITY.md`, `SUPPORT.md`, `CODE_OF_CONDUCT.md`, `CONTRIBUTING.md`, `docs/cli-reference.md`, `docs/configuration.md`, `docs/release-policy.md`, `docs/troubleshooting.md`, `docs/faq.md`, `docs/roadmap.md`, `docs/ADR-001-architecture.md`
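The constraints listed above (default API delay of 3s, page-size <= 2000, max-results <= 30000) imply a lower bound on how long a large search takes. The sketch below works through that arithmetic. It assumes results are fetched in pages of `page-size` and that the delay applies between consecutive requests; both are assumptions for illustration, and `estimateRequests` is a hypothetical helper rather than a CLI function.

```typescript
// Limits and default taken from the constraints in this document.
const MAX_PAGE_SIZE = 2000;
const MAX_RESULTS = 30000;
const DEFAULT_DELAY_MS = 3000;

interface RequestEstimate {
  requests: number;  // number of paged API calls
  minWaitMs: number; // time spent in inter-request delays alone
}

// Assumption: one API call per page, with `delayMs` inserted between
// consecutive calls (so N calls incur N - 1 delays).
function estimateRequests(
  maxResults: number,
  pageSize: number,
  delayMs: number = DEFAULT_DELAY_MS,
): RequestEstimate {
  if (!Number.isInteger(maxResults) || maxResults < 1 || maxResults > MAX_RESULTS) {
    throw new RangeError(`max-results must be an integer in 1..${MAX_RESULTS}`);
  }
  if (!Number.isInteger(pageSize) || pageSize < 1 || pageSize > MAX_PAGE_SIZE) {
    throw new RangeError(`page-size must be an integer in 1..${MAX_PAGE_SIZE}`);
  }
  const requests = Math.ceil(maxResults / pageSize);
  return { requests, minWaitMs: (requests - 1) * delayMs };
}
```

At the documented limits, `estimateRequests(30000, 2000)` gives 15 requests and at least 42 seconds of delay, excluding network time and any retries.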
- Doc requirements reflect current CLI scope and ownership.
- Examples match available commands and scripts in this repo.
- License and usage policy notes are present and accurate.
- Risks and assumptions are explicit and up to date.
- Links resolve to existing files or URLs.
- Standards mapping: CommonMark structure, accessibility (descriptive links), security/privacy guidance for license usage.
- Brand compliance: Documentation signature added; assets present in `brand/`.
- Automated checks: vale run on 2026-01-07 (0 errors, 0 warnings).
- Review artifact: Self-review completed on 2026-01-07.
- Deviations: None.
brAInwav
from demo to duty
