URLIngest - Web Content Aggregation Suite

A comprehensive toolkit for web content aggregation, analysis, and preparation for Large Language Models (LLMs).

Features

Core Functionality

Multi-Format Ingestion
- Drag & Drop URL management
- Bulk content parsing
- Pattern-based URL generation (e.g., page-[number])
Content Processing
- Text-only extraction with smart filtering
- Raw HTML source code retrieval
- Deep parsing with configurable depth
Output Management
- Individual or bulk content copying
- Editable link management
- Real-time parsing status
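
The pattern-based URL generation listed above can be pictured with a small sketch. The function name `expandUrlPattern`, the literal `[number]` placeholder, and the inclusive start/end range are assumptions made for illustration; the README does not document how URLIngest implements this internally.

```javascript
// Hypothetical helper: expands a "[number]" placeholder into a list of URLs.
// The placeholder token and the inclusive range are assumptions for this
// sketch, not URLIngest's documented behaviour.
function expandUrlPattern(template, start, end) {
  const placeholder = "[number]";
  if (!template.includes(placeholder)) {
    return [template]; // nothing to expand, return the URL unchanged
  }
  if (!Number.isInteger(start) || !Number.isInteger(end) || start > end) {
    throw new RangeError("start and end must be integers with start <= end");
  }
  const urls = [];
  for (let n = start; n <= end; n++) {
    // split/join replaces every occurrence of the placeholder
    urls.push(template.split(placeholder).join(String(n)));
  }
  return urls;
}

console.log(expandUrlPattern("https://example.com/page-[number]", 1, 3));
```

Generating the list up front, rather than fetching while counting, keeps the resulting links editable before any parsing starts.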

Specialized Tools

Research Assistant
- ArXiv paper discovery & summarization
- Multi-level explanation styles
- Literature review generation
Security Scanner
- Automated vulnerability detection
- Code analysis through LLMs
- Progressive scanning controls

Getting Started

Prerequisites

- Modern web browser (Chrome 90+, Firefox 88+, Edge 90+)
- Internet connection
- Git and Python 3 (only needed to clone the repository and run the local server)

Installation

git clone https://github.com/seenmttai/urlingest.git
cd urlingest
python3 -m http.server 8000

Access via:

Live Website: https://urlingest.pages.dev
Local file: file:///path/to/index.html
Local server: http://localhost:8000

Usage

Basic Workflow

1. Add URLs via drag & drop or manual input
2. Choose processing mode:
   - Text Mode: Clean content extraction
   - Source Mode: Raw HTML inspection
3. Configure filters and parsing depth
4. Parse and export content to clipboard
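
To make "parsing depth" concrete, here is a minimal sketch of a depth-limited, breadth-first traversal. It assumes depth 0 means "only the URLs you added" and each extra level follows the links found on the previous level; that interpretation, the function name `crawl`, and the `fetchLinks` callback are all hypothetical and not taken from URLIngest's source.

```javascript
// Breadth-first, depth-limited traversal.
// `fetchLinks` is a hypothetical callback (url -> Promise<string[]>) standing
// in for "fetch the page and return the links found on it".
async function crawl(startUrls, maxDepth, fetchLinks) {
  const visited = new Set(startUrls);
  let frontier = [...visited];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      let links = [];
      try {
        links = await fetchLinks(url);
      } catch (err) {
        // A page that fails to load is skipped instead of aborting the run.
      }
      for (const link of links) {
        if (!visited.has(link)) {
          visited.add(link); // the Set prevents revisiting and breaks cycles
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return [...visited];
}

// Usage with an in-memory link graph instead of real network requests.
const graph = { a: ["b", "c"], b: ["d"], c: [], d: ["a"] };
const fakeFetchLinks = async (url) => graph[url] || [];
crawl(["a"], 1, fakeFetchLinks).then((urls) => console.log(urls));
```

Injecting `fetchLinks` keeps the traversal logic independent of how pages are actually retrieved, so the same loop works with a direct request, a proxy, or a test stub.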

Research Tools

ArXiv Assistant

150+ research categories
Automatic paper discovery
Adaptive summarization:
- Technical deep dives
- Layman explanations
- Literature reviews
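
For readers curious how paper discovery by category can work, the sketch below builds a query URL for the public arXiv API. Whether URLIngest calls this endpoint, and with which parameters, is an assumption; the function name `buildArxivQuery` and the default of 10 results are purely illustrative.

```javascript
// Builds a query URL for the public arXiv API (export.arxiv.org/api/query).
// Assumption: the newest submissions in one category are wanted.
function buildArxivQuery(category, maxResults = 10) {
  const params = new URLSearchParams({
    search_query: `cat:${category}`, // e.g. "cat:cs.CL"
    start: "0",
    max_results: String(maxResults),
    sortBy: "submittedDate",
    sortOrder: "descending",
  });
  return `https://export.arxiv.org/api/query?${params.toString()}`;
}

console.log(buildArxivQuery("cs.CL", 5));
```

The API responds with an Atom XML feed, so a caller would still need to parse the entries (title, summary, link) before passing them to a summarizer.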

Security Scanner

Automated vulnerability detection
Real-time code analysis
Progressive result reporting

Contributing

We welcome contributions! Please follow these guidelines:

1. Fork the repository
2. Create a feature branch (git checkout -b feature/amazing-feature)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/amazing-feature)
5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Acknowledgments

Inspired by Gitingest
CORS proxy services
Cloudflare Workers and Pages
