Inverted_Index

An implementation of an inverted index for text documents, using NLTK.

This Python script demonstrates the creation and usage of an inverted index for a collection of text documents. An inverted index is a data structure commonly used in information retrieval systems to efficiently store and retrieve text-based information.
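To make the data structure concrete, here is a minimal sketch in plain Python. It is not the code from main.py: the function name `build_inverted_index`, the sample documents, and the whitespace tokenization are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization for illustration: lowercase, split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical sample documents.
docs = {
    "doc1": "the cat sat on the mat",
    "doc2": "the dog chased the cat",
}
index = build_inverted_index(docs)
print(index["cat"])  # ['doc1', 'doc2']
print(index["dog"])  # ['doc2']
```

Looking up a term is a single dictionary access, which is why the structure suits retrieval: the cost of scanning every document is paid once, when the index is built.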

The script performs text preprocessing (tokenization, punctuation removal, stopword removal, and lemmatization) using the Natural Language Toolkit (NLTK) library.
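The preprocessing steps could look roughly like the sketch below. The function names `preprocess` and `nltk_preprocess` are hypothetical, and the choice of `word_tokenize`, the English stopword list, and `WordNetLemmatizer` is an assumption about which NLTK components the script uses. The NLTK-backed variant needs the corresponding NLTK data (the Punkt tokenizer models, `stopwords`, and `wordnet`) to be installed.

```python
import string


def preprocess(text, stop_words, lemmatize, tokenize=str.split):
    """Lowercase, tokenize, strip punctuation, drop stopwords, lemmatize."""
    tokens = tokenize(text.lower())
    # Remove punctuation characters from every token.
    table = str.maketrans("", "", string.punctuation)
    tokens = [t.translate(table) for t in tokens]
    # Discard tokens that are now empty or that are stopwords.
    tokens = [t for t in tokens if t and t not in stop_words]
    return [lemmatize(t) for t in tokens]


def nltk_preprocess(text):
    """Run preprocess() with NLTK components (assumed choice, see above)."""
    # Imported lazily so preprocess() stays usable without NLTK installed.
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    from nltk.tokenize import word_tokenize

    lemmatizer = WordNetLemmatizer()
    return preprocess(
        text,
        stop_words=set(stopwords.words("english")),
        lemmatize=lemmatizer.lemmatize,
        tokenize=word_tokenize,
    )
```

Calling `nltk_preprocess("The cats are sitting on the mats.")` would return the lemmatized content words of the sentence. Passing the stopword set, tokenizer, and lemmatizer as parameters is a design choice of this sketch; it lets the pipeline be tried out with small hand-made inputs.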

Getting Started

Prerequisites: Ensure you have Python and the NLTK library installed.
```
pip install nltk
```

Clone the Repository: Clone this repository to your local machine.

```
git clone https://github.com/your-username/text-inverted-index.git
cd text-inverted-index
```

Download NLTK Resources: If the required NLTK resources are not yet installed on your machine, uncomment the NLTK resource download calls in the code before the first run. If they are already installed, no action is needed.
Replace Document Files: Place your text document files (e.g., doc1.txt, doc2.txt) in the docs/ directory.
Run the Script: Execute the script main.py to create an inverted index and perform a sample query.
```
python main.py
```
Customize and Query: Modify the run_query method in the InvIndex class to perform custom queries on the created inverted index.
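As a starting point for custom queries, the sketch below shows one way to evaluate a multi-term query against an index that maps terms to document IDs. It is a standalone, hypothetical function, not the `run_query` method of `InvIndex`; the `mode` parameter and the shape of the index are assumptions.

```python
def run_query(index, terms, mode="and"):
    """Return sorted doc IDs matching all terms ("and") or any term ("or")."""
    # Terms missing from the index contribute an empty posting set.
    postings = [set(index.get(term, ())) for term in terms]
    if not postings:
        return []
    if mode == "and":
        result = set.intersection(*postings)
    elif mode == "or":
        result = set.union(*postings)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(result)


# Toy index in the shape term -> list of document IDs (hypothetical data).
index = {
    "cat": ["doc1", "doc2"],
    "dog": ["doc2"],
    "mat": ["doc1"],
}
print(run_query(index, ["cat", "dog"]))             # ['doc2']
print(run_query(index, ["dog", "mat"], mode="or"))  # ['doc1', 'doc2']
print(run_query(index, ["cat", "bird"]))            # []
```

Query terms should go through the same preprocessing as the documents; otherwise a query such as "Cats" will not match an index entry stored as "cat".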

Code Overview

main.py: The main script. It imports the necessary modules, defines the InvIndex class, reads the document files, creates an instance of InvIndex, and performs a sample query.
funcs.py: Contains the functions for text preprocessing, such as punctuation removal and stopword handling.
docs/: A directory for your text document files.

Contributing

Contributions are welcome! If you have ideas for improvements, feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License; see the LICENSE file for details.

