BookBot is a simple, yet robust, command-line utility built to perform foundational text analysis on any book (plain text file). It generates a comprehensive report detailing the total word count and the sorted frequency of every letter in the text.
Built as a project for the Boot.dev backend development curriculum.
You need Python 3 installed on your system.
-
Clone the Repository:
git clone [https://github.com/SanTheDeveloper/bookbot](https://github.com/SanTheDeveloper/bookbot) cd bookbot -
Run the Analyzer: Pass the path to your desired text file as a command-line argument.
# Analyze the provided example book (using main.py) python3 main.py books/frankenstein.txt # Example analyzing a different text file python3 main.py books/mobydick.txt
Note: If no file path is provided, the program validates the arguments, prints a usage message, and exits with a status code of
1.
The application follows a clean pipeline to transform raw text into a final report:
- Argument Validation: Checks for and retrieves the file path from
sys.argv. - File Reading: Performs safe File I/O to read the entire text into memory using a context manager.
- Data Preparation: Converts the entire text to lowercase and filters out non-alphabetical characters (spaces, symbols, punctuation) for the frequency analysis using the
.isalpha()string method. - Analysis:
- Calculates the total word count by splitting the text using
.split(). - Builds a dictionary of character frequencies using the
dict.get(key, 0)technique for efficient counting.
- Calculates the total word count by splitting the text using
- Reporting: Converts the frequency dictionary into a sorted list (descending by count) using a custom
sort_onkey function, and prints the final formatted report to the console.
This project was a practical exercise in applying foundational Python skills:
- Command-Line Arguments (
sys.argv): Dynamically accepting input from the terminal for flexible tool usage. - Dictionaries: Using dictionaries as hash maps for efficient, non-sequential data storage (tracking
characterβcount). - Lists of Dictionaries: Transforming analysis results into a structured, sortable format (e.g.,
[{'char': 'e', 'num': 46043}, ...]).
- File I/O: Best practices for reading text files using the
with open(...)context manager. - String Methods: Mastering
.split(),.lower(), and the conditional check.isalpha()for data preparation. - Custom Sorting Logic: Implementing custom sorting using the
list.sort(reverse=True, key=...)method to order character counts. - Error Handling: Implementing
try...exceptblocks andsys.exit(1)for robust command-line behavior.
The report structure is designed for clarity and quick insight:
============ BOOKBOT ============
Analyzing book found at books/frankenstein.txt...
----------- Word Count ----------
Found 75767 total words
--------- Character Count -------
e: 44538
t: 29493
a: 25894
... (All alphabetical characters sorted by count)
p: 5952
b: 4868
v: 3737
...
z: 235
============= END ===============
bookbot/
βββ books/ # Directory to store text files
β βββ frankenstein.txt # Example input book
βββ main.py # Main application script (handles I/O, argument parsing, report printing)
βββ stats.py # Core logic (get_word_count, get_char_count, sorting functions)
βββ README.md # Project documentation
This project is part of the Boot.dev curriculum. Feel free to use it as a learning resource!