A local AI-powered tool that converts PDF documents into engaging podcasts, using local LLMs and TTS models.
- PDF text extraction and processing
- Customizable podcast generation with different styles and lengths
- Support for various LLM providers (OpenAI, Groq, LMStudio, Ollama, Azure)
- Text-to-Speech conversion with voice selection
- Fully configurable pipeline
- Preference-based content focus
- Programmatic API for integration in other projects
- FastAPI server for web-based access
- Example podcast included for demonstration
- Python 3.12+
- Local LLM server (optional, for local inference)
- Local TTS server (optional, for local audio generation)
- At least 8GB RAM (16GB+ recommended for local models)
- 10GB+ free disk space
pip install local-notebooklm
- Clone the repository:
git clone https://github.com/Goekdeniz-Guelmez/Local-NotebookLM.git
cd Local-NotebookLM
- Create and activate a virtual environment (conda works too):
python -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate
- Install the required packages:
pip install -r requirements.txt
- Follow one of the installation options (docker, docker-compose, or uv) at https://github.com/remsky/Kokoro-FastAPI
- Verify in your browser that http://localhost:8880/v1 returns the JSON {"detail":"Not Found"}, which confirms the server is reachable
The repository includes an example podcast in examples/podcast.wav
to demonstrate the quality and format of the output. The models used were GPT-4o and GPT-4o mini with tts-hd on Azure. You can listen to this example to get a sense of what Local-NotebookLM can produce before running it on your own PDFs.
You can use the default configuration or create a custom JSON config file with the following structure:
{
  "Co-Host-Speaker-Voice": "af_sky+af_bella",
  "Host-Speaker-Voice": "af_alloy",
  "Small-Text-Model": {
    "provider": {
      "name": "groq",
      "key": "your-api-key"
    },
    "model": "llama-3.2-90b-vision-preview"
  },
  "Big-Text-Model": {
    "provider": {
      "name": "groq",
      "key": "your-api-key"
    },
    "model": "llama-3.2-90b-vision-preview"
  },
  "Text-To-Speech-Model": {
    "provider": {
      "name": "custom",
      "endpoint": "http://localhost:8880/v1",
      "key": "not-needed"
    },
    "model": "kokoro",
    "audio_format": "wav"
  },
  "Step1": {
    "system": "",
    "max_tokens": 1028,
    "temperature": 0.7,
    "chunk_size": 1000,
    "max_chars": 100000
  },
  "Step2": {
    "system": "",
    "max_tokens": 8126,
    "temperature": 1,
    "chunk_token_limit": 2000,
    "overlap_percent": 10
  },
  "Step3": {
    "system": "",
    "max_tokens": 8126,
    "temperature": 1,
    "chunk_token_limit": 2000,
    "overlap_percent": 20
  }
}
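Before running the pipeline, a custom config can be sanity-checked with the standard library (a minimal sketch; the section names are the ones from the example above):

import json

with open("custom_config.json") as f:
    config = json.load(f)

# Top-level sections used in the example configuration above
for section in ("Small-Text-Model", "Big-Text-Model", "Text-To-Speech-Model", "Step1", "Step2", "Step3"):
    assert section in config, f"missing section: {section}"

print(config["Text-To-Speech-Model"]["provider"]["endpoint"])

The file is then passed to the pipeline via --config custom_config.json (see Usage below).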
The following provider options are supported:
- OpenAI: Use OpenAI's API
  "provider": { "name": "openai", "key": "your-openai-api-key" }
- Groq: Use Groq's API for faster inference
  "provider": { "name": "groq", "key": "your-groq-api-key" }
- Azure OpenAI: Use Azure's OpenAI service
  "provider": { "name": "azure", "key": "your-azure-api-key", "endpoint": "your-azure-endpoint", "version": "api-version" }
- LMStudio: Use a local LMStudio server
  "provider": { "name": "lmstudio", "endpoint": "http://localhost:1234/v1", "key": "not-needed" }
- Ollama: Use a local Ollama server
  "provider": { "name": "ollama", "endpoint": "http://localhost:11434", "key": "not-needed" }
- Google Generative AI: Use Google's API
  "provider": { "name": "google", "key": "your-google-genai-api-key" }
- Anthropic: Use Anthropic's API
  "provider": { "name": "anthropic", "key": "your-anthropic-api-key" }
- ElevenLabs: Use ElevenLabs's API
  "provider": { "name": "elevenlabs", "key": "your-elevenlabs-api-key" }
- Custom: Use any OpenAI-compatible API
  "provider": { "name": "custom", "endpoint": "your-custom-endpoint", "key": "your-api-key-or-not-needed" }
Run the script with the following command:
python -m local_notebooklm.start --pdf PATH_TO_PDF [options]
| Option | Description | Default |
|---|---|---|
| --pdf | Path to the PDF file (required) | - |
| --config | Path to custom config file | Uses base_config |
| --format | Output format type (summary, podcast, article, interview, panel-discussion, debate, narration, storytelling, explainer, lecture, tutorial, q-and-a, news-report, executive-brief, meeting, analysis) | podcast |
| --length | Content length (short, medium, long, very-long) | medium |
| --style | Content style (normal, casual, formal, technical, academic, friendly, gen-z, funny) | normal |
| --preference | Additional focus preferences or instructions | None |
| --output-dir | Directory to store output files | ./output |
Local-NotebookLM now supports both single-speaker and two-speaker formats:
Single-Speaker Formats:
- summary
- narration
- storytelling
- explainer
- lecture
- tutorial
- news-report
- executive-brief
- analysis
Two-Speaker Formats:
- podcast
- interview
- panel-discussion
- debate
- q-and-a
- meeting
Basic usage:
python -m local_notebooklm.start --pdf documents/research_paper.pdf
Customized podcast:
python -m local_notebooklm.start --pdf documents/research_paper.pdf --format podcast --length long --style casual
With custom preferences:
python -m local_notebooklm.start --pdf documents/research_paper.pdf --preference "Focus on practical applications and real-world examples"
Using custom config:
python -m local_notebooklm.start --pdf documents/research_paper.pdf --config custom_config.json --output-dir ./my_podcast
You can also use Local-NotebookLM programmatically in your Python code:
from local_notebooklm.processor import podcast_processor

success, result = podcast_processor(
    pdf_path="documents/research_paper.pdf",
    config_path="config.json",
    format_type="interview",
    length="long",
    style="professional",
    preference="Focus on the key technical aspects",
    output_dir="./test_output"
)

if success:
    print(f"Successfully generated podcast: {result}")
else:
    print(f"Failed to generate podcast: {result}")
Start the FastAPI server to access the functionality via a web API:
python -m local_notebooklm.server
By default, the server runs on http://localhost:8000. You can access the API documentation at http://localhost:8000/docs.
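Once the server is up, a quick reachability check is to fetch the documentation page mentioned above (a minimal sketch using the requests library; the individual API routes and their schemas are listed on that page):

import requests

# Fetch the auto-generated FastAPI docs page on the default port.
resp = requests.get("http://localhost:8000/docs")
print(resp.status_code)  # 200 means the server is running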
Step 1: PDF Processing
- Extracts text from PDF documents
- Cleans and formats the content
- Removes irrelevant elements like page numbers and headers
- Handles LaTeX math expressions and special characters
- Splits content into manageable chunks for processing
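The chunking here is word-bounded, so no chunk ever cuts a word in half. A minimal sketch of that idea (illustrative only; the project's create_word_bounded_chunks() may differ in detail):

def create_word_bounded_chunks(text: str, chunk_size: int = 1000) -> list[str]:
    # Split text into chunks of roughly chunk_size characters without splitting words.
    chunks, current, current_len = [], [], 0
    for word in text.split():
        # The +1 accounts for the space that joins the word to the chunk.
        if current and current_len + len(word) + 1 > chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(word)
        current_len += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks

The chunk_size argument corresponds to the chunk_size field in the Step1 section of the config.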
Step 2: Transcript Generation
- Generates an initial podcast script based on the extracted content
- Applies the specified style (casual, formal, technical, academic)
- Formats content according to the desired length (short, medium, long, very-long)
- Structures content for a conversational format
- Incorporates user-specified format type (summary, podcast, article, interview)
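Conceptually, each chunk is sent to the big text model together with a system prompt that encodes the chosen format, length, and style; chunk_token_limit and overlap_percent in the Step2 config control how much context carries over between chunks. A rough sketch of that loop (illustrative, not the project's actual generate_transcript()), assuming an OpenAI-style client:

def generate_transcript(client, chunks: list[str], system_prompt: str, model: str, overlap_percent: int = 10) -> str:
    transcript = ""
    previous_tail = ""
    for chunk in chunks:
        # Prepend the tail of the previous chunk so the model keeps context across boundaries.
        prompt = previous_tail + "\n" + chunk if previous_tail else chunk
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt},
            ],
        )
        transcript += response.choices[0].message.content
        tail_len = int(len(chunk) * overlap_percent / 100)
        previous_tail = chunk[-tail_len:] if tail_len else ""
    return transcript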
Step 3: TTS Optimization
- Rewrites content specifically for better text-to-speech performance
- Creates a two-speaker conversation format
- Adds speech markers and natural conversation elements
- Optimizes for natural flow and engagement
- Incorporates user preferences for content focus
- Formats output as a list of speaker-text tuples
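The result of this stage is a plain list of (speaker, text) pairs, which is exactly what the audio step consumes. For illustration, the data roughly looks like this (the speaker labels are placeholders, not necessarily the ones the project uses):

podcast_ready_data = [
    ("Speaker 1", "Welcome to the show! Today we're digging into a fascinating paper."),
    ("Speaker 2", "Thanks for having me. Let's start with the core idea."),
    ("Speaker 1", "So, in plain terms, what problem does it solve?"),
]

# A shape check in the spirit of validate_transcript_format()
assert all(isinstance(item, tuple) and len(item) == 2 for item in podcast_ready_data)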
Step 4: Audio Generation
- Converts the optimized text to speech using the specified TTS model
- Applies different voices for each speaker
- Generates individual audio segments for each dialogue part
- Concatenates segments into a final audio file
- Maintains consistent audio quality and sample rate
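Since soundfile and numpy are among the dependencies, the final concatenation can be pictured roughly like this (a sketch that assumes all segments share the same sample rate and channel layout; the project's concatenate_audio_files() may handle more cases):

import glob
import numpy as np
import soundfile as sf

def concatenate_segments(segment_dir: str, output_path: str) -> None:
    # Read every segment in order and write them back-to-back as one file.
    segments, sample_rate = [], None
    for path in sorted(glob.glob(f"{segment_dir}/podcast_segment_*.wav")):
        data, sr = sf.read(path)
        sample_rate = sample_rate or sr
        segments.append(data)
    sf.write(output_path, np.concatenate(segments), sample_rate)

concatenate_segments("./output/step4/segments", "./output/step4/podcast.wav")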
flowchart TD
subgraph "Main Controller"
processor["podcast_processor()"]
end
subgraph "AI Services"
smallAI["Small Text Model Client"]
bigAI["Big Text Model Client"]
ttsAI["Text-to-Speech Model Client"]
end
subgraph "Step 1: PDF Processing"
s1["step1()"]
validate["validate_pdf()"]
extract["extract_text_from_pdf()"]
chunk1["create_word_bounded_chunks()"]
process["process_chunk()"]
end
subgraph "Step 2: Transcript Generation"
s2["step2()"]
read2["read_input_file()"]
gen2["generate_transcript()"]
chunk2["Chunking with Overlap"]
end
subgraph "Step 3: TTS Optimization"
s3["step3()"]
read3["read_pickle_file()"]
gen3["generate_rewritten_transcript()"]
genOverlap["generate_rewritten_transcript_with_overlap()"]
validate3["validate_transcript_format()"]
end
subgraph "Step 4: Audio Generation"
s4["step4()"]
load4["load_podcast_data()"]
genAudio["generate_speaker_audio()"]
concat["concatenate_audio_files()"]
end
%% Flow connections
processor --> s1
processor --> s2
processor --> s3
processor --> s4
processor -.-> smallAI
processor -.-> bigAI
processor -.-> ttsAI
%% Step 1 flow
s1 --> validate
validate --> extract
extract --> chunk1
chunk1 --> process
process -.-> smallAI
%% Step 2 flow
s2 --> read2
read2 --> gen2
gen2 --> chunk2
gen2 -.-> bigAI
%% Step 3 flow
s3 --> read3
read3 --> gen3
read3 --> genOverlap
gen3 --> validate3
genOverlap --> validate3
gen3 -.-> bigAI
genOverlap -.-> bigAI
%% Step 4 flow
s4 --> load4
load4 --> genAudio
genAudio --> concat
genAudio -.-> ttsAI
%% Data flow
pdf[("PDF File")] --> s1
s1 --> |"cleaned_text.txt"| file1[("Cleaned Text")]
file1 --> s2
s2 --> |"data.pkl"| file2[("Transcript")]
file2 --> s3
s3 --> |"podcast_ready_data.pkl"| file3[("Optimized Transcript")]
file3 --> s4
s4 --> |"podcast.wav"| fileAudio[("Final Audio")]
%% Styling
classDef controller fill:#f9d5e5,stroke:#333,stroke-width:2px
classDef ai fill:#eeeeee,stroke:#333,stroke-width:1px
classDef step fill:#d0e8f2,stroke:#333,stroke-width:1px
classDef data fill:#fcf6bd,stroke:#333,stroke-width:1px,stroke-dasharray: 5 5
class processor controller
class smallAI,bigAI,ttsAI ai
class s1,s2,s3,s4,validate,extract,chunk1,process,read2,gen2,chunk2,read3,gen3,genOverlap,validate3,load4,genAudio,concat step
class pdf,file1,file2,file3,fileAudio data
The pipeline generates the following files:
- step1/extracted_text.txt: Raw text extracted from the PDF
- step1/clean_extracted_text.txt: Cleaned and processed text
- step2/data.pkl: Initial transcript data
- step3/podcast_ready_data.pkl: TTS-optimized conversation data
- step4/segments/podcast_segment_*.wav: Individual audio segments
- step4/podcast.wav: Final concatenated podcast audio file
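If you want to review or tweak the transcript before generating audio, the intermediate pickle files can be inspected directly (a small sketch; for the step3 file, expect the list of (speaker, text) pairs described above):

import pickle

with open("./output/step3/podcast_ready_data.pkl", "rb") as f:
    data = pickle.load(f)

print(data[:2])  # first two dialogue entries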
- PDF Extraction Fails
  - Try a different PDF file
  - Check if the PDF is password-protected
  - Ensure the PDF contains extractable text (not just images)
- API Connection Errors
  - Verify your API keys are correct
  - Check your internet connection
  - Ensure the API endpoints are accessible
- Out of Memory Errors
  - Reduce the chunk size in the configuration
  - Use a smaller model
  - Close other memory-intensive applications
- Audio Quality Issues
  - Try different TTS voices
  - Adjust the sample rate in the configuration
  - Check if the TTS server is running correctly
If you encounter issues not covered here, please:
- Check the logs for detailed error messages
- Open an issue on the GitHub repository with details about your problem
- Include the error message and steps to reproduce the issue
- Python 3.12+
- PyPDF2
- tqdm
- numpy
- soundfile
- requests
- pathlib
- fastapi
- uvicorn
Full requirements are listed in requirements.txt.
- This project uses various open-source libraries and models
- Special thanks to the developers of LLaMA, OpenAI, and other AI models that make this possible
For more information, visit the GitHub repository.
Best, Gökdeniz Gülmez
The Local-NotebookLM software suite was developed by Gökdeniz Gülmez. If you find Local-NotebookLM useful in your research and wish to cite it, please use the following BibTeX entry:
@software{Local-NotebookLM,
  author = {Gökdeniz Gülmez},
  title = {{Local-NotebookLM}: A Local-NotebookLM to convert PDFs into Audio.},
  url = {https://github.com/Goekdeniz-Guelmez/Local-NotebookLM},
  version = {0.1.5},
  year = {2025}
}