Integrated tools to transfer internet audio to text, extract unpopular views, and pick up podcasts for you.
Pickpod
helps to build your private wiki efficiently.
This repository contains:
-
A
Python
package that can easily call specified tasks. -
A
Streamlit
app that provides a web UI to manage your podcast library. -
Several package usage examples of complete tasks for target audio.
Welcome to our commercial deployment: Pickpod, implementation with Java
and microservice architecture.
Compared to the personal open-source prototype in this repository, the commercial version provides powerful performance and stable services.

The goals for Pickpod
are:
-
High-quality integration with yt-dlp, faster-whisper, and pyannote-audio, so that users can quickly obtain the text result of the corresponding audio transcription by simply inputting a link or a local file.
-
The convenient use of LISTEN NOTES Podcast API and Claude API. After completing the necessary settings and making a task,
Pickpod
can get the list of podcasts the users are interested in regularly according to the specified release period. Thus, the transcription task can be completed in batch. Then,Pickpod
can pick up podcasts based on the evaluation through the extracted keywords, summaries, views, or only theLLM
. Users can reference and modify the recommendation according to the sorting results of podcasts. -
Rapid deployment for local environments, so that when the user launches the project, all features are easily accessible in the browser.
Since ffmpeg and ffprobe are strongly recommended by yt-dlp, it is necessary to install the ffmpeg binary
within the system before installing Pickpod
.
You can refer to the installation method provided by pydub, or go to the ffmpeg download page and ffmpeg compilation guide for more.
Moreover, please see the note about hugging face
access token fetching in pyannote-audio for more information on using speaker-diarization.
If you need to filter the list of podcasts to be batch transcribed based on customized rules or use LLM
to analyze the transcribed text, please refer to the API
documentation provided by Listen Notes and Anthropic to obtain the necessary Access Keys
, respectively.
Due to Pickpod
strictly restricting the version of used Python
packages, some packages may automatically solve conflicts and remove some of the packages that you have installed before. To avoid unnecessary conflicts or damage to your environment, we strongly recommend installing Pickpod
in a brand new Python
environment or a Python
virtual environment.
You don't need this source code if you just want to use the package. Just run:
$ pip install --upgrade pickpod
If you want to modify the package, install from source with:
$ pip install ./pickpod
If you want to run the Streamlit
app that provides a web UI, install from source with:
$ pip install -r ./pickpod/app/requirements.txt
$ # For Linux or Unix
$ streamlit run ./pickpod/app/Home.py --server.port 8051
$ # For Windows
$ python -m streamlit run ./pickpod/app/Home.py --server.port 8051
Then visit http://127.0.0.1:8051
in your local browser.
We chose nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04 as a typical system environment to try to install Pickpod
. The docker image has the following base configuration:
$ python3 -V
Python 3.10.12
$ nvidia-smi
Tue Aug 15 08:06:56 2023
+-----------------------------------------------------------------------------+
NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | MIG M. |
|===============================+======================+======================|
0 NVIDIA GeForce ... On | 00000000:65:00.0 Off | N/A |
0% 43C P8 23W / 370W | 1481MiB / 24576MiB | 0% Default |
| | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
Processes: |
GPU GI CI PID Type Process name GPU Memory |
ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
First, we need to install ffmpeg
, python3-pip
, and other essential tools, then upgrade the software packages.
$ sudo apt-get -y install cmake libsndfile1 ffmpeg python3-pip
$ sudo apt update && apt upgrade -y
We can verify if ffmpeg
is installed successfully in the following way:
$ ffmpeg -version
ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
After downloading the source code and running setup.py
, we can import Pickpod
in Python
.
$ git clone https://github.com/shixiangcap/pickpod.git
$ pip install ./pickpod
from pickpod.config import TaskConfig
from pickpod.draft import AudioDraft
from pickpod.task import PickpodTask
HUGGING_FACE_KEY = "YOUR_HUGGING_FACE_KEY"
# For example: https://www.youtube.com/watch?v=xxxxxxxxxxx
audio_url = "YOUR_AUDIO_URL_ON_INTERNET"
# Set audio information
audio_draft = AudioDraft(audio_url=audio_url)
# Config pickpod task
task_config = TaskConfig(key_hugging_face=HUGGING_FACE_KEY, pipeline=True)
# Initial pickpod task
pickpod_task = PickpodTask(audio_draft, task_config)
# Start pickpod task
pickpod_task.pickpod_with_url()
# Print the result of pickpod task
print(pickpod_task.__dict__)
from pickpod.config import TaskConfig
from pickpod.draft import AudioDraft
from pickpod.task import PickpodTask
HUGGING_FACE_KEY = "YOUR_HUGGING_FACE_KEY"
# For example: xxxxxxxxxxx.m4a
audio_path = "YOUR_LOCAL_FILE_PATH"
# Set audio information
audio_draft = AudioDraft(audio_path=audio_path)
# Config pickpod task
task_config = TaskConfig(key_hugging_face=HUGGING_FACE_KEY, pipeline=False)
# Initial pickpod task
pickpod_task = PickpodTask(audio_draft, task_config)
# Start pickpod task
pickpod_task.pickpod_with_local()
# Save the result of pickpod task
pickpod_task.save_to_txt()





You can search podcasts in your Gallery
.

If you need Pickpod
to pick up podcasts based on your private wiki, the prompt will use both valuable and worthless views.


If the target YouTube
video is Introducing GPT-4, the Pickpod
can get the JSON
file afeb5810-25ee-426d-aa88-7b58484d4c6f.json
If the target 小宇宙
podcast is EP 35. ICML现场对话AI研究员符尧:亲历AI诸神之战,解读LLM前沿研究,Llama 2,AI Agents, the Pickpod
can get the JSON
file 93aa3140-300d-4af6-9d9c-2c41e9095821.json
-
yt-dlp - A youtube-dl fork with additional features and fixes.
-
faster-whisper - Faster Whisper transcription with CTranslate2.
-
pyannote-audio - Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding.
Feel free to dive in! Open an issue or submit PRs.
MIT © shixiangcap