A text-to-ambient-sound generator powered by AudioLDM2
Type a description like "gentle rain on a window with distant thunder" and get a generated ambient soundscape.
Live Demo on Hugging Face Spaces →
This is a learning-in-public project documenting my journey into generative AI for audio. I'm building a text-to-ambient-sound app while studying the underlying models, papers, and techniques.
Features:
- 🎧 Quick Generate — 8 hand-crafted ambient presets, one click to generate
- 🎛️ Layer Mixer — generate up to 3 sound layers and mix them with volume control
- ✍️ Custom — write your own prompts with adjustable guidance scale and inference steps
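As a rough illustration of what the Layer Mixer does, here is a minimal sketch of mixing up to three layers with per-layer volume. It assumes each layer is a mono float waveform at the same sample rate; the function name `mix_layers`, the `peak` parameter, and the peak-normalization step are hypothetical and not taken from the app code.

```python
import numpy as np


def mix_layers(layers, volumes, peak=0.95):
    """Mix up to 3 mono waveforms with per-layer volume (illustrative sketch)."""
    if len(layers) != len(volumes):
        raise ValueError("need exactly one volume per layer")
    if not 1 <= len(layers) <= 3:
        raise ValueError("expected between 1 and 3 layers")

    # Shorter layers are padded with silence up to the longest layer.
    length = max(len(layer) for layer in layers)
    mix = np.zeros(length, dtype=np.float32)
    for audio, volume in zip(layers, volumes):
        audio = np.asarray(audio, dtype=np.float32)
        mix[: len(audio)] += volume * audio

    # Scale down only if the sum would clip (assumed behaviour, not the app's).
    top = float(np.max(np.abs(mix)))
    if top > peak:
        mix *= peak / top
    return mix
```

Calling `mix_layers([rain, thunder], [1.0, 0.5])` would keep the rain at full volume and add the thunder at half volume. Scaling the whole mix instead of hard-clipping keeps the relative balance between layers intact.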
Follow the build process on the project blog:
- What is AudioLDM2 and why I'm using it
- First Sounds — What AudioLDM2 Can and Can't Do
- Prompt Engineering for Audio — What Actually Works
- Building AmbientGen — From Notebook to Product
A curated list of papers and resources on generative AI for audio and music → Papers & Resources
- Model: AudioLDM2 via Hugging Face Diffusers
- Interface: Gradio
- Deploy: Hugging Face Spaces (ZeroGPU)
- Blog: GitHub Pages + Jekyll
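For readers new to this stack, the following is a minimal sketch of generating audio with AudioLDM2 through Diffusers, with the guidance scale and inference steps exposed as in the Custom tab. The `cvssp/audioldm2` checkpoint, the default values, the clamping ranges, and the helper names `generation_kwargs` and `generate` are assumptions for illustration; the app may use different ones.

```python
def generation_kwargs(prompt, guidance_scale=3.5, steps=200, seconds=10.0):
    """Build pipeline arguments; the ranges below are assumed, not the app's."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")
    return {
        "prompt": prompt,
        "guidance_scale": float(min(max(guidance_scale, 1.0), 10.0)),
        "num_inference_steps": int(min(max(steps, 10), 200)),
        "audio_length_in_s": float(seconds),
    }


def generate(prompt, seed=0, **overrides):
    """Return (waveform, sample_rate) for a text prompt."""
    # Heavy imports live here so the helper above works without a GPU stack.
    import torch
    from diffusers import AudioLDM2Pipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2", torch_dtype=dtype)
    pipe = pipe.to(device)

    # A fixed seed makes runs repeatable for the same prompt and settings.
    generator = torch.Generator(device).manual_seed(seed)
    result = pipe(generator=generator, **generation_kwargs(prompt, **overrides))
    return result.audios[0], 16000  # AudioLDM2 produces 16 kHz audio
```

A call such as `generate("gentle rain on a window with distant thunder", steps=100)` returns a NumPy waveform that can be passed to a Gradio audio output or written to a WAV file. Fewer inference steps run faster at some cost in quality.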
ambientgen/
├── blog/ # Blog posts (Markdown)
├── app/ # Gradio application code
├── experiments/ # Colab notebooks & experiment logs
├── docs/ # Papers reading list & resources
└── README.md
License: MIT