
🚀 AI-Creator: Multi-Modal Agents for Video Production

One Prompt to Final Cut: AI-Powered Video Creation Unleashed
一句话,一键成片:AI助你实现创意自由 (One sentence, one click, one finished video: AI sets your creativity free)

     

🎉 News

  • [2025.04.24] 🎯 Planning to upgrade the quality of music videos for perfectly synced rhythm!
  • [2025.04.24] 📢 Achieved more precise visual perception during storyboarding and editing, capturing frames of specific characters and better understanding the source video.
  • [2025.04.14] 📢 Updated the usage guide and code of AI-Creator!
  • [2025.04.08] 📢 Our demo videos have received over 100k views and 19k likes! Watch them on Bilibili!
  • [2025.04.07] 📢 Releasing three more interesting demo videos made by AI-Creator on our Bilibili account!
  • [2025.04.07] 📢 Releasing the video retrieval implementation code of AI-Creator! Supporting all the demo video types showcased!
  • [2025.03.31] 📢 Releasing the audio files of AI-Creator (Meme Videos, AI Music Videos, English Talk Show to Chinese Crosstalk Conversion)!
  • [2025.03.31] 📢 Releasing the audio implementation code of AI-Creator!
  • [2025.03.31] 📢 Releasing the first demo videos! Including Movie Edits, Meme Videos, AI Music Videos, English Talk Show to Chinese Crosstalk Conversion, AI-Generated TV Drama Clips, and Tech News Updates

🎥 Demos & How We Made Them

We have made videos of six distinct types using our AI-Creator, including:

  • Movie Edits
  • Meme Videos
  • Music Videos
  • Verbal Comedy Arts
  • TV Drama
  • News Summarization

Note: All videos are used for research and demonstration purposes only. The audio and visual assets are sourced from the Internet. Please contact us if you believe any content infringes upon your intellectual property rights.

1. Movie Edits

Ever dreamed of creating stunning movie edits that captivate your audience? With AI-Creator, you can transform your favorite movie clips into breathtaking montages that tell your unique story, complete with perfectly synchronized music and transitions.

🚀 How AI-Creator Makes Movie Edits

  • Users just need to prepare the video sources, the music audio file, and their idea.
  • Automatically extracts music rhythm points, with optional threshold and mask parameters.
  • Automatically helps design storyboard queries from the video content, based on the user's idea.
  • Automatically completes the editing and integration of the video.
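The rhythm-point step can be illustrated with simple peak picking on an onset-strength envelope. This is a minimal sketch under stated assumptions, not AI-Creator's code: the function name `pick_rhythm_points` is hypothetical, the envelope is assumed to be precomputed (for example by an audio library's onset detector), `threshold` is assumed to be relative to the loudest frame, and the "mask" is interpreted as a minimum gap between consecutive rhythm points.

```python
from typing import List


def pick_rhythm_points(envelope: List[float], frame_rate: float,
                       threshold: float = 0.5, min_gap_s: float = 0.5) -> List[float]:
    """Return timestamps (seconds) of local peaks in an onset-strength envelope.

    Assumptions (hypothetical, not AI-Creator's real parameters):
    - envelope[i] is the onset strength of frame i, sampled at frame_rate frames/s.
    - threshold is relative to the envelope maximum (0..1).
    - min_gap_s acts as the "mask": a peak closer than this to the previously
      accepted peak is suppressed.
    """
    if not envelope:
        return []
    peak_max = max(envelope)
    if peak_max <= 0:
        return []
    points: List[float] = []
    last_t = float("-inf")
    for i in range(1, len(envelope) - 1):
        strength = envelope[i] / peak_max
        # A local peak is at least as high as its left neighbour and higher than its right one.
        is_local_peak = envelope[i] >= envelope[i - 1] and envelope[i] > envelope[i + 1]
        if strength >= threshold and is_local_peak:
            t = i / frame_rate
            if t - last_t >= min_gap_s:
                points.append(t)
                last_t = t
    return points
```

For a toy envelope sampled at 10 frames/s, `pick_rhythm_points([0, 0.2, 1.0, 0.3, 0, 0.9, 0.1, 0, 0, 0.8, 0.2], 10.0)` keeps the peaks at 0.2 s and 0.9 s and masks the one at 0.5 s, because it falls within 0.5 s of the previous point.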

1.1 Spider-Man: Across the Spider-Verse

🌟 Key Features:

  • Perfect sync between visuals and the background music's rhythm (e.g., at 13s, 22s, 25s)
  • Expertly captures high-energy scenes (under 1 min in total) from the full movie (over 2 hours)
  • Maintains visual continuity and reduces redundant clips
  • Accurately follows the storyboard described in the user prompt (e.g., the 1st and 2nd sections)

Through intelligent analysis of hours of film footage, AI-Creator automatically identifies rhythm cues, high-energy action scenes, and character highlights to achieve precise editing.

📝 Prompt:

Begin with Gwen with blonde hair sitting at a dining table in front of a window, followed by her playing drums with pop textures and notes in the background. Include action scenes featuring Miguel O'Hara in his dark blue suit with red accents, sharp red claws and black/red eye lenses, Spider-Gwen in her white and pink suit with hood and ballet shoes, Miles Morales with curly hair and red spider logo on his chest, and The Spot in his black suit covered in white spots using portal powers. Focus on the chase scene in the blue sky with trains, and emphasize quality motion such as web-swinging, fighting, and colorful special effects throughout the sequence.

1.2 Interstellar

       

🌟 Key Features:

  • Produces edits in different styles from the same input video by adapting the prompt

We showcase two distinct edits of Interstellar created using AI-Creator. The first version focuses on the theme "love transcending space and time," while the second emphasizes humanity's courage in space exploration. Both edits demonstrate how different prompts can shape the narrative and emotional impact of the same source material.

📝 Prompt:

Version 1:
Love can transcend time and space.
Version 2:
Celebrate humanity's courage in space exploration. Include scenes featuring spaceships, wormholes, black holes, space station docking maneuvers, ocean planets, and glacial worlds. Show astronauts in their distinctive white spacesuits as they venture into the unknown, highlighting mankind's relentless drive to explore the cosmos.

1.3 Nezha

🌟 Key Features:

  • Captures scenes of conflicts and battles

📝 Prompt:

Capture more scenes of conflicts and battles between Nezha and Shen Gongbao (black-robed), Dragon Prince Ao Bing (blue-robed).

1.4 Titanic

🌟 Key Features:

  • Understanding of romantic scenes

📝 Prompt:

A romantic and sweet love story about Jack and Rose meeting on the Titanic. It cannot include the part where the ship is in distress, nor the night scene. In the first section, Rose, wearing a purple hat and a white shirt, walks out of a white car with a purple umbrella, looking thoughtfully.

2. Meme Videos

Want to create engaging and hilarious meme videos? AI-Creator helps you craft memorable content by intelligently combining video clips, text, and effects into shareable content that could go viral.

🚀 How AI-Creator Makes Meme Videos

  • Users just need to provide the video path and their requirements.
  • Automatically preprocesses audio (voice separation, loudness normalization, resampling, transcription) with corresponding agents
  • Automatically segments the audio and performs segment-level copywriting adaptation via the Writer Agent
  • Uses the Infer Agent for zero-shot inference on audio segments
  • Aligns and merges audio-visual content automatically with the Combiner Agent
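Loudness normalization in the preprocessing step can be approximated as below. Production pipelines usually measure integrated loudness in LUFS (ITU-R BS.1770); this sketch swaps in a simpler RMS level in dBFS, assumes mono float samples in [-1.0, 1.0], and uses a hypothetical -20 dBFS default target. It illustrates the idea only and is not the code the agents run.

```python
import math
from typing import List


def rms_dbfs(samples: List[float]) -> float:
    """RMS level in dBFS for float samples in [-1.0, 1.0]; -inf for silence."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def normalize_rms(samples: List[float], target_dbfs: float = -20.0) -> List[float]:
    """Scale samples so their RMS level hits target_dbfs, then clip to [-1, 1].

    This is an RMS approximation, not a LUFS (ITU-R BS.1770) measurement.
    """
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # silence: there is nothing to scale
    gain = 10 ** ((target_dbfs - current) / 20)  # dB difference -> linear gain
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

A square-ish signal at ±0.5 (about -6 dBFS) is scaled by a gain of 0.2 to reach -20 dBFS; clipping only matters when the target is louder than the signal's peaks allow.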

🌟 Key Features:

  • Intelligent understanding and transformation of meme concepts
  • Precise audio synthesis and scene matching

2.1 Master Ma as AI Researcher

Master Ma as AI Researcher | Original Video of Master Ma

📝 Prompt:

Create a humorous narrative about two PhD students seeking advice from Master Ma. For the two PhD students, one of them is known for high citation counts and the other for numerous publications. Transform martial arts terms into AI research terminology while keeping phrase lengths similar (length difference should be less than two Chinese characters). The story highlights their academic rivalry and ends with Master Ma advising against "窝里斗" (internal competition). Keep signature phrases like "大意了没有闪" (wasn't cautious enough) and "四两拨千斤" (achieving great results with minimal effort) while avoiding mentions of real institutions. The word combinations should be logical and appropriate for an academic context.

2.2 Xiao-Ming-Jian-Mo (小明剑魔) Meme

Video 1: Mixue's Response | Video 2: Find Your Own Problems
Video 3: MVP | Video 4: Original 小明剑魔 Video

The 小明剑魔 meme has recently gained massive popularity thanks to the streamer's insightful yet comedic commentary. Many content creators have successfully adapted his distinctive speech pattern into creative videos. We've used AI-Creator to generate three videos in this viral meme format, each capturing the unique style and energy of the original while adding new creative elements.

📝 Prompts:

Video 1:
Background: Mixue Ice Cream is a national chain brand focusing on ice cream and tea beverages. On March 15th (Consumer Rights Day), they were reported to be using overnight lemons. However, compared to other exposures, using overnight lemons isn't considered a particularly serious violation and is somewhat understandable.

- Speaker: Snow King (Mixue's representative)
- Purpose: Emphasize that the **overnight lemon** situation isn't too serious, highlighting Mixue's good reputation
- Must preserve the phrases "Look in my eyes tell me why why baby why" and "回答我" (answer me)
- Must end with the word "说话" (say something)
- If the original text contains awkward phrasing, such as redundant words or confused semantics, don't imitate that style or sentence structure
- Ensure natural and fluent sentences
Video 2:
Based on the following scenario, create an angry rebuttal from Zhuge Liang (me):
- Speaker: Zhuge Liang (me)
- Start with "**北伐失败怎么不找找自己问题**" (Why don't you look at your own problems for the failure of the Northern Expedition), followed by "...找自己问题" pattern sentences that **all** reference anime events
- Anime examples must mention specific characters
- Only the **last** "...找自己问题" should return to the Northern Expedition scenario
- Use colloquial language and diverse anime references
Video 3:
Based on the following scenario, create an angry rebuttal from Zhuge Liang (me):
- Speaker: Zhuge Liang (me)
- Zhuge Liang (me) is challenged about why a certain Three Kingdoms character has a higher rating than him and launches a fierce rebuttal
- Must include: "三点零、十三点零、躺赢狗" ("3.0, 13.0, carried-to-victory freeloader")
- Do not start with "零杠几" ("zero-slash-something", i.e., a 0/x scoreline)
- Later rating comparisons should show stark differences (can be exaggerated)
- Use colloquial language, align with historical facts, and only replace specific content

3. AI Music Videos

Ready to create music videos that realize your creative ideas? AI-Creator helps you write lyrics, sing them in the voice of a singer you specify, and generate matching visuals to bring your musical vision to life. The system coordinates lyrics, visuals, and music to create engaging amateur music videos.

🚀 How AI-Creator Makes Music Videos

  • Users just need to provide the music MIDI file, original lyrics, BGM file (optional), target voice file, and requirements.
  • Automatically performs loudness normalization and annotates the MIDI file using the Annotator Agent.
  • Automatically calibrates and adapts lyrics at the word level via the Analyzer Agent.
  • Automatically divides long rest intervals to reduce melodic errors and enables song covers.
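Dividing long rest intervals can be pictured as splitting a note sequence into phrases wherever the silence between notes exceeds a limit. In this minimal sketch the `Note` type, the `split_at_long_rests` name, and the 1-second default are all hypothetical; notes are assumed to be already parsed from the MIDI file into start/end times in seconds, and the melody is assumed to be monophonic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    start: float  # onset time in seconds
    end: float    # release time in seconds
    pitch: int    # MIDI note number


def split_at_long_rests(notes: List[Note], max_rest_s: float = 1.0) -> List[List[Note]]:
    """Group notes into phrases, starting a new phrase whenever the rest
    between consecutive notes is longer than max_rest_s (monophonic input)."""
    phrases: List[List[Note]] = []
    current: List[Note] = []
    for note in sorted(notes, key=lambda n: n.start):
        if current and note.start - current[-1].end > max_rest_s:
            phrases.append(current)  # long rest: close the current phrase
            current = []
        current.append(note)
    if current:
        phrases.append(current)
    return phrases
```

Each resulting phrase can then be synthesized separately, so a long silence never has to be rendered inside a single singing segment.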

🌟 Key Features:

  • Automated lyric generation based on themes
  • Intelligent matching of visuals and lyrics

📝 Prompts:

The song is performed by Patrick Star, focusing on the theme of **"the struggles of manuscript submission and dealing with overly critical reviewers"**, following the original lyrics' sentence structure while replacing specific content. It incorporates elements of reviewer nitpicking (e.g., questioning innovation, demanding redundant experiments) and expresses frustration with lines like "If only I could swap reviewers, this academic fate is too cruel" to highlight the emotional toll of peer review.

4. Cross-Culture Verbal Comedy Arts

Interested in bridging cultural gaps through comedy? Transform popular English talk show segments into authentic Chinese crosstalk performances, and vice versa, complete with cultural adaptations and localized humor that resonates with audiences from different cultural backgrounds.

🚀 How AI-Creator Makes Talk Show Transitions

  • Users just need to provide the target cross-talk (comedy dialogue) audio file.
  • Automatically adapts the script based on the provided target audio file.
  • Automatically selects the appropriate vocal tone for voice cloning according to the emotional context of the script.

🌟 Key Features:

  • Cultural context adaptation and localization of humor
  • Performance style transformation while preserving core comedic elements
  • Voice generation

4.1 English Stand-up Comedy to Chinese Crosstalk

Chinese Crosstalk Adaptation | Original Stand-up Comedy Segment

4.2 Chinese Crosstalk to English Stand-up Comedy

Stand-up Comedy Adaptation | Original Chinese Crosstalk Segment

5. Novel-to-Screen Adaptation

Want to bring your favorite novels to life? AI-Creator transforms written narratives into compelling video adaptations by pairing AI-written commentary with scenes selected from existing film or TV footage, with no filming or actors required. Experience your beloved stories in a whole new medium.

🚀 How AI-Creator Makes Novel-to-Screen Adaptations

  • Users just need to provide their idea, the novel/book .txt file, and the film/TV series video files they want to use as visual material.
  • (Optional) Provide favorite commentary audio files for voice cloning.
  • (Optional) Provide a custom presentation-style .txt file.
  • Automatically writes commentary based on the provided novel/book text.
  • Automatically completes video material splicing and audio integration.

🌟 Key Features:

  • Transforming novel narratives into visual storytelling by adapting descriptive text into cinematic scenes
  • Automated scene matching that pairs textual descriptions with appropriate visual elements, ensuring narrative coherence
  • Audio generation
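Matching text to footage is commonly done by embedding both into vectors and ranking clips by cosine similarity. The sketch below assumes the embeddings already exist; one plausible setup (an assumption, not confirmed by this README) is captioning clips with a vision-language model and embedding the captions with all-MiniLM-L6-v2, or embedding text and frames directly with ImageBind. The toy 2-D vectors and the `rank_clips` helper are hypothetical and do not reproduce AI-Creator's retrieval code.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_clips(query: Sequence[float], clips: Dict[str, Sequence[float]],
               top_k: int = 3) -> List[str]:
    """Return the ids of the top_k clips most similar to the query embedding."""
    scored = sorted(clips.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [clip_id for clip_id, _ in scored[:top_k]]
```

With `clips = {"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [-1.0, 0.0]}`, a query of `[1.0, 0.0]` ranks clip `a` first; a real pipeline would also drop already-used clips to avoid repetition.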

We used AI-Creator to generate a video adaptation of the opening chapters from Joy of Life. Our agents analyzed the novel's text and automatically created a compelling video sequence by intelligently selecting and arranging relevant scenes from the TV series.

📝 Prompt:

Write fluent commentary script with 1500 words.

6. Tech News Summarization

Want to create engaging tech news videos? AI-Creator helps transform complex technical updates into visually appealing content with dynamic graphics and clear explanations that keep viewers informed and engaged.

🚀 How AI-Creator Makes News Summary Videos

  • Users only need to provide their idea and the interview/news source files they want to summarize.
  • (Optional) Provide favorite audio files for voice cloning.
  • (Optional) Provide a custom presentation-style .txt file.
  • Automatically transcribes the interview audio and extracts key information about people and events to write news summaries.
  • Automatically completes video material splicing and audio integration.

6.1 Tech News: OpenAI's GPT-4o Image Generation Release

Tech News made by AI-Creator | Original Tech Report

🌟 Key Features:

  • Automated news content summarization
  • Accurate audio and video clip alignment (e.g., 1:00 "Generate a comic about relativity", 1:09 "Generate a trading card image including the dog Sanji")
  • Audio generation

📝 Prompt:

Short tech news, colloquial expression within 250 words, check the accuracy of key terms, e.g. the GPT model name should be 4o instead of 4.0

6.2 Dune 2 Movie Cast Update Interview

Podcast Summarization About Dune 2 Cast | Original Podcast with Dune 2 Cast

🌟 Key Features:

  • Automated news content summarization
  • Accurate alignment of audio with video clips of the main cast (e.g., 00:13 Timothée Chalamet, 00:28 Zendaya, 00:38 Romance, 00:47 Florence Pugh, 00:56 Austin Butler, 01:06 Javier Bardem)
  • Audio generation

📝 Prompt:

Short movie podcast, colloquial expression within 300 words, notice to identify which actor or host is talking, don't mention movie tickets available issue.

Acknowledgements

We would like to express our deepest gratitude to the numerous individuals and organizations that have made AI-Creator possible. This project stands on the shoulders of giants, benefiting from the collective wisdom of the open-source community and the groundbreaking work of AI researchers worldwide.

First and foremost, we are indebted to the open-source community and AI service providers whose tools and technologies form the foundation of our work:

Our work has been significantly enriched by the creative contributions of content creators across various platforms:

  • The talented creators behind the original video content we used for testing and demonstration
  • The comedy artists whose work inspired our cross-cultural adaptations
  • The filmmakers and production teams behind the movies and TV shows featured in our demos
  • The content creators who have shared their knowledge and insights about video editing techniques

All content used in our demonstrations is for research purposes only. We deeply respect the intellectual property rights of all content creators and welcome any concerns or feedback regarding content usage.

Usage

中文文档 (Chinese documentation)

Clone and Install

git clone https://github.com/HKUDS/AI-Creator.git
conda create --name aicreator python=3.10
conda activate aicreator
conda install -y -c conda-forge pynini==2.1.5 ffmpeg
pip install -r requirements.txt
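After installation, a quick sanity check can confirm that the interpreter version and the `ffmpeg` binary installed above are visible. This helper is not part of the repository; its name and messages are hypothetical, and it only checks what the install commands set up (Python 3.10 via conda and `ffmpeg` on `PATH`).

```python
import shutil
import sys
from typing import List, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 10)) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems: List[str] = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ expected, found {sys.version.split()[0]}"
        )
    # conda-forge's ffmpeg lands in the active environment's bin directory.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH (is the aicreator env activated?)")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK")
```

Run it inside the activated `aicreator` environment; an empty result means both checks passed.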

Model Download

# Run these from the repository root; each (cd ... && ...) runs in a subshell,
# so the working directory is back at the root after every download.
# Download CosyVoice
(cd tools/CosyVoice && huggingface-cli download PillowTa1k/CosyVoice --local-dir pretrained_models)
# Download fish-speech
(cd tools/fish-speech && huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5)
# Download seed-vc
(cd tools/seed-vc && huggingface-cli download PillowTa1k/seed-vc --local-dir checkpoints)
# Download DiffSinger
(cd tools/DiffSinger && huggingface-cli download PillowTa1k/DiffSinger --local-dir checkpoints)
# The next three repositories store weights with Git LFS; install the LFS hooks once
git lfs install
# Download MiniCPM
(cd tools && git clone https://huggingface.co/openbmb/MiniCPM-V-2_6-int4)
# Download Whisper
(cd tools && git clone https://huggingface.co/openai/whisper-large-v3-turbo)
# Download all-MiniLM-L6-v2
(cd tools && git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
# Download ImageBind
mkdir -p tools/.checkpoints
wget -P tools/.checkpoints https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth
🌟 You do not need every model: download only the ones required by the features you plan to use, as listed below.
| Feature Type | Video Demo | Required Models |
|---|---|---|
| Cross Talk | English Stand-up Comedy to Chinese Crosstalk | CosyVoice, MiniCPM, Whisper, ImageBind, all-MiniLM-L6-v2 |
| Talk Show | Chinese Crosstalk to English Stand-up Comedy | CosyVoice, MiniCPM, Whisper, ImageBind, all-MiniLM-L6-v2 |
| MAD TTS | Xiao-Ming-Jian-Mo (小明剑魔) Meme | fish-speech |
| MAD SVC | AI Music Videos | DiffSinger, seed-vc, MiniCPM, Whisper, ImageBind, all-MiniLM-L6-v2 |
| Rhythm | Spider-Man: Across the Spider-Verse | MiniCPM, Whisper, ImageBind, all-MiniLM-L6-v2 |
| Comm | Novel-to-Screen Adaptation | MiniCPM, Whisper, ImageBind, all-MiniLM-L6-v2 |
| News | Tech News: OpenAI's GPT-4o Image Generation Release | MiniCPM, Whisper, ImageBind, all-MiniLM-L6-v2 |
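To decide what to download, the table can be turned into a lookup. The dictionary below is transcribed from the table above; the `models_for` helper is a hypothetical convenience, not something the repository provides.

```python
from typing import Dict, List, Set

# Feature type -> models it needs, transcribed from the table above.
REQUIRED_MODELS: Dict[str, List[str]] = {
    "Cross Talk": ["CosyVoice", "MiniCPM", "Whisper", "ImageBind", "all-MiniLM-L6-v2"],
    "Talk Show": ["CosyVoice", "MiniCPM", "Whisper", "ImageBind", "all-MiniLM-L6-v2"],
    "MAD TTS": ["fish-speech"],
    "MAD SVC": ["DiffSinger", "seed-vc", "MiniCPM", "Whisper", "ImageBind", "all-MiniLM-L6-v2"],
    "Rhythm": ["MiniCPM", "Whisper", "ImageBind", "all-MiniLM-L6-v2"],
    "Comm": ["MiniCPM", "Whisper", "ImageBind", "all-MiniLM-L6-v2"],
    "News": ["MiniCPM", "Whisper", "ImageBind", "all-MiniLM-L6-v2"],
}


def models_for(features: List[str]) -> Set[str]:
    """Union of the models needed for the selected feature types."""
    needed: Set[str] = set()
    for feature in features:
        needed.update(REQUIRED_MODELS[feature])  # KeyError flags an unknown feature
    return needed
```

For example, `sorted(models_for(["Rhythm", "MAD TTS"]))` lists the five models needed to reproduce both the Spider-Man edit and the meme TTS demo, so CosyVoice, seed-vc, and DiffSinger can be skipped.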

LLM Config

# AI-Creator/environment/config/config.yml
# Enter your LLM API key and base URL
llm:
  api_key:
  base_url: 

Model names may vary depending on the LLM provider behind the base URL, so the correct model names must be specified in environment/config/llm.py. For a provider's direct API that serves only its own models (e.g., the official GPT series), replace every model name with one that API supports (e.g., gpt-4o-mini).

Input Config

# Configure the inputs in advance in the matching YAML file: (rhythm_agent/news_agent/comm_agent/cross_talk/mad_svc/mad_tts).yml
# e.g. AI-Creator/environment/config/cross_talk.yml
cross_talk:
  reqs: 'Generate a Chinese crosstalk (Xiangsheng) script. The story should be based on objectively existing situations, set against a Chinese background, avoiding examples from other countries. The script should be approximately 40-50 sentences long.'
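  audio_path: 'dataset/cross_talk/英文脱口秀1.wav'  # source audio: English stand-up clip (英文脱口秀)
  dou_gen: 'dataset/cross_talk/郭德纲'  # voice samples for the lead performer (逗哏 role): Guo Degang
  peng_gen: 'dataset/cross_talk/付航'  # voice samples for the supporting performer (捧哏 role): Fu Hang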
  output: "dataset/user_output_video/cross_talk_video.mp4"
  video_source_dir: "dataset/user_video/"

Character Image for Visual Retrieval Enhancement

Under dataset/video_edit/face_db, add images of each character to be recognized to enhance visual retrieval.
The completed folder structure should look like this; note that each character folder must be named exactly after the character (e.g., Spiderman/Batman/Superman...):

face_db
├── Spiderman
│   └── image01.png
└── Batman
    └── image02.png
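How AI-Creator reads `face_db` internally is not documented here. As a hedged illustration of the folder convention, the hypothetical `load_face_db` helper below maps each character folder to its reference images, so the folder name becomes the character's label. The accepted image extensions (.png, .jpg, .jpeg) are an assumption.

```python
from pathlib import Path
from typing import Dict, List

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed formats, compared case-insensitively


def load_face_db(root: str) -> Dict[str, List[Path]]:
    """Map each character folder name under root to its sorted reference images.

    Folder names are used verbatim as character names (e.g. face_db/Spiderman/image01.png
    yields the label "Spiderman"); loose files and empty folders are ignored.
    """
    db: Dict[str, List[Path]] = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        images = sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
        if images:
            db[folder.name] = images
    return db
```

Calling `load_face_db("dataset/video_edit/face_db")` on the layout above would return the keys `Batman` and `Spiderman`, each with its image paths; a misspelled folder name would simply produce a label that never matches the character.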

Command Line Usage

# With the configuration now complete, proceed to run the following instructions:
python main.py
# The console will output:
Please describe the type of video you would like to produce: