A Python toolkit for generating video captions using the Lance database format and Gemini API for automatic captioning.
The qwen-VL series of video captioning models is now supported:
- qwen-vl-max-latest
- qwen2.5-vl-72b-instruct
- qwen2.5-vl-7b-instruct
- qwen2.5-vl-3b-instruct
The qwen2.5-vl models accept videos from 2 seconds to 10 minutes long; qwen-vl-max-latest has a 1-minute limit. These models are not good at capturing timestamps, so it is recommended to caption pre-segmented video clips and to modify the prompts accordingly.
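Because these models cap the input length and handle timestamps poorly, one way to prepare clips is to compute fixed-length segment boundaries first. The following is a minimal Python sketch, not the toolkit's actual implementation; the function name `split_into_segments` and its parameters are hypothetical.

```python
def split_into_segments(duration_s, segment_s):
    """Return (start, end) pairs in seconds that together cover [0, duration_s].

    Hypothetical helper for illustration only.
    """
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("duration_s and segment_s must be positive")
    segments = []
    start = 0.0
    while start < duration_s:
        # The last segment is truncated at the end of the video.
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    # A 130-second video cut for a model with a 60-second limit.
    for start, end in split_into_segments(130, 60):
        print(f"{start:.0f}s -> {end:.0f}s")
```

A short final clip may fall under the 2-second minimum of qwen2.5-vl, so a real pipeline would probably need to merge or drop it.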
The video upload feature requires an application to the official provider; please submit the application here.
We are considering adding local model inference in the future, for example with qwen2.5-vl-7b-instruct.
Additionally, logs are now produced with streaming inference, so you can see the model's output in real time before the complete output is displayed.
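The idea of streaming output can be illustrated without any particular SDK. In this sketch `fake_model_stream` is a hypothetical stand-in for a streaming API response, and `print_stream` is an illustrative helper, not a function of this toolkit.

```python
import sys


def print_stream(chunks, out=sys.stdout):
    """Write each text chunk as soon as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # make the partial output visible immediately
        parts.append(chunk)
    out.write("\n")
    return "".join(parts)


def fake_model_stream():
    # Hypothetical stand-in for a streaming response from a captioning model.
    yield "A cat "
    yield "jumps onto "
    yield "the table."


if __name__ == "__main__":
    caption = print_stream(fake_model_stream())
```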
The Google Gemini SDK has been updated, and the new version of the SDK supports the new Gemini 2.0 models.
The new SDK is more powerful; its main addition is the ability to verify uploaded videos.
If you want to caption the same video repeatedly, you no longer need to upload it each time: the video name and file size/hash are verified automatically.
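A check of this kind can be sketched with the standard library. This is only an assumption about how such verification might work, not the SDK's or the toolkit's code: `file_fingerprint`, `already_uploaded`, and the format of the `uploaded` record are hypothetical names introduced for illustration.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (name, size_in_bytes, sha256_hex) for a local file."""
    p = Path(path)
    digest = hashlib.sha256()
    with p.open("rb") as f:
        # Read in chunks so large videos are not loaded into memory at once.
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return p.name, p.stat().st_size, digest.hexdigest()


def already_uploaded(path, uploaded):
    """Check a file against a record of earlier uploads.

    `uploaded` is a hypothetical mapping: file name -> (size, sha256_hex).
    """
    name, size, sha = file_fingerprint(path)
    return uploaded.get(name) == (size, sha)
```

If the name, size, and hash all match a recorded upload, the upload step could be skipped and the existing remote file reused.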
The millisecond-level alignment function has also been updated. After the subtitles of a segmented long video are merged, the timeline is automatically aligned to the millisecond, which is very neat!
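The alignment step can be pictured as shifting each segment's cue times by that segment's offset and re-emitting them in SRT's HH:MM:SS,mmm format. The sketch below assumes fixed-length segments and cue times stored as integer milliseconds; `format_srt_time` and `merge_segment_cues` are hypothetical helpers, not the toolkit's API.

```python
def format_srt_time(ms):
    """Format integer milliseconds as an SRT timestamp HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def merge_segment_cues(segments, segment_ms):
    """Merge per-segment cues into one SRT string on a shared timeline.

    segments: list (one entry per clip, in order) of lists of
              (start_ms, end_ms, text) tuples relative to the clip start.
    segment_ms: assumed fixed clip length in milliseconds.
    """
    blocks = []
    index = 1
    for seg_no, cues in enumerate(segments):
        offset = seg_no * segment_ms  # where this clip starts in the full video
        for start, end, text in cues:
            timing = f"{format_srt_time(start + offset)} --> {format_srt_time(end + offset)}"
            blocks.append(f"{index}\n{timing}\n{text}\n")
            index += 1
    return "\n".join(blocks)


if __name__ == "__main__":
    print(merge_segment_cues([[(0, 1500, "a")], [(250, 900, "b")]], 300_000))
```

Working in integer milliseconds avoids the rounding drift that can appear when offsets are accumulated as floating-point seconds.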
- Automatic video/audio/image description using Google's Gemini API, or image-only description with pixtral-large 124B
- Export captions in SRT format
- Support for multiple video formats
- Batch processing with progress tracking
- Maintains original directory structure
- Configurable through TOML files
- Lance database integration for efficient data management
- Import videos into Lance database format
- Preserve original directory structure
- Support for both single directory and paired directory structures
- Extract videos and captions from Lance datasets
- Maintains original file structure
- Exports captions as SRT files in the same directory as source videos
- Auto Clip with SRT timestamps
- Automatic video scene description using Gemini API or Pixtral API
- Batch processing support
- SRT format output with timestamps
- Robust error handling and retry mechanisms
- Progress tracking for batch operations
- API prompt configuration management
- Customizable batch processing parameters
- Default schema includes file paths and metadata
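Clipping by SRT timestamps, as listed above, presupposes reading cue times out of an SRT file. A minimal sketch of that parsing step follows; `to_ms` and `parse_srt` are hypothetical names, and the toolkit's own parser may differ.

```python
import re

# Matches HH:MM:SS,mmm (a dot is also accepted as the millisecond separator).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_ms(stamp):
    """Convert an SRT timestamp to integer milliseconds."""
    m = _TIME.search(stamp)
    if m is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    h, mi, s, ms = map(int, m.groups())
    return ((h * 60 + mi) * 60 + s) * 1000 + ms


def parse_srt(text):
    """Return a list of (start_ms, end_ms, caption) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:  # timing line, normally right after the index
                start, end = line.split("-->", 1)
                caption = "\n".join(lines[i + 1:]).strip()
                cues.append((to_ms(start), to_ms(end), caption))
                break
    return cues
```

Each resulting (start, end) pair could then be handed to a video cutting tool to produce one clip per caption.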
Give unrestricted script access to PowerShell so venv can work:
- Open an administrator PowerShell window
- Type Set-ExecutionPolicy Unrestricted and answer A
- Close the administrator PowerShell window
Run the following PowerShell script:
./1、install-uv-qinglong.ps1
- First install PowerShell:
./0、install pwsh.sh
- Then run the installation script using PowerShell:
sudo pwsh ./1、install-uv-qinglong.ps1
Use sudo pwsh if you are on Linux.
video example: https://files.catbox.moe/8fudnf.mp4
Use the PowerShell script to import your videos:
./lanceImport.ps1
Use the PowerShell script to export data from Lance format:
./lanceExport.ps1
Use the PowerShell script to generate captions for your videos:
./run.ps1
Note: You'll need to configure your Gemini API key in run.ps1
before using the auto-captioning feature.
A Pixtral API key is optional and is used for image captioning.
step-1.5v-mini is now supported as an optional video captioning model.
The qwen-VL series is now supported as an optional video captioning model.
$dataset_path = "./datasets"
$gemini_api_key = ""
$gemini_model_path = "gemini-2.0-pro-exp-02-05"
$pixtral_api_key = ""
$pixtral_model_path = "pixtral-large-2411"
$step_api_key = ""
$step_model_path = "step-1.5v-mini"
$qwenVL_api_key = ""
$qwenVL_model_path = "qwen-vl-max-latest" # qwen2.5-vl-72b-instruct: < 10 min; qwen-vl-max-latest: < 1 min
$dir_name = $true
$mode = "long"
$not_clip_with_caption = $false # Do not clip according to captions
$wait_time = 1
$max_retries = 100
$segment_time = 300
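The `$wait_time` and `$max_retries` settings suggest a retry loop around each API call. The sketch below shows one plausible reading in Python, assuming `$wait_time` is a delay in seconds between attempts and `$max_retries` is the maximum number of attempts; `call_with_retries` is a hypothetical helper and the toolkit's real behavior may differ.

```python
import time


def call_with_retries(func, max_retries=100, wait_time=1.0, sleep=time.sleep):
    """Call func() until it succeeds or max_retries attempts have failed."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < max_retries:
                sleep(wait_time)  # pause before the next attempt
    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` applies.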
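A tool for automatically generating video captions based on the Lance database format, using the Gemini API to generate scene descriptions.
- Automatic video scene description using the Google Gemini API
- Export captions as SRT files
- Support for multiple video formats
- Batch processing with progress display
- Preserves the original directory structure
- Configurable through TOML files
- Lance database integration for efficient data management
- Import videos into Lance database format
- Preserves the original directory structure
- Support for both single directory and paired directory structures
- Extract videos and captions from Lance datasets
- Preserves the original file structure
- Exports SRT captions in the same directory as the source videos
- Video scene description using the Gemini API
- Batch processing support
- Generates SRT captions with timestamps
- Robust error handling and retry mechanisms
- Progress tracking for batch processing
- API configuration management
- Customizable batch processing parameters
- Default schema includes file paths and metadata
Run the following PowerShell script: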
./1、install-uv-qinglong.ps1
- First install PowerShell:
./0、install pwsh.sh
- Then run the installation script using PowerShell:
pwsh ./1、install-uv-qinglong.ps1
Use the PowerShell script to import your videos:
./lanceImport.ps1
Use the PowerShell script to export data from Lance format:
./lanceExport.ps1
Use the PowerShell script to generate captions for your videos:
./run.ps1
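Note: Before using the automatic caption generation feature, you need to configure your Gemini API key in run.ps1.
A Pixtral API key is optional and is used for image captioning.
The StepFun video model is now supported for video captioning.
The Tongyi Qianwen VL (qwen-VL) video models are now supported for video captioning.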
$dataset_path = "./datasets"
$gemini_api_key = ""
$gemini_model_path = "gemini-2.0-pro-exp-02-05"
$pixtral_api_key = ""
$pixtral_model_path = "pixtral-large-2411"
$step_api_key = ""
$step_model_path = "step-1.5v-mini"
$qwenVL_api_key = ""
$qwenVL_model_path = "qwen-vl-max-latest" # qwen2.5-vl-72b-instruct: < 10 min; qwen-vl-max-latest: < 1 min
$dir_name = $true
$mode = "long"
$not_clip_with_caption = $false # Do not clip according to captions
$wait_time = 1
$max_retries = 100
$segment_time = 300