Transcribe videos and create html pages with that

profusion/pf-video-transcribe

ProFUSION Video Transcribe

Install

Install the project using Poetry:

$ poetry install --with dev
Installing dependencies from lock file
...
Installing the current project: pf-video-transcribe

This project uses Faster Whisper, a faster reimplementation of OpenAI's Whisper built on top of CTranslate2 and its hardware optimizations. Running on a GPU requires the NVIDIA CUDA libraries; see their installation instructions.

Run

Run the command line tool:

$ pf-video-transcribe --help

All commands accept --log=LEVEL or --log=DOMAIN:LEVEL to change the log level, where DOMAIN is a package or module name such as pf_video_transcribe.transcribe or faster_whisper. If no domain is given, the provided level applies to all log domains. This is a global option and must be specified before the subcommand.
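
To make the LEVEL versus DOMAIN:LEVEL distinction concrete, here is a minimal sketch of how such a value could be mapped onto Python's standard logging module. The function name apply_log_spec is hypothetical and this is not necessarily how the tool parses the option:

```python
import logging


def apply_log_spec(spec: str) -> logging.Logger:
    """Apply a LEVEL or DOMAIN:LEVEL spec and return the affected logger."""
    # "faster_whisper:INFO" -> ("faster_whisper", ":", "INFO")
    # "DEBUG"               -> ("", "", "DEBUG")
    domain, _sep, level_name = spec.rpartition(":")
    level = logging.getLevelName(level_name.upper())
    if not isinstance(level, int):
        # getLevelName returns a string such as "Level NOPE" for unknown names.
        raise ValueError(f"unknown log level: {level_name!r}")
    # An empty domain selects the root logger, which every other
    # logger inherits from unless it sets its own level.
    logger = logging.getLogger(domain or None)
    logger.setLevel(level)
    return logger
```

With this sketch, apply_log_spec("DEBUG") changes the root logger, while apply_log_spec("faster_whisper:INFO") only touches the faster_whisper logger.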

Subcommands are explained in the next sections.

Transcription

Transcription is a heavy process: loading the model and then processing each media file both take a long time. It is therefore implemented as a batch operation that generates an intermediate file in the JSON Lines (".jsonl") format, with a "header" line, followed by one "segment" line per segment, and ended by a "finished" line with a success or failure indicator. Each segment carries the useful information extracted by OpenAI Whisper. For example:

$ pf-video-transcribe transcribe videos/my-video.mp4 videos/other-video.mp4

This will generate videos/my-video.jsonl and videos/other-video.jsonl.
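
The intermediate file can be consumed with nothing more than the standard json module. The sketch below splits a complete file purely by position (first line is the header, last line is the "finished" record, everything in between is a segment). It makes no assumption about field names, which are not documented here, and it assumes the transcription ran to completion; split_transcription is a hypothetical helper, not part of the tool:

```python
import json
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def split_transcription(lines: Iterable[str]) -> Tuple[Record, List[Record], Record]:
    """Split JSON Lines content into (header, segments, finished)."""
    # One JSON document per non-empty line.
    records = [json.loads(line) for line in lines if line.strip()]
    if len(records) < 2:
        raise ValueError("expected at least a header and a finished line")
    return records[0], records[1:-1], records[-1]
```

Usage would look like: with open("videos/my-video.jsonl", encoding="utf-8") as f: header, segments, finished = split_transcription(f).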

Note that the first run will take a long time, since the model must be downloaded from the internet. Later runs use the local model, but it is still checked against the remote first, which can also take time. Using the --local flag skips that check.

The language is auto-detected from the first 30 seconds of actual sound (silence is ignored), but if you already know the language, use the --language=LANG flag.

Automatic Speech Recognition (ASR) models work on slices of the media, producing segments that are smaller than an actual human language sentence or phrase. The --merge-threshold=SECONDS option merges adjacent segments if: next_segment.start - last_segment.end <= merge_threshold. The default is 1 second.
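
The merge rule can be sketched in a few lines. The Segment class and merge_segments function below are illustrative names only, and joining the texts with a single space is an assumption; the condition itself is the one stated above:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    start: float  # seconds from the beginning of the media
    end: float  # seconds from the beginning of the media
    text: str


def merge_segments(segments: Iterable[Segment], merge_threshold: float = 1.0) -> List[Segment]:
    """Merge each segment into the previous one when the gap is small enough."""
    merged: List[Segment] = []
    for seg in segments:
        if merged and seg.start - merged[-1].end <= merge_threshold:
            last = merged[-1]
            # Extend the previous segment to cover this one.
            merged[-1] = Segment(last.start, seg.end, f"{last.text} {seg.text.strip()}")
        else:
            merged.append(Segment(seg.start, seg.end, seg.text.strip()))
    return merged
```

Because the gap is measured against the end of the already merged segment, a chain of closely spaced segments collapses into a single entry.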

A more complex example:

$ pf-video-transcribe \
      --log=DEBUG \
      transcribe \
      --local \
      --language=pt \
      --merge-threshold=5 \
      videos/my-video.mp4 videos/other-video.mp4

The transcribed ".jsonl" file can then be converted to more usable formats; see the next sections.

Convert to HTML

This generates an HTML page meant for easy viewing of the result: a <video> element linking to the transcribed media, a <track kind="subtitles"> linking to the subtitles, the thumbnail to be used by OpenGraph og:image, and the actual transcription segments.

Note: both .vtt (subtitles) and .jpeg (thumbnail) are auto-generated if they don't exist or if they are older than the actual input .jsonl.
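
The "missing or older than the input" rule is the classic make-style staleness check. A minimal sketch, assuming it is based on file modification times (needs_rebuild is a hypothetical helper, not the tool's own function):

```python
from pathlib import Path


def needs_rebuild(target: Path, source: Path) -> bool:
    """Return True if target is missing or older than source."""
    if not target.exists():
        return True
    # Compare modification times: a newer source makes the target stale.
    return target.stat().st_mtime < source.stat().st_mtime
```

For example, needs_rebuild(Path("videos/my-video.vtt"), Path("videos/my-video.jsonl")) would be True right after a new transcription.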

Convert to VTT

Web Video Text Tracks (WebVTT) is a subtitle format specified by the W3C and used by all web browsers whenever a track is specified inside the <video> element.

The conversion takes the --duration-threshold=SECONDS parameter to control the maximum duration of a single subtitle entry.

$ pf-video-transcribe vtt videos/*.jsonl
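
For reference, a WebVTT file starts with a WEBVTT line, followed by cues with HH:MM:SS.mmm timestamps separated by blank lines. The sketch below renders (start, end, text) tuples into that layout; the function names are hypothetical and it leaves out the duration threshold handling:

```python
from typing import Iterable, Tuple


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm, the WebVTT timestamp layout."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_vtt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as a WebVTT document."""
    blocks = ["WEBVTT"]  # mandatory file signature
    for start, end, text in cues:
        blocks.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n{text}")
    # Cues are separated from each other (and from the header) by a blank line.
    return "\n\n".join(blocks) + "\n"
```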

Convert to SRT

SRT, or SubRip, is a de facto standard subtitle format that most media players accept. The conversion takes the --duration-threshold=SECONDS parameter to control the maximum duration of a single subtitle entry.

$ pf-video-transcribe srt videos/*.jsonl
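
SRT differs from WebVTT in a few details: there is no file header, every entry is numbered starting at 1, and the millisecond separator is a comma. A small sketch of that layout, using hypothetical function names and ignoring the duration threshold:

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (note the comma before milliseconds)."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(entries: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as numbered SRT entries."""
    blocks = []
    for index, (start, end, text) in enumerate(entries, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # A blank line separates consecutive entries.
    return "\n".join(blocks)
```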

Create Thumbnail

Uses FFmpeg to generate a thumbnail from the video or its transcription. The --size=WIDTHxHEIGHT option overrides the default 320x-1 (-1 means that dimension is calculated from the other one, keeping the aspect ratio).

$ pf-video-transcribe thumbnail videos/*.jsonl
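
The WIDTHxHEIGHT value maps naturally onto FFmpeg's scale filter, where -1 also means "derive from the other dimension". The sketch below builds one plausible FFmpeg invocation that grabs a single frame; the exact options the tool passes are not documented here, so treat both the helper names and the command line as assumptions:

```python
import subprocess
from pathlib import Path
from typing import List


def thumbnail_command(video: Path, output: Path, size: str = "320x-1") -> List[str]:
    """Build an FFmpeg command that writes one scaled frame to output."""
    width, height = size.split("x", 1)  # "320x-1" -> ("320", "-1")
    return [
        "ffmpeg",
        "-y",  # overwrite the output file if it exists
        "-i", str(video),
        "-frames:v", "1",  # stop after a single video frame
        "-vf", f"scale={width}:{height}",
        str(output),
    ]


def make_thumbnail(video: Path, output: Path, size: str = "320x-1") -> None:
    """Run FFmpeg; raises CalledProcessError if it fails."""
    subprocess.run(thumbnail_command(video, output, size), check=True)
```

Keeping the command construction separate from its execution makes the mapping from --size to the filter easy to inspect without FFmpeg installed.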

Creating Index HTML

Recursively scans the given directories looking for .html files, whether or not they were produced by this tool. The generated index uses each file's <title> and <meta property="og:image"> as the entry's title and preview image.

It's a very simple way to generate a landing page.

$ pf-video-transcribe index_html videos/
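
Extracting those two pieces of metadata needs only the standard library. The sketch below is one way to do it with html.parser and Path.rglob; PageInfoParser, page_info and scan are hypothetical names, and the tool itself may use a different parser:

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Iterator, Optional, Tuple


class PageInfoParser(HTMLParser):
    """Collect the <title> text and the og:image <meta> content."""

    def __init__(self) -> None:
        super().__init__()
        self.title: Optional[str] = None
        self.og_image: Optional[str] = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if attributes.get("property") == "og:image":
                self.og_image = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


def page_info(html: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (title, og_image) for an HTML document; either may be None."""
    parser = PageInfoParser()
    parser.feed(html)
    parser.close()
    title = parser.title.strip() if parser.title else None
    return title, parser.og_image


def scan(directory: str) -> Iterator[Tuple[Path, Optional[str], Optional[str]]]:
    """Recursively yield (path, title, og_image) for every .html file."""
    for path in sorted(Path(directory).rglob("*.html")):
        title, image = page_info(path.read_text(encoding="utf-8"))
        yield path, title, image
```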

Serving (Development)

While developing this tool or playing with parameters, it's useful to serve the files over http://, since file:// URLs have some issues with video files (browser security limitations). By default it serves on port 8000 (--port=8000).

$ pf-video-transcribe serve videos/
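
A comparable development server can be put together from Python's standard http.server module. This is only a sketch of the idea, not the tool's implementation; make_server is a hypothetical name and binding to 127.0.0.1 is an assumption:

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Create a static file server rooted at directory, bound to localhost."""
    # SimpleHTTPRequestHandler accepts the directory to serve as a keyword.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    server = make_server("videos/")
    try:
        server.serve_forever()  # stop with Ctrl+C
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Like the subcommand, this is meant for local development only and should not be exposed to untrusted networks.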

Development

Install the project with development dependencies:

$ poetry install --with dev
Installing dependencies from lock file
...
Installing the current project: pf-video-transcribe

Install pre-commit on your machine, then install the Git hooks:

$ pre-commit install
pre-commit installed at .git/hooks/pre-commit
pre-commit installed at .git/hooks/pre-push
pre-commit installed at .git/hooks/pre-merge-commit

Used tools:

  • Code Formatter: Black
  • Static Type Checker: MyPy
  • Style Enforcement/Linter: Flake8
