DeovidLang: Automate Video Creation from Audio with AI Subtitles

Ever wished you could turn a simple audio recording into a polished video—with subtitles in multiple languages—all automatically? Whether you’re a content creator, educator, or localizer, you probably know how time-consuming it is to manually transcribe, translate, and sync subtitles.

What if there was a tool that could handle all of this for you in one go?

DeovidLang does exactly that. It takes your audio files, transcribes them using OpenAI’s Whisper, translates the text, generates bilingual subtitles, adds image overlays at specific timestamps, and produces a ready-to-share MP4 video.

ℹ️ Info

DeovidLang automates the entire video creation pipeline: transcription → translation → subtitle generation → image overlay → final video output. All from a single Python script!

What You’ll Learn

How DeovidLang works under the hood
How to set up and run the tool
How to use different Whisper models
How to customize language and output settings
Real-world use cases for content creators

Key Features

🎙️ Audio Transcription

Powered by OpenAI’s Whisper model, DeovidLang accurately transcribes audio in multiple languages. Choose from tiny (fastest) to large (most accurate) models.

🌍 Multi-Language Translation

Translate transcribed text into your desired language. Perfect for reaching international audiences with localized content.

📝 Bilingual Subtitles

Generate SRT subtitle files showing both original and translated text—ideal for language learners and multilingual audiences.

🖼️ Image Overlay

Add images at specific timestamps during video playback. Great for adding slides, diagrams, or branding to your videos.

🎬 Video Generation

FFmpeg combines audio, subtitles, and image overlays into a single MP4 file. Configurable resolution and format options.

📁 Automated Organization

Output files are automatically organized in date-based directories, keeping your project structured and easy to navigate.

How It Works

DeovidLang follows a straightforward pipeline:

Audio File → Whisper Transcription → Translation → SRT Subtitles → Image Overlay → Final MP4 Video

Step-by-Step Process

Input: You provide an audio file (M4A, MP3, WAV, etc.)
Transcription: Whisper converts speech to text
Translation: Translated text is generated in your target language
Subtitle Generation: Both original and translated text become SRT files
Image Overlay: Specified images appear at timestamps you define
Video Assembly: FFmpeg combines everything into the final MP4
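
The subtitle generation step can be sketched in a few lines of Python. This is a minimal illustration, not DeovidLang's actual code: it assumes segments shaped like Whisper's output (dicts with `start`, `end`, and `text`) and a hypothetical `translations` list aligned with them. The helper names `srt_timestamp` and `bilingual_srt` are invented for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def bilingual_srt(segments, translations) -> str:
    """Build SRT text with each original line placed above its translation.

    Assumption: `translations[i]` is the translated text for `segments[i]`.
    """
    cues = []
    for index, (seg, translated) in enumerate(zip(segments, translations), start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
            f"{translated.strip()}\n"
        )
    # A blank line separates consecutive cues in the SRT format.
    return "\n".join(cues)
```

Calling `srt_timestamp(3661.5)` returns `01:01:01,500`. Each cue carries both languages, so a player renders them as two stacked lines.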

💡 Tip

Image overlays happen at specific timestamps—perfect for matching slides to narration in presentations or tutorials.
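
One way timed overlays can be expressed in FFmpeg is the `overlay` filter combined with its `enable='between(t,start,end)'` timeline option. The sketch below only builds the filter graph string. Whether DeovidLang constructs its FFmpeg command this way is not confirmed by this article, and `overlay_filter` is a hypothetical helper name.

```python
def overlay_filter(overlays) -> str:
    """Build an FFmpeg filter graph that shows each image only in its time window.

    `overlays` is a list of (start_seconds, end_seconds) tuples.
    Assumption: input 0 is the background video stream and inputs 1..N are
    the images, in the same order as `overlays`.
    """
    parts = []
    previous = "[0:v]"
    for i, (start, end) in enumerate(overlays, start=1):
        label = f"[v{i}]"
        # Chain each overlay onto the result of the previous one.
        parts.append(
            f"{previous}[{i}:v]overlay=enable='between(t,{start},{end})'{label}"
        )
        previous = label
    return ";".join(parts)
```

The returned string would be passed to FFmpeg's `-filter_complex` option, with the last label (for example `[v2]`) selected through `-map`.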

Installation

Step 1: Install System Dependencies

FFmpeg is required for video processing:

Linux (Debian/Ubuntu):

sudo apt install ffmpeg

macOS (Homebrew):

brew install ffmpeg

Windows:

# Download from https://ffmpeg.org/download.html
# Or use winget:
winget install ffmpeg

Step 2: Install Python Dependencies

pip install pydub pillow

Step 3: Install Whisper

pip install git+https://github.com/openai/whisper.git

Step 4: Clone the Repository

git clone https://gitlab.com/krafi/deovidlang.git
cd deovidlang

Usage

Basic Command

python deovidlang.py <directory> <audio_file> --model <model> --language <language> --task <task>

Example

python deovidlang.py myproject myproject/speech.m4a --model medium --language en --task translate

Parameters

Parameter	Options	Default	Description
`--model`	tiny, base, small, medium, large	tiny	Whisper model size
`--language`	Language code (en, ru, es, etc.)	ru	Source audio language
`--task`	transcribe, translate, both	both	Operation to perform
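
For readers curious how a command line like this is usually declared, here is a sketch using Python's standard `argparse` module. It mirrors the options and defaults from the table above, but it is an assumption about the shape of the interface, not a copy of `deovidlang.py`. The help strings and the `build_parser` name are illustrative.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the two positional arguments and three options from the table."""
    parser = argparse.ArgumentParser(prog="deovidlang.py")
    parser.add_argument("directory", help="directory argument from the usage line")
    parser.add_argument("audio_file", help="path to the input audio file")
    parser.add_argument(
        "--model",
        default="tiny",
        choices=["tiny", "base", "small", "medium", "large"],
        help="Whisper model size",
    )
    parser.add_argument("--language", default="ru", help="source audio language code")
    parser.add_argument(
        "--task",
        default="both",
        choices=["transcribe", "translate", "both"],
        help="operation to perform",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With `choices` set, a typo such as `--model mediun` is rejected before any model is downloaded.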

📖 Whisper Model Comparison

Model	Parameters	Download (approx.)	Speed	Accuracy
tiny	39 M	~75 MB	Fastest	Lower
base	74 M	~145 MB	Fast	Good
small	244 M	~480 MB	Medium	Better
medium	769 M	~1.5 GB	Slow	High
large	1550 M	~2.9 GB	Slowest	Highest

💡 Tip

Start with “tiny” for quick testing. Use “medium” or “large” for production-quality transcriptions.
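
The open-source `whisper` package installed in Step 3 can also be called directly from Python, which is handy for comparing model sizes on a short clip. The sketch below is not taken from DeovidLang. `transcribe_audio` and `MODELS` are names made up for this example, and it covers only Whisper's own two tasks (`both` is a DeovidLang option, not a Whisper task). Note that Whisper's `translate` task produces English text.

```python
MODELS = ("tiny", "base", "small", "medium", "large")
WHISPER_TASKS = ("transcribe", "translate")


def transcribe_audio(path, model_name="tiny", language=None, task="transcribe"):
    """Run Whisper on an audio file and return its list of timed segments."""
    if model_name not in MODELS:
        raise ValueError(f"unknown model {model_name!r}; expected one of {MODELS}")
    if task not in WHISPER_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {WHISPER_TASKS}")

    # Imported lazily so the argument checks above work without the package.
    import whisper

    model = whisper.load_model(model_name)  # downloaded on first use, then cached
    result = model.transcribe(path, language=language, task=task)
    # Each segment is a dict that includes "start", "end", and "text".
    return result["segments"]
```

A quick comparison might call `transcribe_audio("myproject/speech.m4a", model_name="tiny")` first and switch to `medium` once the rest of the pipeline looks right.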

Real-World Use Cases

1. Educational Content Localization

Create bilingual videos for language learners. Transcribe your English lecture, translate to Spanish, and generate subtitles showing both languages simultaneously.

2. Podcast to Video

Turn podcast episodes into YouTube videos. Add cover art or branding images at the beginning, and let DeovidLang handle the rest.

3. Tutorial Video Creation

Record your voiceover, then automatically generate subtitles and create a professional video with slides matching your narration.

4. Accessibility

Make your content accessible to deaf or hard-of-hearing viewers with accurate auto-generated subtitles in multiple languages.

Example Output

After running DeovidLang, you’ll get:

output/
├── 2024-01-15/
│   ├── original.srt      # Original language subtitles
│   ├── translated.srt    # Translated subtitles
│   └── final.mp4         # Final video with everything

The final MP4 includes:

Your original audio
Image overlays at specified timestamps
Burned-in bilingual subtitles
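
The date-based layout above could be produced along these lines. This is a sketch using only the standard library. The `dated_output_dir` name and the `output` root are assumptions based on the tree shown, not DeovidLang's actual code.

```python
from datetime import date
from pathlib import Path


def dated_output_dir(root="output", day=None) -> Path:
    """Create (if needed) and return a directory such as output/2024-01-15."""
    day = day or date.today()
    target = Path(root) / day.isoformat()  # isoformat() yields YYYY-MM-DD
    target.mkdir(parents=True, exist_ok=True)
    return target
```

ISO dates sort chronologically as plain strings, so a directory listing stays in order without extra tooling.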

Troubleshooting

📖 FFmpeg not found?

Verify FFmpeg is installed:
```
ffmpeg -version
```
Add FFmpeg to your system PATH
Restart your terminal
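
The same check can be scripted before a long run starts. This sketch uses only Python's standard library; `ffmpeg_version` is a hypothetical helper name.

```python
import shutil
import subprocess


def ffmpeg_version(executable="ffmpeg"):
    """Return the first line of `ffmpeg -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = ffmpeg_version()
    print(version if version is not None else "FFmpeg not found on PATH")
```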

📖 Whisper model download slow?

The first run downloads the selected model, from tens of megabytes for tiny up to about 2.9 GB for large. Use a stable internet connection; the model is cached for subsequent runs.

📖 Subtitles not showing in video?

Check that SRT files were generated
Verify image timestamps don’t overlap
Try a different output format
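
For the first check, a small script can confirm that an SRT file exists and actually contains cues. This is a sketch under the assumption that the files use standard SRT timing lines; `count_srt_cues` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches a standard SRT timing line such as "00:00:01,000 --> 00:00:03,500".
CUE_TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}", re.MULTILINE
)


def count_srt_cues(path) -> int:
    """Return the number of subtitle cues in an SRT file (0 if it is missing)."""
    srt_file = Path(path)
    if not srt_file.is_file():
        return 0
    return len(CUE_TIMING.findall(srt_file.read_text(encoding="utf-8")))
```

A result of 0 points at the transcription or translation step, not at FFmpeg.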

📖 Audio quality issues?

Ensure your input audio is clear. Whisper works best with:

Minimal background noise
Clear speech
Sample rate of 16kHz or higher
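
For WAV input, the sample rate can be checked with Python's built-in `wave` module. This is a sketch with hypothetical helper names. Other formats such as M4A or MP3 need a decoder; pydub (installed in Step 2) exposes a `frame_rate` attribute on `AudioSegment` for that purpose.

```python
import wave


def wav_sample_rate(path) -> int:
    """Return the sample rate (in Hz) of a WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def meets_minimum_rate(path, minimum_hz=16_000) -> bool:
    """True if the WAV file is sampled at `minimum_hz` or higher."""
    return wav_sample_rate(path) >= minimum_hz
```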

Why This Project is Useful

DeovidLang solves several pain points for content creators:

Saves Hours: Manual transcription can take 4-6x the audio length. DeovidLang does it in minutes.
No Expensive Tools: No need for Adobe Premiere, Final Cut, or subscription services.
Multilingual Ready: Reach global audiences with translated subtitles.
Automated Workflow: One command handles the entire pipeline.

💡 Tip

Combine DeovidLang with my other project WhisperWeb (covered in a previous blog) for a complete audio-to-video workflow!

Conclusion

DeovidLang is a powerful yet simple tool that automates video creation from audio. Whether you’re an educator, podcaster, or content creator, it handles the heavy lifting (transcription, translation, subtitles, and video assembly) so you can focus on creating content.

Give it a try and transform your audio files into shareable videos in minutes!

Source Code

View and contribute to the project: DeovidLang on GitLab (https://gitlab.com/krafi/deovidlang)

Happy automating!
