DeovidLang: Automate Video Creation from Audio with AI Subtitles

Learn how to automate video creation with DeovidLang. Transcribe audio, translate to multiple languages, generate bilingual subtitles, and create professional MP4 videos with image overlays—all in one Python script.

Written by Rafi
Published: December 1, 2024
Read time: 2 min
Difficulty: Intermediate

Ever wished you could turn a simple audio recording into a polished video—with subtitles in multiple languages—all automatically? Whether you’re a content creator, educator, or localizer, you probably know how time-consuming it is to manually transcribe, translate, and sync subtitles.

What if there was a tool that could handle all of this for you in one go?

DeovidLang does exactly that. It takes your audio files, transcribes them using OpenAI’s Whisper, translates the text, generates bilingual subtitles, adds image overlays at specific timestamps, and produces a ready-to-share MP4 video.

ℹ️ Info

DeovidLang automates the entire video creation pipeline: transcription → translation → subtitle generation → image overlay → final video output. All from a single Python script!

What You’ll Learn

  • How DeovidLang works under the hood
  • How to set up and run the tool
  • How to use different Whisper models
  • How to customize language and output settings
  • Real-world use cases for content creators

Key Features

🎙️ Audio Transcription

Powered by OpenAI’s Whisper model, DeovidLang accurately transcribes audio in multiple languages. Choose from tiny (fastest) to large (most accurate) models.
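For readers curious what this step might look like in code, here is a minimal sketch that calls the openai-whisper package directly. `whisper.load_model` and `model.transcribe` are the package's real entry points; the helper names `segments_to_cues` and `transcribe_audio` are hypothetical and are not taken from DeovidLang's source.

```python
from __future__ import annotations

from typing import Any


def segments_to_cues(result: dict[str, Any]) -> list[tuple[float, float, str]]:
    """Flatten a Whisper result dict into (start, end, text) tuples."""
    return [
        (float(seg["start"]), float(seg["end"]), seg["text"].strip())
        for seg in result.get("segments", [])
    ]


def transcribe_audio(path: str, model_name: str = "tiny",
                     language: str | None = None) -> list[tuple[float, float, str]]:
    """Hypothetical helper: run Whisper on one file and return timed cues."""
    import whisper  # imported lazily so segments_to_cues works without it

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(path, language=language, task="transcribe")
    return segments_to_cues(result)
```

Each cue carries its start and end time in seconds, which is the information a subtitle file needs later in the pipeline.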

🌍 Multi-Language Translation

Translate transcribed text into your desired language. Perfect for reaching international audiences with localized content.

📝 Bilingual Subtitles

Generate SRT subtitle files showing both original and translated text—ideal for language learners and multilingual audiences.
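To make the bilingual format concrete, the sketch below builds SRT text in which every cue shows the original line followed by its translation. The SRT layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard; the function names and the tuple layout are assumptions made for illustration, not DeovidLang's own code.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def bilingual_srt(cues):
    """Build SRT text from (start, end, original, translated) tuples."""
    blocks = []
    for index, (start, end, original, translated) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{original}\n{translated}\n"
        )
    # Each block already ends with a newline, so joining with "\n"
    # leaves the blank line that SRT requires between cues.
    return "\n".join(blocks)
```

Writing the result to disk is then a single call such as `Path("bilingual.srt").write_text(text, encoding="utf-8")`.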

🖼️ Image Overlay

Add images at specific timestamps during video playback. Great for adding slides, diagrams, or branding to your videos.
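One common way to show an image only during a time window is FFmpeg's `overlay` filter with its `enable='between(t,start,end)'` option. The sketch below builds such a filter graph as a string. It assumes input 0 is the base video and inputs 1..N are the images in order; the function name and the top-left placement (`0:0`) are illustrative choices, and DeovidLang itself may build its filters differently.

```python
def overlay_filter(windows):
    """Build an FFmpeg filter_complex string from (start, end) windows in seconds.

    Returns the filter string and the label of the final video stream,
    which can be passed to FFmpeg's -map option.
    """
    parts = []
    previous = "[0:v]"
    for i, (start, end) in enumerate(windows, start=1):
        label = f"[v{i}]"
        parts.append(
            f"{previous}[{i}:v]overlay=0:0:enable='between(t,{start},{end})'{label}"
        )
        previous = label  # the next overlay is drawn on top of this result
    return ";".join(parts), previous
```

Chaining the overlays this way means each image is composited onto the output of the previous step, so any number of images can be scheduled in one FFmpeg run.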

🎬 Video Generation

FFmpeg combines audio, subtitles, and image overlays into a single MP4 file. Configurable resolution and format options.
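As a rough illustration of the assembly step, the sketch below builds an FFmpeg argument list that puts the audio over a plain background and burns in the subtitles. The options used (`-f lavfi`, the `color` source, the `subtitles` filter, `-shortest`) are real FFmpeg features, but the black 1280x720 background, the codec choices, and the function name are assumptions rather than DeovidLang's actual command. The `subtitles` filter needs an FFmpeg build with libass, and paths containing special characters would need escaping.

```python
def build_ffmpeg_command(audio, srt, output, size="1280x720"):
    """Return an FFmpeg argument list: audio + black background + burned-in subtitles."""
    return [
        "ffmpeg", "-y",                                   # overwrite output without asking
        "-f", "lavfi", "-i", f"color=c=black:s={size}",   # synthetic background video
        "-i", audio,                                      # the narration track
        "-vf", f"subtitles={srt}",                        # render the SRT onto the frames
        "-c:v", "libx264", "-pix_fmt", "yuv420p",         # widely compatible H.264 video
        "-c:a", "aac",
        "-shortest",                                      # stop when the audio ends
        output,
    ]
```

The list can be executed with `subprocess.run(build_ffmpeg_command("speech.m4a", "subs.srt", "final.mp4"), check=True)`. Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.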

📁 Automated Organization

Output files are automatically organized in date-based directories, keeping your project structured and easy to navigate.
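A date-based layout like this takes only a few lines with the standard library. The following is a minimal sketch, assuming an ISO-formatted date (for example 2024-01-15) as the folder name; the function name is hypothetical.

```python
from datetime import date
from pathlib import Path


def dated_output_dir(root, day=None):
    """Create and return <root>/<YYYY-MM-DD>, defaulting to today's date."""
    day = day or date.today()
    out = Path(root) / day.isoformat()
    out.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
    return out
```

Because ISO dates sort alphabetically in chronological order, a plain directory listing shows runs from oldest to newest.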

How It Works

DeovidLang follows a straightforward pipeline:

Audio File → Whisper Transcription → Translation → SRT Subtitles → Image Overlay → Final MP4 Video

Step-by-Step Process

  1. Input: You provide an audio file (M4A, MP3, WAV, etc.)
  2. Transcription: Whisper converts speech to text
  3. Translation: Translated text is generated in your target language
  4. Subtitle Generation: Both original and translated text become SRT files
  5. Image Overlay: Specified images appear at timestamps you define
  6. Video Assembly: FFmpeg combines everything into the final MP4
💡 Tip

Image overlays happen at specific timestamps—perfect for matching slides to narration in presentations or tutorials.
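Matching slides to narration usually means turning a list of "show this image from this time" entries into display windows, where each slide stays up until the next one starts. The sketch below shows one way to do that. The schedule format (`"MM:SS"` strings paired with file names) and both function names are assumptions for illustration; the way DeovidLang actually reads its timestamps is not described here.

```python
def parse_timestamp(text):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def slide_windows(schedule, total_duration):
    """Turn [(timestamp, image), ...] into [(start, end, image), ...].

    Each image is shown until the next one begins; the last image
    stays on screen until total_duration (the audio length in seconds).
    """
    starts = sorted((parse_timestamp(ts), image) for ts, image in schedule)
    windows = []
    for i, (start, image) in enumerate(starts):
        end = starts[i + 1][0] if i + 1 < len(starts) else total_duration
        windows.append((start, end, image))
    return windows
```

For example, a schedule with an intro slide at 00:00 and a chart at 01:30 over a 200-second recording yields one window from 0 to 90 seconds and another from 90 to 200 seconds.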

Installation

Step 1: Install System Dependencies

FFmpeg is required for video processing:

Linux (Debian/Ubuntu):
sudo apt install ffmpeg

macOS (Homebrew):
brew install ffmpeg

Windows:
# Download from https://ffmpeg.org/download.html
# Or use winget:
winget install ffmpeg

Step 2: Install Python Dependencies

pip install pydub pillow

Step 3: Install Whisper

pip install git+https://github.com/openai/whisper.git

Step 4: Clone the Repository

git clone https://gitlab.com/krafi/deovidlang.git
cd deovidlang
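Before the first run, it can help to confirm that everything from the steps above is in place. The following is a small, optional check script, not part of DeovidLang itself. It relies on the facts that the Whisper package imports as `whisper`, Pillow imports as `PIL`, and FFmpeg must be reachable on the PATH; the function name is hypothetical.

```python
import importlib.util
import shutil


def missing_dependencies(modules=("whisper", "pydub", "PIL"), tools=("ffmpeg",)):
    """Return a list describing every Python module or command that cannot be found."""
    missing = [
        f"python module: {name}"
        for name in modules
        if importlib.util.find_spec(name) is None
    ]
    missing += [f"command: {tool}" for tool in tools if shutil.which(tool) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("All dependencies found." if not problems else "\n".join(problems))
```

An empty result means the modules are importable and `ffmpeg` is on the PATH; it does not verify versions.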

Usage

Basic Command

python deovidlang.py <directory> <audio_file> --model <model> --language <language> --task <task>

Example

python deovidlang.py myproject myproject/speech.m4a --model medium --language en --task translate

Parameters

Parameter    Options                             Default   Description
--model      tiny, base, small, medium, large    tiny      Whisper model size
--language   Language code (en, ru, es, etc.)    ru        Source audio language
--task       transcribe, translate, both         both      Operation to perform
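The options and defaults listed above map naturally onto Python's `argparse`. The sketch below mirrors the documented interface so the defaults are easy to see; it is a reconstruction from the parameter list, and the real `deovidlang.py` may parse its arguments differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented DeovidLang options."""
    parser = argparse.ArgumentParser(prog="deovidlang.py")
    parser.add_argument("directory", help="project directory")
    parser.add_argument("audio_file", help="path to the input audio")
    parser.add_argument("--model", default="tiny",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model size")
    parser.add_argument("--language", default="ru",
                        help="source audio language code, e.g. en, ru, es")
    parser.add_argument("--task", default="both",
                        choices=["transcribe", "translate", "both"],
                        help="operation to perform")
    return parser
```

Note the defaults: with no flags, the tool uses the tiny model, assumes Russian audio, and runs both transcription and translation, so non-Russian audio needs an explicit `--language`.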
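Whisper Model Comparison

Model    Parameters   Speed     Accuracy
tiny     39 M         Fastest   Lower
base     74 M         Fast      Good
small    244 M        Medium    Better
medium   769 M        Slow      High
large    1550 M       Slowest   Highest

Download sizes grow with the parameter count, reaching about 1.5 GB for medium and about 2.9 GB for large.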
💡 Tip

Start with “tiny” for quick testing. Use “medium” or “large” for production-quality transcriptions.

Real-World Use Cases

1. Educational Content Localization

Create bilingual videos for language learners. Transcribe your English lecture, translate to Spanish, and generate subtitles showing both languages simultaneously.

2. Podcast to Video

Turn podcast episodes into YouTube videos. Add cover art or branding images at the beginning, and let DeovidLang handle the rest.

3. Tutorial Video Creation

Record your voiceover, then automatically generate subtitles and create a professional video with slides matching your narration.

4. Accessibility

Make your content accessible to deaf or hard-of-hearing viewers with accurate auto-generated subtitles in multiple languages.

Example Output

After running DeovidLang, you’ll get:

output/
└── 2024-01-15/
    ├── original.srt      # Original language subtitles
    ├── translated.srt    # Translated subtitles
    └── final.mp4         # Final video with everything

The final MP4 includes:

  • Your original audio
  • Image overlays at specified timestamps
  • Burned-in bilingual subtitles

Troubleshooting

📖 FFmpeg not found?
  1. Verify FFmpeg is installed:

    ffmpeg -version
    
  2. Add FFmpeg to your system PATH

  3. Restart your terminal

📖 Whisper model download slow?

The first run downloads the selected model, from tens of megabytes for tiny up to about 2.9 GB for large. Use a stable internet connection. The model is cached for subsequent runs.

📖 Subtitles not showing in video?
  1. Check that SRT files were generated
  2. Verify image timestamps don’t overlap
  3. Try a different output format
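Overlapping image windows are easy to check for programmatically. The sketch below is a generic interval check, assuming each window is a `(start, end)` pair in seconds with start not later than end; it is not a DeovidLang function.

```python
def find_overlaps(windows):
    """Return neighbouring (start, end) windows that overlap in time.

    Windows are sorted by start time first. An empty result means
    no two windows overlap anywhere in the list.
    """
    ordered = sorted(windows)
    overlaps = []
    for current, following in zip(ordered, ordered[1:]):
        if following[0] < current[1]:  # next window starts before this one ends
            overlaps.append((current, following))
    return overlaps
```

Windows that merely touch, such as one ending at 5 seconds and the next starting at 5 seconds, are not reported as overlapping.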
📖 Audio quality issues?

Ensure your input audio is clear. Whisper works best with:

  • Minimal background noise
  • Clear speech
  • Sample rate of 16kHz or higher
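For WAV input, the sample rate can be checked with Python's built-in `wave` module before starting a long transcription. This is a minimal sketch limited to WAV files (other formats such as M4A or MP3 would need a tool like pydub or ffprobe); the function names and the 16 kHz threshold parameter are illustrative.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz of a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def meets_minimum_rate(path, minimum_hz=16_000):
    """True if the WAV file's sample rate is at least minimum_hz."""
    return wav_sample_rate(path) >= minimum_hz
```

A recording that fails this check can still be processed, but re-recording or exporting at a higher sample rate is likely to give cleaner transcripts.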

Why This Project is Useful

DeovidLang solves several pain points for content creators:

  • Saves Hours: Manual transcription takes 4-6x the audio length. DeovidLang does it in minutes.
  • No Expensive Tools: No need for Adobe Premiere, Final Cut, or subscription services.
  • Multilingual Ready: Reach global audiences with translated subtitles.
  • Automated Workflow: One command handles the entire pipeline.
💡 Tip

Combine DeovidLang with my other project WhisperWeb (covered in a previous blog) for a complete audio-to-video workflow!

Conclusion

DeovidLang is a powerful yet simple tool that automates video creation from audio. Whether you’re an educator, podcaster, or content creator, it handles the heavy lifting (transcription, translation, subtitles, and video assembly) so you can focus on creating content.

Give it a try and transform your audio files into shareable videos in minutes!

Source Code

View and contribute to the project: DeovidLang on GitLab (https://gitlab.com/krafi/deovidlang)

Happy automating!
