Automatic video dubbing and translation from lb to other languages

Description

This project was made in 2 days during the '{lang: lb}' hackathon in 2023.

The goal of our proof of concept is automated machine dubbing of videos in Luxembourgish. It aims to speed up the work of translators by providing a working base they can edit, and to fill the gap left by the lack of Luxembourgish support in automated video dubbing systems.

This Python program lets you:

Either download a video and extract its audio track, or use an existing audio track
Convert the track to a mono 16 kHz MP3 track
Run this track through schreifmaschinn's API (which uses Meta's XLSR model) to extract text from speech
Group the words into chunks of about 3 s (about 11 words, assuming an average human speech rate of 220 wpm) in a CSV transcript, transcript.csv
Send this, alongside a custom prompt, to the ChatGPT API so that it is
1. Translated into another language
2. Rewritten to reach a decent quality
3. Split back to match the original timecodes
4. Converted to the dubbing timecode format
5. Output as text following the SRT template
Save the result, per language, in separate transcript-XX.srt files
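
The grouping and SRT steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the input format (a list of `(start, end, word)` tuples in seconds) and the function names `group_words`, `to_csv`, `srt_timecode` and `to_srt` are hypothetical, and the SRT text is built locally here, whereas the project asks the ChatGPT API to produce it.

```python
import csv
import io

WORDS_PER_MINUTE = 220   # speech rate quoted in the description
WINDOW_SECONDS = 3.0     # target chunk length
# 220 words per minute over 3 seconds gives about 11 words per chunk
WORDS_PER_WINDOW = round(WORDS_PER_MINUTE / 60 * WINDOW_SECONDS)


def group_words(words, window=WINDOW_SECONDS):
    """Group (start, end, word) tuples into chunks spanning about `window` seconds.

    The tuple layout is an assumption; the real ASR output may differ.
    """
    groups = []
    for start, end, text in words:
        if groups and start - groups[-1]["start"] < window:
            # Word still falls inside the current chunk: extend it
            groups[-1]["end"] = end
            groups[-1]["text"] += " " + text
        else:
            # Word starts a new chunk
            groups.append({"start": start, "end": end, "text": text})
    return groups


def to_csv(groups):
    """Serialise the chunks as CSV text (hypothetical column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["start", "end", "text"])
    for group in groups:
        writer.writerow([group["start"], group["end"], group["text"]])
    return buffer.getvalue()


def srt_timecode(seconds):
    """Format a time in seconds as an SRT timecode, HH:MM:SS,mmm."""
    milliseconds = round(seconds * 1000)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


def to_srt(groups):
    """Render the chunks as numbered SRT cues separated by blank lines."""
    cues = []
    for index, group in enumerate(groups, start=1):
        timing = f"{srt_timecode(group['start'])} --> {srt_timecode(group['end'])}"
        cues.append(f"{index}\n{timing}\n{group['text']}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # Made-up sample words, for illustration only
    sample = [(0.0, 0.4, "Moien"), (0.5, 0.9, "alleguer"), (3.2, 3.6, "wéi")]
    chunks = group_words(sample)
    print(to_csv(chunks))
    print(to_srt(chunks))
```

Grouping by the start time of each chunk keeps the chunks close to 3 s regardless of how many words they contain; grouping by a fixed count of `WORDS_PER_WINDOW` words would be an equally plausible reading of the description.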

You'll find a diagram of how it works, with the possible optimisations already identified shown in blue.

You can also find a reference video at https://www.youtube.com/watch?v=4dEOXDq5lgU and see the result of our automated translation embedded as YouTube subtitles.

Hackathon {lang:lb} laachen team

Topic

Science and Technology

Type

Application

Creation date

June 9, 2023

Last update

July 21, 2025

1 used dataset

Dataset for finetuning a Luxembourgish Automatic Speech Recognition

From Zenter fir d'Lëtzebuerger Sprooch

The dataset with which the XLSR speech model was fine-tuned for Luxembourgish speech recognition, under the MIT licence.

Metadata quality:

44.0%

Updated on January 21, 2026

Other (Open)
- 1 reuse
- 0 favorites
