Workflow for multi-lingual subtitles in DVD Studio Pro

May 9, 2010

Despite reading manuals, a good deal of googling, and even attending Apple’s One to One tutoring for a year, I still haven’t read of a good solution for easily creating multi-lingual subtitles in DVD Studio Pro.

So I made one myself, and that’s what I’m presenting here. Feel free to use it, and let me know how you get on with a comment below. (Note that it uses AWK, so it’ll work straight away on Linux and Mac, but on Windows you’ll need to install AWK or GAWK first.) Please do suggest possible improvements where you see them.

What I want to do:

  1. create start and stop points for subtitles using keystrokes while a video is playing
  2. import subtitle text from a separate file where it has been created/translated externally
  3. be able to easily add a new language to an existing subtitled project

And this is how I go about it. (I’ll do another post about other things I tried and why they didn’t work out, but here’s what I’ve found to work after numerous dead ends).

(1) Use Jubler to create the timings, either by using the method described in the first part of this article, or by following “How can I create new subtitles while the movie is played?” in the Jubler documentation.

You don’t need to type the text of the subtitles if you’ve already got it in another file, although I’ve found it useful to put in one or two key words that you can later match up to the spoken dialogue. The important thing at this stage is to get the start and end point of each subtitle in the right place. Save the result as a .stl file. For the purpose of these instructions, I’ll assume it’s called blank.stl — note that it MUST be in STL format for the AWK script below to work.
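Before going further, it can help to know how many subtitles blank.stl actually contains, since the translations file will need at least that many rows. Here is a minimal sketch, assuming subtitle lines start with a timecode like 00:00:01:00 (the same pattern the script below looks for). The helper name count_subtitles and the sample data are made up for illustration.

```shell
# count_subtitles is a hypothetical helper, not part of Jubler or DVD Studio Pro.
# It counts the lines that begin with a timecode (hh:mm:ss:ff).
count_subtitles() {
    awk '/^..:..:..:../ { n++ } END { print n + 0 }' "$1"
}

# Invented sample data standing in for a file saved from Jubler.
sample=$(mktemp)
cat > "$sample" <<'EOF'
// a header line that is not a subtitle
00:00:01:00 , 00:00:03:00 , hello
00:00:04:00 , 00:00:06:00 , goodbye
EOF

count_subtitles "$sample"   # prints 2
rm -f "$sample"
```

On your own project you would run count_subtitles blank.stl instead of using the sample file.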

(2) Copy and paste the following AWK script into a new text file, and save it with the filename translateSTL.

#!/usr/bin/awk -f
# by Tim Morley, 2010
# distributed under Creative Commons Attribution-Share Alike license
# http://creativecommons.org/licenses/by-sa/2.0/uk/
#
# =====================================================================
#
# Syntax:
#   ~$ translateSTL lang=XX start=ROW translations.csv file.stl
#
# where:
# - XX is a language code (e.g. EN, HU, DE, etc.) which must match
#   a code in the top row of one column in translations.csv
# - ROW is the line number in translations.csv of the first subtitle
#   which is to be imported
# - translations.csv is a TAB-separated (NOT comma-separated) file
#   containing set(s) of translated subtitles (one subtitle per cell,
#   one language per column). This file should contain at least as
#   many subtitles as are contained in file.stl
# - file.stl is a subtitle file in STL format
#
# The script will send a copy of file.stl to STDOUT which retains the
# timecodes from file.stl but pairs them with the subtitles from the
# relevant column of translations.csv
#
# =====================================================================

BEGIN { FS = "\t" }   # separator in translations.csv is tab (not comma, to avoid problems with commas in subtitle text)

# var lang  = language code set by user parameter
# var start = number of row where first translation to be used is found in translations.csv

((NR == FNR) && (NR == 1)) {   # for first file ("translations.csv"), for first line
    for (i = 1; i <= NF; i++) {
        if ($i == lang) langColumn = i;   # find the column that matches the given language code
    }
    if (!langColumn) {   # stop rather than silently output the wrong text
        print "translateSTL: no column headed \"" lang "\" in row 1 of " FILENAME > "/dev/stderr";
        exit 1;
    }
    rowIndex = start;   # set this once; can't put it in BEGIN {...}, because command-line variables such as start are not yet assigned there
}

((NR == FNR) && (NR >= start)) {
    translation[NR] = $langColumn;   # build array of translations using row numbers as indices
}

(NR != FNR) {   # for second file ("file.stl")
    if (FNR == 1) {
        FS = "[ \t]*,[ \t]*";   # from here on, fields are separated by commas with optional surrounding whitespace
        $0 = $0;                # re-split the current line with the new FS, so a timecode on the first line isn't mangled
        printf "\xef\xbb\xbf"   # add byte order mark, because DVD Studio Pro inexplicably needs it... grumble... you might be better to remove this line if you're not using DVD-SP
    }
    if ($1 ~ /^..:..:..:../) {   # for each line that starts with a timecode...
        printf "%s,%s, %s\n", $1, $2, translation[rowIndex];   # output its two timecodes followed by a translated subtitle
        rowIndex++;
    } else {
        print $0;   # otherwise just output the line as is
    }
}


(3) Make the file you’ve just saved executable by opening a terminal window and typing:

chmod +x translateSTL

(4) Your translations need to be in a spreadsheet. Google Docs is a good option if you’re collaborating with other translators (although see comment below for one small problem, and how to get round it).

Sample spreadsheet showing translations in three languages

Each column should contain your subtitles in a different language, with one subtitle per spreadsheet cell. The top cell of each column (in Row 1) should contain the language code of that column (e.g. EO for Esperanto, EN for English, HU for Hungarian, etc.). The file needs to be saved as a .csv file USING TABS, NOT COMMAS, TO SEPARATE THE FIELDS, and with NO QUOTATION MARKS AROUND THE TEXT, unless you want them to appear in your subtitles. The script above will not work if you do not use tabs as field separators.
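If you want to check that the exported file really is tab-separated and that the language codes sit where the script expects them, something like the following should do it. This is only a sketch: list_languages is a made-up helper name, and the three-column sample is invented data.

```shell
# list_languages is a hypothetical helper. It prints each column number
# together with the language code found in row 1 of a tab-separated file.
list_languages() {
    awk -F '\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i; exit }' "$1"
}

# Invented sample data: one header row and one row of subtitles.
sample=$(mktemp)
printf 'EO\tEN\tHU\n' > "$sample"
printf 'Saluton\tHello\tSzia\n' >> "$sample"

list_languages "$sample"
# 1: EO
# 2: EN
# 3: HU
rm -f "$sample"
```

If the output shows a single column holding the whole header row, the file was most likely saved with commas rather than tabs.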

For the purposes of these instructions, I’ll suppose that this file is called translations.csv

(5) To make things easy, put all the relevant files into the same folder: translateSTL, blank.stl and translations.csv. Suppose you want to merge the text from the English column of your translations.csv spreadsheet, starting from Row 2, into blank.stl to form a new file called EN.stl. Back in the terminal window, cd to the folder where all these files are, and type the following incantation:

./translateSTL lang=EN start=2 translations.csv blank.stl > EN.stl

[The above should be all on one line; don’t press return until you’ve typed it all. Alternatively, you can copy and paste what’s above into your terminal, and then press return.]

If all has gone well, you should now have a new file called EN.stl containing the timestamps you created in Jubler together with the text from the column headed “EN” from your spreadsheet. (See comment below for potential problem with the first line of your new file, and how to fix it).
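One quick sanity check if you are importing into DVD Studio Pro: the script is meant to put a UTF-8 byte order mark (the three bytes EF BB BF) at the very start of its output. Here is a minimal sketch for confirming that, assuming your system provides head -c and od (Linux and Mac both should). The name has_bom is made up for illustration.

```shell
# has_bom is a hypothetical helper: it succeeds if the file starts with
# the UTF-8 byte order mark (bytes ef bb bf) and fails otherwise.
has_bom() {
    first3=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first3" = "efbbbf" ]
}
```

For example, has_bom EN.stl && echo "BOM found" would print the message only when the mark is present.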

To create the German version, assuming you have a column headed DE in your spreadsheet, type this:

./translateSTL lang=DE start=2 translations.csv blank.stl > DE.stl
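Once there are more than a couple of languages, typing one command per language gets tedious. A small shell loop can run the script once per language code. This is just a sketch under the same assumptions as above (translateSTL, translations.csv and blank.stl all in the current folder); make_all is a made-up helper name.

```shell
# make_all is a hypothetical helper. Usage: make_all START CODE [CODE...]
# It runs translateSTL once per language code, starting from row START,
# and writes the result for each code XX to XX.stl.
make_all() {
    start=$1
    shift
    for code in "$@"; do
        ./translateSTL lang="$code" start="$start" translations.csv blank.stl > "$code.stl"
    done
}
```

Called as make_all 2 EN DE, this would produce EN.stl and DE.stl in one go. Adding a language later is then a matter of adding a column to the spreadsheet and one more code to the command.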

If you’re working on just one scene from the middle of a longer project, you can start to merge the text from a particular row of translations.csv. If your current scene starts at row 190 of translations.csv, type the following to create the French subtitles (from the column with FR in its first row) for this scene:

./translateSTL lang=FR start=190 translations.csv blank.stl > FR.stl
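To confirm that a given row really is where your scene starts, you can print a few rows of the translations file with their row numbers before merging. A minimal sketch follows; preview_rows is a made-up helper name, and the row numbers it prints are the same line numbers the start parameter refers to.

```shell
# preview_rows is a hypothetical helper. Usage: preview_rows FILE START COUNT
# It prints COUNT rows of FILE beginning at row START, each prefixed
# with its row number.
preview_rows() {
    awk -F '\t' -v start="$2" -v count="$3" \
        'NR >= start && NR < start + count { print NR ": " $0 }' "$1"
}
```

For instance, preview_rows translations.csv 190 3 would show rows 190 to 192, so you can compare them with the first few lines of dialogue in the scene.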

Hope that helps!