Workflow for multi-lingual subtitles in DVD Studio Pro

May 9, 2010 by timsk

Despite reading manuals, a good deal of googling, and even attending Apple’s One to One tutoring for a year, I still haven’t read of a good solution for easily creating multi-lingual subtitles in DVD Studio Pro.

So I made one myself, and that’s what I’m presenting here. Feel free to use it, and let me know how you get on with a comment below. (Note that it uses AWK, so it’ll work straightaway on Linux and Mac, but you’ll need to install AWK or GAWK on Windows — google for it). Please do suggest possible improvements where you see them.

What I want:

  1. to create start and stop points for subtitles using keystrokes while a video is playing
  2. to import subtitle text from a separate file where it has been created/translated externally
  3. to be able to easily add a new language to an existing subtitled project

And this is how I go about it. (I’ll do another post about other things I tried and why they didn’t work out, but here’s what I’ve found to work after numerous dead ends).

(1) Use Jubler to create the timings, either by using the method described in the first part of this article, or by following “How can I create new subtitles while the movie is played?” in the Jubler documentation.

You don’t need to type the text of the subtitles if you’ve already got it in another file; just create a stream of blank subtitles, with the start and end point of each subtitle in the right place. Save it. For the purpose of these instructions, I’ll assume it’s called blank.stl — note that it MUST be in STL format for the AWK script below to work.

(2) Copy and paste the following AWK script into a new text file, and save it with the filename translateSTL.

#!/usr/bin/awk -f
# by Tim Morley, 2010
# distributed under Creative Commons Attribution-Share Alike license
# http://creativecommons.org/licenses/by-sa/2.0/uk/
#
# =====================================================================
#
# Syntax:
#    ~$ translateSTL lang=XX start=ROW translations.csv file.stl
#
# where:
#   - XX is a language code (e.g. EN, HU, DE, etc.) which must match
#     a code in the top row of one column in translations.csv
#   - ROW is the line number in translations.csv of the first subtitle
#     which is to be imported
#   - translations.csv is a TAB-separated (NOT comma-separated) file
#     containing set(s) of translated subtitles (one subtitle per cell,
#     one language per column). This file should contain at least as
#     many subtitles as are contained in file.stl
#   - file.stl is a subtitle file in STL format
#
# The script sends a copy of file.stl to STDOUT which retains the
# timecodes from file.stl but replaces the subtitle text with the
# entries from the relevant column of translations.csv
#
# =====================================================================

BEGIN { FS = "\t" }                    # separator in translations.csv is tab (not comma, to avoid problems with commas in subtitle text)

# var lang = language code set by user parameter
# var start = number of row where first translation to be used is found in translations.csv

((NR == FNR) && (NR == 1)) {           # for first file ("translations.csv"), for first line
    for (i = 1; i <= NF; i++) {
        if ($i == lang) langColumn = i;  # find the column that matches the given language code
    }
    if (!langColumn) {                   # no matching column: stop rather than output whole rows
        print "translateSTL: no column headed " lang " in row 1" > "/dev/stderr";
        exit 1;
    }
    rowIndex = start;                    # set this once; can't put it in BEGIN {...}, because start is not yet assigned there
}

((NR == FNR) && (NR >= start)) {
    translation[NR] = $langColumn;       # build array of translations using row numbers as indices
}

(NR != FNR) {                          # for second file ("file.stl")
    split($0, field, /[ \t]*,[ \t]*/);   # split on commas here; changing FS instead only takes effect from the next line, which mangled the first line of file.stl
    if ($0 ~ /^..:..:..:../) {           # for each line that starts with a timecode...
        printf "%s,%s, %s\n", field[1], field[2], translation[rowIndex]; # output its two timecodes followed by a translated subtitle
        rowIndex++;
    } else {
        print $0;                        # otherwise just output the line as is
    }
}

(3) Make the file you’ve just saved executable by opening a terminal window and typing:

chmod +x translateSTL

(4) Your translations need to be in a spreadsheet. Google Docs is a good option if you’re collaborating with other translators (although see comment below for one small problem, and how to get round it).

Sample spreadsheet showing translations in three languages

Each column should contain your subtitles in a different language, with one subtitle per spreadsheet cell. The top cell of each column (in Row 1) should contain the language code of that column (e.g. EO for Esperanto, EN for English, HU for Hungarian, etc.). The file needs to be saved as a .csv file USING TABS, NOT COMMAS, TO SEPARATE THE FIELDS, and with NO QUOTATION MARKS AROUND THE TEXT, unless you want them to appear in your subtitles. The script above will not work if you do not use tabs as field separators.
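Before running the merge, it can be worth checking that the export really is tab-separated and that the language codes sit where the script expects them. The following is only a rough sketch, not part of the original workflow: the sample file and its contents are made up for illustration, and you would point the two awk commands at your own translations.csv instead.

```shell
# Made-up stand-in for translations.csv: three language columns, tab-separated.
workdir=$(mktemp -d)
csv="$workdir/translations.csv"
printf 'EO\tEN\tDE\n'                                >  "$csv"
printf 'Saluton, mondo\tHello, world\tHallo, Welt\n' >> "$csv"

# Print each language code from row 1 next to its column number.
columns=$(awk -F '\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i }' "$csv")
echo "$columns"

# Count rows whose number of tab-separated cells differs from row 1.
bad_rows=$(awk -F '\t' 'NR == 1 { n = NF } NF != n { bad++ } END { print bad + 0 }' "$csv")
echo "rows with an unexpected number of cells: $bad_rows"
```

If the file was accidentally saved with commas, the first command will most likely show the whole header as a single column, which is a quick hint that the export settings need changing.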

For the purposes of these instructions, I’ll suppose that this file is called translations.csv

(5) To make things easy, put all the relevant files into the same folder: translateSTL, blank.stl and translations.csv. Suppose you want to merge the text from the English column of your translations.csv spreadsheet, starting from Row 2, into blank.stl to form a new file called EN.stl. Back in the terminal window, cd to the folder where all these files are, and type the following incantation:

./translateSTL lang=EN start=2 translations.csv blank.stl > EN.stl

[The above should be all on one line; don't press return until you've typed it all. Alternatively, you can copy and paste what's above into your terminal, and then press return.]

If all has gone well, you should now have a new file called EN.stl containing the timecodes you created in Jubler together with the text from the column headed “EN” from your spreadsheet. (If the first line of your new file looks wrong, see the comment below for the cause and how to fix it.)
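To see the whole merge in miniature, here is a dry run on made-up data. It is a sketch under several assumptions: the file contents are invented, the STL layout (two timecodes and a text field separated by commas) is inferred from what the script matches, and the merge function is a condensed stand-in written so the example runs on its own. It is not the translateSTL script itself.

```shell
# Dry run on made-up data: two cues, two languages.
# Every line of file content below is hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Timings only (assumed layout: start timecode , end timecode , empty text).
cat > blank.stl <<'EOF'
00:00:01:00 , 00:00:03:00 ,
00:00:04:00 , 00:00:06:12 ,
EOF

# Tab-separated translations; row 1 holds the language codes.
printf 'EN\tDE\n'                    >  translations.csv
printf 'Hello, world\tHallo, Welt\n' >> translations.csv
printf 'Goodbye\tAuf Wiedersehen\n'  >> translations.csv

# merge LANG START: condensed stand-in for translateSTL.
merge() {
    awk -F '\t' -v lang="$1" -v start="$2" '
        NR == FNR && FNR == 1 {                # header row: locate the language column
            for (i = 1; i <= NF; i++) if ($i == lang) col = i
            row = start
        }
        NR == FNR { text[FNR] = $col; next }   # remember every cell of that column
        /^..:..:..:../ {                       # timecode line in the STL file
            split($0, f, /[ \t]*,[ \t]*/)
            print f[1] "," f[2] ", " text[row++]
            next
        }
        { print }                              # anything else passes through untouched
    ' translations.csv blank.stl
}

# One output file per language column.
for lang in EN DE; do
    merge "$lang" 2 > "$lang.stl"
done

cat DE.stl
```

With this sample data, DE.stl ends up with the two timecode pairs from blank.stl followed by “Hallo, Welt” and “Auf Wiedersehen” respectively. The commas inside the subtitle text survive because the spreadsheet is split on tabs only.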

To create the German version, assuming you have a column headed DE in your spreadsheet, type this:

./translateSTL lang=DE start=2 translations.csv blank.stl > DE.stl

If you’re working on just one scene from the middle of a longer project, you can start to merge the text from a particular row of translations.csv. If your current scene starts at row 190 of translations.csv, type the following to create the French subtitles (from the column with FR in its first row) for this scene:

./translateSTL lang=FR start=190 translations.csv blank.stl > FR.stl
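If you don’t know the row number offhand, standard tools can find it and check that the spreadsheet has enough rows left for the scene. This is a small sketch on made-up data; the search phrase, the file contents and the variable names are all placeholders for your own.

```shell
# Made-up sample data; substitute your own translations.csv and blank.stl.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'EN\tFR\n'                >  translations.csv
printf 'Good morning\tBonjour\n' >> translations.csv
printf 'Thank you\tMerci\n'      >> translations.csv
printf 'Goodbye\tAu revoir\n'    >> translations.csv
printf '00:00:01:00 , 00:00:03:00 ,\n00:00:04:00 , 00:00:06:12 ,\n' > blank.stl

# Row number of the first line containing the scene's opening subtitle.
start=$(grep -n -F 'Thank you' translations.csv | head -n 1 | cut -d: -f1)

# Number of timecode lines (cues) in the STL file.
cues=$(grep -c '^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]:[0-9][0-9]' blank.stl)

# Number of spreadsheet rows available from the start row to the end.
rows=$(awk -v start="$start" 'END { print NR - start + 1 }' translations.csv)

echo "start row: $start, cues: $cues, rows available: $rows"
if [ "$rows" -lt "$cues" ]; then
    echo "warning: translations.csv runs out before blank.stl does" >&2
fi
```

The value of start is what you would pass as start=... on the translateSTL command line.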

Hope that helps!

Very simple message below

January 14, 2010 by timsk

Please click it and do it:

Support Doctors Without Borders in Haiti

Esperanto feature on The One Show

January 11, 2010 by timsk

The One Show did a feature about Esperanto on tonight’s edition, which included a short section with me, talking to the legendary Arthur Smith.

It was great fun filming it — I’ve always really enjoyed Arthur Smith — and the production staff were not only interested in what we were talking about, but pleasingly open to suggestions about what we should say and how we might say it. I’m really pleased with the result too.

Enjoy!

Feature about Esperanto on BBC1

January 11, 2010 by timsk

This evening, a feature about Esperanto appeared on the BBC as part of the programme The One Show, which is broadcast nationwide in Britain every day (Monday to Friday) on BBC1, the BBC’s main television channel. The programme is usually watched by around 5 million people*.

I really enjoyed the filming, and I have long been a big fan of the comedian Arthur Smith, who presented the feature and interviewed me. The technicians and the producer were also interested in the subject and very sympathetic to it, and (very pleasingly) were ready to accept my suggestions about what we should say and how we should word things, so as not to misrepresent Esperanto.

Enjoy!

*I took this figure from the website of the Broadcasters’ Audience Research Board, which indicates that The One Show reached audiences of 5.07 million, 4.75 million and 5.32 million in its last three weeks of broadcasts in 2009.

Cambridge Water: competent, and lovely too

January 6, 2010 by timsk

Came back from holiday this week to find we’d got no water. Phoned Cambridge Water to see if they knew anything about it, and even at 11pm, they still had competent engineers on the phone, who spent 20 minutes talking me through different stuff to test and to try.

Their engineer came out the next morning, and it turned out our main inlet pipe was frozen. It managed to freeze because it’s outside the house (in the cupboard next to the front door), not inside the house like the building regulations say it’s supposed to be. Grr. Anyway, 10 minutes with a hairdryer put it to rights.

Top marks to Cambridge Water though, both for late-night telephone help, and friendly, competent (and free) help in getting up and running again in the morning, even though it wasn’t actually their fault at all (well, er, apart from allowing Taylor Woodrow’s plumbing monkeys to go ahead and break building regulations in the first place; they should have and could have refused to allow connection to the water main until the plumbing was up to scratch).

Still very pleased with their customer service though.  :o )

Debenhams: robbing bastards

January 6, 2010 by timsk

My partner was persuaded to take a Debenhams store card last month, in return for a discount on a purchase. The application took so long at the checkout that she twice said, “Look, people are getting impatient here. Forget it, I’ll just pay,” but the assistant pressed ahead, and the card was duly issued.

The first payment was due over the Christmas period. It arrived with Debenhams 2 days late. They applied a £12 fine to the account.

They’re a bunch of robbing bastards who wouldn’t know customer service if it bit them on the bottom.

In related news, the lovely John Lewis’s have just reported record-breaking Christmas sales. I’m not one to shout “correlation implies causation” too loudly, but I do know where I’m going to do my houseware shopping in the future.

Epson “empty cartridge” scam

January 6, 2010 by timsk

I own an Epson Aculaser C2600 printer. It works reasonably well, although the steadily increasing price of the toner cartridges is a bit vexing. Not enough to qualify for the “scam” tag though.

The scam is that the printer announces that the toner cartridges are empty long, long, long before this actually becomes the case. And in its default state, the printer simply refuses to print another page until the cartridge is replaced.

As if that weren’t bad enough, I’ve phoned Epson technical support on two separate occasions to ask if there’s a way to override this setting, even just to finish the last few pages of a print run, and the answer on both occasions has been “no, sorry, there isn’t”.

However, when the printer downed tools mid-December claiming to be out of cyan toner, I spent a while wading through the labyrinthine menus on the printer’s tiny screen and I discovered one that says “Cartridge empty: stop OR continue”. So I changed it to ‘continue’ to see what would happen. And here’s the bit that merits the “scam” label.

I’ve since printed over 600 double-sided letters, each with a blue logo on front and back, and it’s still going strong. That’s 1200 blue logos from an empty blue cartridge.

How does this compare to other printers and manufacturers? Leave us a comment!

Blue. Lots of it.

Multi-lingual auto-correction on the iPhone

June 6, 2009 by timsk

I’ve had an iPhone for a few months, and while it doesn’t quite do everything that I want (no Flash, no video recording, no laptop tethering, no Bluetooth file transfer), it does do a staggering array of wonderful things that I do like really rather a lot.  :-)

One thing that had been annoying me though is that I couldn’t find how to make it allow me to type in a foreign language without constantly proposing English corrections, e.g. when I typed je or ne in French, it always wanted to “correct” them to the English word me. Worse, it then remembered the foreign words, and ceased suggesting corrections that I would have wanted when typing in English.

However, it turns out that if you use a foreign on-screen keyboard (enable one or more of them via Settings –> General –> International –> Keyboards), then the iPhone auto-corrects to the corresponding language. Switching between keyboards is a one-tap operation on the keyboard itself, and the space bar always shows you what the current keyboard is.

That certainly solves the problem if Apple provide a keyboard for your languages of choice, but what if they don’t? I regularly email/Tweet/etc. in Esperanto, which isn’t in Apple’s list. My solution, for now, is to enable a. n. other language (I’ve chosen Czech) and use that keyboard when I’m writing in Esperanto. It’s a bit painful to start with, as it tries to “correct” just about every word, but it quickly learns the most common words and starts making useful suggestions, and it stops my English and French dictionaries getting polluted with irrelevant foreign words.

So there you go. Hope that helps. Feel free to leave better solutions in comments.

Wolfram|Alpha demonstrates long established GIGO principle

May 21, 2009 by timsk

The people at Wolfram|Alpha certainly seem to know how to kick up a hype storm, so I moseyed over to see what it was all about.

To start with, I had trouble getting any response at all other than Wolfram|Alpha isn’t sure what to do with your input, but finally I got somewhere when I entered “Esperanto”.

Screenshot of the Wolfram|Alpha result for “Esperanto”

So, Wolfram|Alpha gives me concrete information rather than just pointers to pages for me to sift through, and it’s beautifully presented too. Trouble is, it’s complete bollocks.

In July, I’ll be attending this year’s World Congress of Esperanto, to which 1700 people have currently signed up. At the last one I attended, in 2005, I was joined by 2300 Esperanto speakers from 62 countries (out of a total of 2000 in the world, don’t forget); this is actually reckoned to be, in round figures, about 1% of the number of fluent speakers. There will be several dozen native speakers there too; I’ll be having a few beers with two of them the week before. And this year, 150 years since the birth of a certain Ludwik Łazarz Zamenhof, the Congress is to be held in Białystok, the hometown of both Dr Zamenhof and the international language that he initiated. Białystok is not in France.

I’m sure there’s some very clever computational wizardry going on at Wolfram|Alpha, but the traditional garbage in, garbage out rule clearly still applies.

New version of “La Espero”

January 7, 2009 by timsk

A new CD by jazz artists Garry Dial and Terry Roche has just been released, containing jazz interpretations of numerous national anthems.

Us An' Them

I mention it in passing because they’ve chosen to include La Espero, the Esperanto anthem. (Can’t call it a national anthem, because obviously the whole point is that it isn’t of any particular nation).

You can listen to clips and buy the album at CDbaby.com.

If you don’t already know it, you can hear a more traditional version of La Espero over at imeem.com.