Saved here for my own reference, and possibly others’ if they should stumble across it: the easiest workflow I’ve found yet for converting DVDs or Blu-Rays for personal use on macOS, including conversion of subtitles from either Closed Captions, VobSub (DVD), or PGS (Blu-Ray) format to text-based .srt files suitable for use as soft subtitles, either as a sidecar file or included in the final movie file. (Updated from my original 2015 post to account for software and process changes).

Rip the Disc

DVD Subtitle Workflow 1

Use MakeMKV to rip the DVD or BluRay disc to .mkv files.

Since I’m archiving special features as well as the main program, I simply rip every title on the disc longer than 30 seconds, and then trash any that I don’t need (such as menus, studio promos, etc.). I do check to make sure that all English-language audio and subtitle tracks are selected; usually they are by default, but I’ve occasionally seen discs where they need to be checked manually.

Once all the .mkv files have been created, I go through and rename each one to be something more descriptive than title_t03.mkv.
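If there are a lot of titles, the renaming can be scripted. This is only a minimal sketch: the mapping below is a hypothetical example (the descriptive names are made up), and the function names are my own. It plans the renames first so nothing is overwritten by accident.

```python
from pathlib import Path

# Hypothetical mapping from MakeMKV's default file names to descriptive ones.
NEW_NAMES = {
    "title_t00.mkv": "Main Feature.mkv",
    "title_t03.mkv": "Behind the Scenes.mkv",
}

def plan_renames(folder, new_names):
    """Return (source, target) path pairs that can be renamed safely."""
    folder = Path(folder)
    plan = []
    for old, new in sorted(new_names.items()):
        source, target = folder / old, folder / new
        # Skip titles that were not ripped, and never overwrite an existing file.
        if source.exists() and not target.exists():
            plan.append((source, target))
    return plan

def apply_renames(plan):
    """Carry out the renames produced by plan_renames()."""
    for source, target in plan:
        source.rename(target)
```

Printing the plan before calling `apply_renames()` is a cheap way to catch a typo in the mapping.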

Extract the Subtitles

DVD Subtitle Workflow 2

For each .mkv file, use Subler to extract the subtitles. This takes two passes through Subler to complete.

  1. First, drag the .mkv file onto Subler, and deselect everything but the subtitle track(s) that you want to convert.

    Subler Import

    Subler’s “Info” column will describe the subtitles as either VobSub, PGS, or Text. I used to convert them all so that I could choose which gave me the best results; now, I’ll ignore VobSub/PGS if Text is available (but it’s less common).

    VobSub or PGS: These are the most common subtitle types. They’re actually a series of images (compressed bitmaps rather than text) with attached timing information that media players layer over the video stream. The advantage is that font, color, size, placement, and even fancier graphics (sometimes used for “pop up trivia” style tracks) are all at the creator’s discretion; the disadvantage is that because they’re images, the text has to be extracted through an OCR (optical character recognition) process that frequently leads to typos and garbage characters.

    Text: These are Closed Caption files. I’m not sure how they’re stored on the physical discs, but current versions of MakeMKV convert them to text during the process of ripping to .mkv. I’ve generally found these to have far fewer typos and oddities than OCR’d VobSub or PGS subtitles. However, it’s often a toss-up as to whether the captions are presented using standard capitalization or in ALL CAPITALS, and they use varying numbers of space characters to manually place text centered or off-centered. Depending on how picky you are about the output, these factors can affect how much post-processing is needed.

    After choosing the subtitle tracks and clicking “Add” to create a new Subler document, you can either save the Subler document (fine if you’re only doing a single file) or use File > Send to Queue to create a batch queue (best if you’re converting multiple files). When the file is saved or the queue is run and all queued files are saved, Subler will either extract the Closed Caption text or OCR the subtitle images and output a small .mp4 file.

  2. Second run; drag the new .mp4 file back onto Subler, click on the subtitle track(s), and choose File > Export… to save the .srt file(s). The tiny .mp4 file can then be deleted.

    Subler Export

Correct the Subtitles

DVD Subtitle Workflow 3

As noted above, the exported .srt file(s) are virtually guaranteed to have some oddities; how many and how intrusive they are depends on the source. Caption files are often in ALL CAPS and have weird spacing used to force the text to a desired on-screen position. Subtitle files will contain OCR errors, but BluRay (PGS) subs seem to come out better than DVD (VobSub) subs (likely due to the higher resolution of the format giving better quality text for the OCR process to scan). Accuracy is also affected by the chosen font and whether or not italics were used.
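The two caption quirks (ALL CAPS and positioning spaces) are mechanical enough to script. Here is a rough sketch of one way to do it; the function name and the sentence-case rules are my own assumptions, not anything Subler does, and the lower-casing is deliberately naive.

```python
import re

def tidy_caption_line(line):
    """Collapse positioning spaces and convert ALL-CAPS text to sentence case."""
    # Closed captions often pad lines with spaces to position them on screen.
    text = re.sub(r"\s+", " ", line).strip()
    letters = [ch for ch in text if ch.isalpha()]
    # Only rewrite the case when every letter on the line is upper case.
    if letters and all(ch.isupper() for ch in letters):
        text = text.lower()
        # Capitalize the first letter of the line and of each new sentence.
        text = re.sub(
            r"(^|[.!?]\s+)([a-z])",
            lambda m: m.group(1) + m.group(2).upper(),
            text,
        )
        # Restore the pronoun "I", including contractions such as "I'm".
        text = re.sub(r"\bi\b", "I", text)
    return text
```

For example, `tidy_caption_line("      WHERE ARE YOU?   I DON'T KNOW.")` gives `Where are you? I don't know.` Proper nouns and acronyms lose their capitals with this approach, so a manual pass is still needed afterwards.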

For correction, I use a couple of methods.

  1. For a quick-and-dirty “good enough most of the time” run, I use BBEdit (but just about any other text editor would work) to do a quick spellcheck, identifying common errors and using search-and-replace to fix them in batches.

    I’ve actually set up a few scripts to automate the most common search-and-replace steps to help with this process.

  2. For a real quality fix, or if I have the time to create subtitles from scratch for a file that doesn’t have any, I use Subtitle Edit Pro to go through line-by-line, comparing the text to the original audio, adding italics when appropriate, and so on. (I used to recommend Aegisub, but that project appears to have been abandoned a few years back. There doesn’t seem to be a big market for subtitle editing on macOS; Subtitle Edit Pro is the best option I’ve found since Aegisub stopped working consistently.)

Of course, these two processes can be combined, done at different times, or skipped entirely; if I don’t have time or energy to do the error correction, I can always go back and use Subler to extract the .srt files for cleanup later.
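As an illustration of the batch search-and-replace idea, here is a minimal sketch in Python rather than BBEdit. The three patterns are hypothetical examples of typical OCR confusions, not the actual list from my scripts; build your own list from the errors you see in your files. Cue numbers and timing lines are skipped so the fixes only touch dialogue.

```python
import re

# Hypothetical examples of OCR confusions; adjust to match your own files.
OCR_FIXES = [
    (re.compile(r"\bl\b"), "I"),              # lone lower-case L is usually "I"
    (re.compile(r"\|"), "I"),                 # vertical bar misread for "I"
    (re.compile(r"(?<=\d)O|O(?=\d)"), "0"),   # letter O next to a digit
]

# An .srt timing line looks like: 00:00:01,000 --> 00:00:02,500
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def fix_srt_text(srt_text):
    """Apply the OCR fixes to dialogue lines only."""
    fixed = []
    for line in srt_text.splitlines():
        # Leave cue numbers and timing lines untouched.
        if not (line.strip().isdigit() or TIMING.match(line)):
            for pattern, replacement in OCR_FIXES:
                line = pattern.sub(replacement, line)
        fixed.append(line)
    return "\n".join(fixed)
```

Run it on a copy of the .srt file first; blanket replacements like these will occasionally "fix" something that was already correct.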

Embed the Subtitles

DVD Subtitle Workflow 4

Use HandBrake to re-encode and convert the .mkv file (which at this point will be fairly large, straight off the source media) to a smaller .m4v file. Include the subtitle file by choosing Tracks > Add External Subtitles Track… in HandBrake’s Subtitles tab.

Handbrake Subtitles

Or, if you’re already working with an .m4v file, you can use Subler to add .srt files into the .m4v: Drag the .m4v file from HandBrake onto Subler, drag the .srt file(s) into the window that opens, and then save the file.

Finito!

And that’s it. Now, you should have a .m4v file with embedded text-based soft subtitles.

TWOK Subtitles Example

You can also just store the .srt file(s) in the same directory and with the same name as the .m4v file for apps that don’t read embedded .srt files but will read sidecar files.
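The sidecar naming rule is simple enough to sketch in a few lines. The function names are my own, and the optional language code (as in `Movie.en.srt`) is an assumption about what your player accepts; Plex documents that form, but check your own app.

```python
from pathlib import Path
import shutil

def sidecar_path(movie_path, language=None):
    """Return the subtitle path beside the movie that shares its base name."""
    movie = Path(movie_path)
    # Optional language code before the extension (an assumption; see above).
    suffix = f".{language}.srt" if language else ".srt"
    return movie.with_name(movie.stem + suffix)

def place_sidecar(srt_path, movie_path, language=None):
    """Copy an exported .srt next to the movie under the matching name."""
    target = sidecar_path(movie_path, language)
    # Nothing to do if the subtitle file is already in place.
    if Path(srt_path).resolve() != target.resolve():
        shutil.copyfile(srt_path, target)
    return target
```

So a movie saved as `Example Movie.m4v` gets `Example Movie.srt` in the same folder.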

A gorgeous, fully restored Monty Python’s Flying Circus Norwegian Blu-ray Edition box set has just been released, and while I can definitely say that it looks great, others have reviewed it far more comprehensively than I’m able to do, and if you’re into the technical details, there’s some fascinating information about the restoration process in this article.

However, there is one small thing about the set that is a little unfortunate: Each episode only has three chapter stops.

Since I’m in the habit of ripping all of my DVDs and Blu-rays for storage and playback through my Plex media server, I decided to see if there was something I could do about that. Turns out there is! Here’s a rundown of the process, in case anyone else is curious (or if I need to remind myself how to do it for future projects).

  1. Rip the disc using MakeMKV to individual .mkv files for each episode (and while you’re doing so, you might want to pay attention to the subtitles as well).

  2. For each episode, open the .mkv file with the MKVToolNix GUI. Go to the “Chapter Editor” tab, and (at least in this case) remove the existing chapters.

  3. At the same time, open the .mkv file with a video player that allows for frame-by-frame scanning and that can display timecodes down to the millisecond (I use Aegisub).

  4. In MKVToolNix, use the “Add Chapter” button to create the first chapter; you’ll see it appear in the “Chapters:” list. Click on the chapter to enable editing. Set the start time to “00:00:00.000”. Optionally (but recommended), set the “Name” for the chapter: This could be as simple as “Chapter 1”, or a more descriptive chapter name (in this particular case, the highly detailed books of notes that came with the Monty Python set came in very handy for identifying the chapters and titles).

  5. Scan through the video file with your video player until you find the end point of the opening chapter/beginning point of the next chapter. Read the timecode from the video player, and use that to set the “End:” time in MKVToolNix (for example, “00:00:30.831” is zero hours, zero minutes, 30.831 seconds into the video).

  6. Click the “Add chapter” button to add the next chapter, and set its start time to the same timecode as the end time of the prior chapter.

  7. Continue on until all chapters have been defined.

  8. Once all chapters are defined, in MKVToolNix’s “Chapter Editor” window, choose “Save to Matroska file”. Select the .mkv file you’re working with, and click “Save”. Don’t worry if you get a warning that the file will be replaced; MKVToolNix will only replace the chapter markers, and will not wipe out the rest of the file.
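If you’d rather note the timecodes in a list and generate the chapter file in one go, something like the following sketch could stand in for the clicking. The element names follow the Matroska chapter XML layout as I understand it, the function names are my own, and the chapter names and the third timecode in the usage note are made-up placeholders; import the result into MKVToolNix’s chapter editor to verify it before relying on it.

```python
import xml.etree.ElementTree as ET

def timecode_to_ms(timecode):
    """Convert 'HH:MM:SS.mmm' to milliseconds."""
    hours, minutes, seconds = timecode.split(":")
    whole, _, fraction = seconds.partition(".")
    # Pad or trim the fractional part to exactly three digits.
    millis = int(fraction.ljust(3, "0")[:3]) if fraction else 0
    return (int(hours) * 3600 + int(minutes) * 60 + int(whole)) * 1000 + millis

def build_chapters_xml(chapters, language="eng"):
    """chapters: list of (start_timecode, name) tuples in playback order."""
    root = ET.Element("Chapters")
    edition = ET.SubElement(root, "EditionEntry")
    previous = -1
    for start, name in chapters:
        start_ms = timecode_to_ms(start)
        # Each chapter must start after the one before it.
        if start_ms <= previous:
            raise ValueError(f"chapter {name!r} is out of order at {start}")
        previous = start_ms
        atom = ET.SubElement(edition, "ChapterAtom")
        ET.SubElement(atom, "ChapterTimeStart").text = start
        display = ET.SubElement(atom, "ChapterDisplay")
        ET.SubElement(display, "ChapterString").text = name
        ET.SubElement(display, "ChapterLanguage").text = language
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Usage would look like `build_chapters_xml([("00:00:00.000", "Chapter 1"), ("00:00:30.831", "Chapter 2"), ("00:05:12.400", "Chapter 3")])`, with the returned text saved to an .xml file. Only start times are written here, since each chapter’s end is the next chapter’s start.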

Once that’s done, the .mkv file will have correct chapter markers set. If you then do any further encoding (such as converting from .mkv to .mp4 with HandBrake, which I do for my video storage to save space), those chapter markers will be preserved. This makes skipping around and finding particular points in the video (in this particular case, going directly to specific sketches within each episode) much easier.

It’s the one downside to an otherwise incredible set, and while this solution isn’t exactly simple or fast, neither is it terribly difficult or time consuming, and makes for a much better final experience.

Bonus: If others are ripping their Python box sets and would prefer not to go through the trouble of finding the chapter stops themselves, here’s a 73KB .zip file with .xml files for (nearly*) every episode’s chapter stops as I defined them. These files should be importable into MKVToolNix, replacing steps four through seven above (and saving you a lot of time).

* At present, I’m missing files for episodes 12 and 13 of Series 1, as I seem to have gotten a bad pressing of disc 2 of that set. I’ll add those two episodes and remove this qualifier once I’ve received a replacement disc.
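For a whole box set, applying the chapter files one at a time through the GUI gets tedious. MKVToolNix also ships a command-line tool, mkvpropedit, which can write chapters into an existing file. The sketch below only builds the commands; it assumes (hypothetically) that each .xml file has the same base name as its .mkv, which may not match how the files in the .zip are named, so rename as needed.

```python
from pathlib import Path
import subprocess

def chapter_commands(video_dir, chapter_dir):
    """Pair each .mkv with a same-named .xml and build an mkvpropedit command."""
    commands = []
    for video in sorted(Path(video_dir).glob("*.mkv")):
        chapters = Path(chapter_dir) / (video.stem + ".xml")
        # Episodes without a chapter file are left alone.
        if chapters.exists():
            commands.append(
                ["mkvpropedit", str(video), "--chapters", str(chapters)]
            )
    return commands

def run_commands(commands):
    """Run each command, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Print the command list and try it on a single copied episode before running it across the whole set.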

NOTE: This post should be considered deprecated in favor of this update for 2021. I’m leaving this here, but the new post is the preferred version.


Saved here for my own reference, and possibly others’ if they should stumble across it: the easiest workflow I’ve found yet for converting DVDs or Blu-Rays (if you have a Blu-Ray reader, of course) for personal use on OS X, including OCR conversion of subtitles in either VOBSUB (DVD) or PGS (Blu-Ray) format to text-based .srt files suitable for use as soft subtitles, either as a sidecar file or included in the final movie file.

Movie Rip Workflow

The flow diagram to the right gives an overview of the process I’ve landed on. Here’s a slightly more detailed breakdown.

  1. Use MakeMKV to rip the DVD or BluRay disc to an .mkv file (if I run into a stubborn DVD, or one with a lot of multiplexing, I’ll use RipIt to create a disk image first, then run that image through MakeMKV). To save space, you can select only the primary audio track for inclusion, or you can select others if you want other languages or commentary tracks archived as well (though this will require more storage space). I also select all available English-language subtitle tracks, as some discs will include both standard subtitles and subtitles for the hearing impaired or closed captions, which include some extra information on who is speaking and background sounds, or occasionally even transcriptions of commentary tracks.
  2. Use Subler to OCR and export the subtitle files. This takes two runs through Subler to complete.
    1. First run; drag the .mkv file onto Subler, and only select the subtitle track(s). Pop that into the export queue, and after a few minutes of processing (this is when the OCR process happens) Subler will output a tiny .m4v file.
    2. Second run; drag that file back onto Subler, click on the subtitle track, and choose File > Export… to save the .srt file(s). The tiny .m4v file can then be deleted.

    Now, the OCR process is not perfect, and the resulting .srt file(s) are virtually guaranteed to have some errors. How many and how intrusive they are depends on the source. BluRay subs seem to come out better than DVD subs (likely due to the higher resolution of the format giving better quality text for the OCR process to scan); DVD subs are also affected by the chosen font and whether or not italics were used. For correction, I use one of two methods.

    1. For a quick-and-dirty “good enough for now” run, I use BBEdit (but just about any other text editor would work) to do a quick spellcheck, identifying common errors and using search-and-replace to fix them in batches.
    2. For a real quality fix, I use Aegisub to go through line-by-line, comparing the text to the original audio, adding italics when appropriate, and so on.

    Of course, these two processes can be combined, done at different times, or skipped entirely; right now, I’m just living with the OCR errors, because I can always go back and use Subler to extract the .srt files for cleanup later on when I have more time.

  3. Use HandBrake to re-encode and convert the .mkv file (which at this point will be fairly large, straight off the source media) to a smaller .m4v file. You can either embed the .srt files at this point, under HandBrake’s ‘Subtitles’ tab, or if you prefer…
  4. …you can use Subler to add .srt files into the .m4v: Drag the .m4v file from HandBrake onto Subler, drag the .srt file(s) into the window that opens, and then drop that into the queue for final remuxing (optionally, before adding the files to the queue, use Subler’s metadata search tools to add the description, artwork, and other metadata). Then run the queue to output the final file.

And that’s it. Now, you should have a .m4v file with embedded text-based soft subtitles for programs that support that (VLC, Plex, etc.), or you can just use the .srt file(s) created by Subler earlier as a sidecar file for programs that don’t read the embedded .srt.