Read your DVDs the RAW way...

This will be an attempt to document stuff I’ve done in the past. I’m bad at documenting, so I’ll just present what I’ve done. If you have further questions, always feel free to email me.

This time I wanted to know what’s on my DVDs. I mean, not what’s normally visible, but what’s underneath the data layer. Unlike with CDs, where a lot of work has gone into making every bit of a disc readable, there is surprisingly little information for DVDs.

While DVDs seem to be similar to CDs, there are many structural differences in the way the data is encoded. I’ll spare you the details - if in doubt, read any of the (free) ECMA documents on DVDs (for example ECMA-268). They explain pretty well how DVD data is encoded on disc. But to sum it up: before the data is written to a DVD, it goes through the following stages:

  • addition of header data (ID, IED, CPR_MAI, …)
  • EDC calculation
  • energy dispersal (scrambling)
  • ECC (PI/PO)
  • interleaving, adding sync information
  • EFM+ encoding
  • conversion to an analog signal that drives the burn laser for pits & lands

CD readers often have special modes to read raw sectors. This is probably related to the fact that you need some of these functions to digitally read out audio data from audio CDs, but they can also be used to explore CD-ROMs. In the DVD domain, we are not that lucky. Most of the signal processing is done in hardware, and recent drives are single-chip designs, with one chip doing all the work, from analog RF to IDE (or SCSI). Sometimes the firmware allows you to read 2064 bytes per sector, sometimes you can disable the EDC check or the scrambling, but usually you cannot go further. Sometimes you can query PI/PO stats, but that’s all.
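To make the “2064 bytes per sector” and the EDC check a bit more concrete, here is a minimal Python sketch of the data frame as I read the ECMA documents: 4 bytes ID, 2 bytes IED, 6 bytes CPR_MAI, 2048 bytes main data and 4 bytes EDC, with the EDC being a plain CRC over the first 2060 bytes using the polynomial x^32 + x^31 + x^4 + 1. The function names are made up for illustration, the IED is left as a zero placeholder (the real one is a small Reed-Solomon parity over the ID), and scrambling is not applied here.

```python
# Field sizes of one DVD data frame (assumed from ECMA-267/268).
ID_LEN, IED_LEN, CPR_MAI_LEN, MAIN_LEN, EDC_LEN = 4, 2, 6, 2048, 4
FRAME_LEN = ID_LEN + IED_LEN + CPR_MAI_LEN + MAIN_LEN + EDC_LEN  # 2064

# x^32 + x^31 + x^4 + 1 with the x^32 term implicit.
EDC_POLY = 0x80000011


def edc(data: bytes) -> int:
    """Bitwise CRC: MSB first, zero initial value, no final XOR."""
    reg = 0
    for byte in data:
        reg ^= byte << 24
        for _ in range(8):
            if reg & 0x80000000:
                reg = ((reg << 1) ^ EDC_POLY) & 0xFFFFFFFF
            else:
                reg = (reg << 1) & 0xFFFFFFFF
    return reg


def build_frame(psn: int, main_data: bytes, sector_info: int = 0) -> bytes:
    """Assemble an unscrambled 2064-byte frame (hypothetical helper)."""
    if len(main_data) != MAIN_LEN:
        raise ValueError("main data must be exactly 2048 bytes")
    ident = bytes([sector_info]) + psn.to_bytes(3, "big")
    ied = bytes(IED_LEN)          # placeholder, real IED is not computed here
    cpr_mai = bytes(CPR_MAI_LEN)  # all-zero copyright management info
    body = ident + ied + cpr_mai + main_data
    return body + edc(body).to_bytes(EDC_LEN, "big")


def psn_of(frame: bytes) -> int:
    """Physical sector number: the low three bytes of the ID field."""
    return int.from_bytes(frame[1:4], "big")


def frame_ok(frame: bytes) -> bool:
    """With a zero-init CRC, running it over data plus EDC must give 0."""
    return len(frame) == FRAME_LEN and edc(frame) == 0
```

Calling `frame_ok(build_frame(0x030000, bytes(2048)))` should succeed, while flipping any single bit of the frame makes the check fail. On a real disc the EDC only matches after the main data has been descrambled.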

I wanted more. My plan was to capture the data before it gets processed by the usual circuits, but I didn’t want to mess around with analog hardware. So I grabbed an old Pioneer DVD-113, which luckily was built in those good days before single-chip solutions were available:

overview.jpg

It is also interesting how little of the device is controlled by what we know as “the drive firmware”. In fact, it’s just the host interface and the programming of all the other chips. No real data processing is done in the firmware - the CPU would be way too slow anyway.

After some scoping, I found that between the MN67703AC and the MN103007BDA there is a 4-bit synchronous bus which carries data. I didn’t know what kind of data it was, except that it came from the DVD: when I slowed down the DVD spindle, the clock slowed down as well. I captured the data with my logic analyzer and ran some statistics over it. I found that in some data segments, a specific pattern showed up every 372 (4-bit) samples. I dug into the ECMA docs and found the “physical sector layout”. Voilà: sync marks are (32+1456) = 1488 = 372*4 bits apart.
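The kind of statistics meant here can be sketched roughly as follows. This is not the analysis I actually ran, just one simple way to find a repeating marker in a stream of 4-bit samples: remember where each short window of samples was last seen and build a histogram of the distances between repeats. The marker pattern and the random filler in `synthetic_capture` are pure assumptions for the demo. Real channel data is run-length limited and far less random, so a real capture would probably need a longer window.

```python
import random
from collections import Counter

SYNC_BITS = 32          # sync code length in channel bits
FRAME_DATA_BITS = 1456  # 91 bytes x 16 channel bits
BUS_WIDTH = 4           # bits per captured sample


def expected_period() -> int:
    """Sync distance expressed in 4-bit samples."""
    return (SYNC_BITS + FRAME_DATA_BITS) // BUS_WIDTH


def dominant_spacing(samples, window=4, max_gap=2000):
    """Most frequent distance between repeats of the same sample window."""
    last_seen = {}
    gaps = Counter()
    for i in range(len(samples) - window + 1):
        key = tuple(samples[i:i + window])
        prev = last_seen.get(key)
        if prev is not None and i - prev <= max_gap:
            gaps[i - prev] += 1
        last_seen[key] = i
    return gaps.most_common(1)[0][0] if gaps else None


def synthetic_capture(frames=50, seed=0):
    """Fake capture: a made-up 8-sample marker followed by random filler."""
    rng = random.Random(seed)
    marker = [0x1, 0x2, 0x4, 0x8, 0xF, 0x0, 0xA, 0x5]  # hypothetical pattern
    filler_len = expected_period() - len(marker)
    out = []
    for _ in range(frames):
        out.extend(marker)
        out.extend(rng.randrange(16) for _ in range(filler_len))
    return out
```

Running `dominant_spacing(synthetic_capture())` on the fake data recovers the spacing that `expected_period()` predicts from the sector layout.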

That made me very happy - the data at this port was before EFM+ decoding, and just after the analog processing (EQ, slicing, clock recovery - all those things I didn’t have a clue about).

Getting this data into my PC was easy: I attached my good old Cypress FX2 board and wrote a small firmware (fx2.zip) and a PC application (streamer.zip), both based on Cypress examples. It was really easy, and I can stream about 15MB/s without problems. The data rate of course depends on the DVD reading speed.

When the drive is idle (i.e. not seeking), it will just follow the data track on the disc, decoding all the data and trying to stay on track by evaluating the optical sensors and moving the laser with the coils (and possibly the whole pickup). This makes the head follow the track “automatically”. However, it will constantly seek back in order to stay on (approximately) the same sector. If you watch the pickup carefully, you will notice that. Because ideally we want to stream the whole DVD in one go, without any seeks, we need to disable this seeking. I did this in a brute-force way: I shorted the I2C bus to the servo controller. Shorting it for a brief moment is enough - the controller will give up issuing seek commands, and you can move the pickup manually (don’t worry - although it will stress the tracking logic a bit because it tries to compensate at first, this should be no real problem). Just move the head with your fingers to the position where you want to start streaming (for example, to the beginning), and let it idle. It will follow the track, streaming out all data.

Finally, I wrote a tool which processes the received data stream, does all the data processing steps (sync, EFM+, deinterleave, ECC, scrambling, EDC) and outputs plain data (decode.zip). It might need some additional tweaks here and there (especially the sync handling is really not nice, and a lot of the decoding errors could be fixed by improving the sync logic). It uses Phil Karn’s FEC library. The software also contains code to properly handle Gamecube optical discs, and code to automatically calculate non-standard scrambling seeds based on the EDC value. The software outputs 2064 bytes/sector frames, so you don’t gain much out of the box. However, you can explore the data at any point. Take my software just as an example.

One thing which you can never do with just a firmware hack is to properly decode twin sectors (two sectors with the same PSN but different content), or even twin sector traps (different “read paths” depending on where you seeked last). For example, you could make sector 0x31000 follow 0x30FFF0, and another sector 0x31000 (with different content) follow 0x30F00. Now, when you seek to one sector and then to 0x31000, the drive will end up at either the first or the second version of 0x31000, depending on what it read last. By reading the DVD linearly from start to end, you can extract these kinds of constructs.
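Once you have a linear dump of decoded 2064-byte frames, spotting twin sectors is mostly bookkeeping. The following Python sketch is not part of decode.zip. It assumes the frames are already descrambled and that the PSN sits in the low three bytes of the 4-byte ID at the start of each frame. `fake_frame` is a hypothetical helper that only exists to produce demo input.

```python
from collections import defaultdict

FRAME_LEN = 2064
MAIN_START, MAIN_END = 12, 2060  # main data follows ID, IED and CPR_MAI


def psn_of(frame: bytes) -> int:
    """Physical sector number from the ID field."""
    return int.from_bytes(frame[1:4], "big")


def find_twins(frames):
    """Map each PSN seen with differing main data to its stream positions.

    A PSN that merely shows up twice with identical content (for example
    because the same spot was read twice) is not reported.
    """
    seen = defaultdict(list)  # psn -> [(stream index, main data), ...]
    for idx, frame in enumerate(frames):
        if len(frame) != FRAME_LEN:
            raise ValueError(f"frame {idx} has {len(frame)} bytes")
        seen[psn_of(frame)].append((idx, frame[MAIN_START:MAIN_END]))
    twins = {}
    for psn, hits in seen.items():
        if len({data for _, data in hits}) > 1:
            twins[psn] = [idx for idx, _ in hits]
    return twins


def fake_frame(psn: int, fill: int) -> bytes:
    """Demo-only frame: real ID layout, zeroed IED/CPR_MAI/EDC."""
    header = bytes([0]) + psn.to_bytes(3, "big") + bytes(8)
    return header + bytes([fill]) * 2048 + bytes(4)
```

For example, feeding it a stream in which 0x31000 appears twice with different payloads reports that PSN together with both positions in the stream, which is exactly the information a seeking drive hides from you.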

The method of attaching a parallel data output somehow reminds me of what hackers did with the C64 floppy drive…

If you want to build your own debug DVD reader - well, start by finding a suitable DVD-ROM drive. I’m sure there are a lot of (older) DVD-ROM drives which have the right data ports. Take a scope and look for digital data. An easy way to tell whether this data comes directly from the DVD is to slow down the disc a bit (with your finger :) - only a bit, the drive needs to stay in sync!) and watch whether the data rate changes. If it does, chances are good that you have found the right data. Its rate should be approximately twice the payload data rate.
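The “approx. twice” figure can be cross-checked against the sector layout. The sketch below only does the arithmetic, using the numbers as I understand them from the ECMA documents: a recorded sector has 13 rows of 182 bytes (172 data bytes plus 10 PI bytes, with one PO row interleaved per sector), each row is split into two sync frames, and EFM+ turns every byte into 16 channel bits. The constant and function names are my own.

```python
USER_BYTES = 2048            # payload per sector
ROWS_PER_SECTOR = 13         # 12 data rows + 1 interleaved PO row
ROW_BYTES = 182              # 172 data bytes + 10 PI bytes
SYNC_FRAMES_PER_ROW = 2      # each row is split into two halves of 91 bytes
SYNC_BITS = 32               # sync code in front of each half
CHANNEL_BITS_PER_BYTE = 16   # EFM+ maps 8 data bits to 16 channel bits


def channel_bits_per_sync_frame() -> int:
    """32 sync bits plus 91 bytes of EFM+ encoded data."""
    half_row = ROW_BYTES // SYNC_FRAMES_PER_ROW
    return SYNC_BITS + half_row * CHANNEL_BITS_PER_BYTE


def channel_bits_per_sector() -> int:
    """Total channel bits recorded for one sector."""
    frames = ROWS_PER_SECTOR * SYNC_FRAMES_PER_ROW
    return frames * channel_bits_per_sync_frame()


def overhead_ratio() -> float:
    """Raw channel bits per payload bit."""
    return channel_bits_per_sector() / (USER_BYTES * 8)
```

With these numbers a sync frame is 1488 channel bits, a sector is 38688 channel bits, and the ratio comes out at roughly 2.36, so a raw stream a bit more than twice as fast as the payload is what you should expect to see on the scope.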

You could use any Cypress FX2 board, or of course invent your own capture device, for example with an FPGA. Speaking of FPGAs: if you manage to write the decoding logic for an FPGA, you could display the sector number (PSN) in real time, so you can watch where you are going when moving the head. My software is far too slow to decode the data in real time. Another interesting thing is that you could output a trigger signal to watch a specific sector’s RF data on the scope. This would allow exploring some DVD-based copy protection schemes, or doing timing measurements. This can partially be done by adding a “revolution signal” (which gives a logic high once per revolution) to the streamed data, so we can recover the angular position of the disc. You could then derive the physical position of a sector on the disc, which is also used in some copy protection schemes. (Disclaimer: Of course we never want to work around them. We just want to design even better schemes.)
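I have not built the revolution signal, so take the following only as a sketch of the idea. It assumes the capture gives you the sample indices at which the revolution pulse went high, and the sample index at which a sector of interest starts. Since the sample clock follows the disc and the track radius hardly changes within one turn, interpolating linearly between two pulses should give a usable angle. The function name and the input format are assumptions.

```python
from bisect import bisect_right


def angular_position(sample_index: int, index_pulses: list) -> float:
    """Angle in degrees, in [0, 360), relative to the last revolution pulse.

    index_pulses must be a sorted list of sample indices at which the
    (hypothetical) once-per-revolution signal went high.
    """
    k = bisect_right(index_pulses, sample_index) - 1
    if k < 0 or k + 1 >= len(index_pulses):
        raise ValueError("sample is not bracketed by two revolution pulses")
    start, end = index_pulses[k], index_pulses[k + 1]
    return 360.0 * (sample_index - start) / (end - start)
```

Comparing the angle of the same PSN across several discs, or the angle difference between two PSNs on one disc, is the sort of measurement that position-based protection schemes rely on.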

If you go with the Pioneer drive, here you can (almost) see where to look for the signals:

data.jpg i2c.jpg

You might want to use better cables than I did, though. And be sure to properly cool down the spindle power switches at the top.