MPEG-2 Encoding, The OCD Way
Posted by Trixter on September 17, 2011
Remember that AssemblyTV advert that said we would ship MindCandy 3 in September? We won’t, because we found several bugs in the first release candidate that we’re fixing. In one instance, a Samsung BD-P1000 (a very early player) wouldn’t even get to the main menu! So we’re going back and doing more compatibility fixes and testing, and adding a few missing features along the way (like pop-up menus for the NVScene talks). We should be shipping in October.
However, with the Blu-ray build taking several hours at a time to test changes, I have some time to concentrate on the DVD (and write blog posts). Let’s talk about MPEG-2 encoders.
Ever wonder what the very best MPEG-2 encoder is? Each one has different strengths, parameters, quantization matrices, sensitivity to noise, and so on, and there's no way to know which is best for your source footage until you try them. So you may be tickled to know that we tested pretty much every Windows encoder that someone claimed produced decent results. The one exception was ProCoder, which kept crashing on my rig; that's a shame, since I recall it producing great output. A few months ago, I prepared the then-current 480i version of the main timeline, a thoroughly interlaced 30i video running 3h29m27s. I made an AviSynth wrapper for it that presented the video in the YV12 colorspace (the 4:2:0 format MPEG-2 uses) and fed it to several encoders, each set to the same average bitrate (4.7 Mbit/s) and to DVD-compliant settings. I then ran the encoded results through the MSU Video Quality Measurement Tool and concentrated on five metrics:
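To put the test setup in numbers, here is a small sketch of the arithmetic. It assumes an NTSC DVD frame size of 720x480 and a frame rate of 30000/1001 (about 29.97) frames per second; neither figure is stated above. It shows how big one YV12 frame is (a full-resolution Y plane plus quarter-size U and V planes) and how large a 4.7 Mbit/s video stream of that running time comes out.

```python
from fractions import Fraction

WIDTH, HEIGHT = 720, 480              # assumed NTSC DVD frame size
FPS = Fraction(30000, 1001)           # assumed NTSC "30i" rate, ~29.97 frames/s
DURATION_S = 3 * 3600 + 29 * 60 + 27  # 3h29m27s, from the post
AVG_BITRATE = 4_700_000               # 4.7 Mbit/s average, from the post


def yv12_frame_bytes(width: int, height: int) -> int:
    """YV12 is 8-bit planar 4:2:0: a full-resolution Y plane followed by
    V and U planes at half width and half height."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma + 2 * chroma


def stream_bytes(bitrate_bps: int, seconds: int) -> int:
    """Size in bytes of a stream encoded at a given average bitrate."""
    return bitrate_bps * seconds // 8


frame = yv12_frame_bytes(WIDTH, HEIGHT)
frames = round(DURATION_S * FPS)  # round() on a Fraction returns an int
print(f"one YV12 frame:      {frame} bytes")
print(f"frames (approx.):    {frames}")
print(f"uncompressed YV12:   {frame * frames / 1e9:.1f} GB")
print(f"video at 4.7 Mbit/s: {stream_bytes(AVG_BITRATE, DURATION_S) / 1e9:.2f} GB")
```

This works out to 518,400 bytes per frame and 376,633 frames, slightly fewer than the 376,642 frames quoted below because the running time is only given to the second. The video stream alone comes to roughly 7.38 GB, compared with about 195.2 GB of uncompressed YV12 input.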
- SSIM (a metric x264 can report and tune for)
- 3SSIM (a modified version of SSIM)
- VQM (a metric that exploits the DCT to simulate human visual perception)
- PSNR (older, deprecated)
- The color results of PSNR (U and V components), since the previous four only look at luminance (Y)
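For readers unfamiliar with the two most common metrics in that list, here is a minimal sketch of PSNR and a simplified SSIM on 8-bit sample planes. It is not how the MSU tool computes them: real SSIM averages the formula over many small local windows, whereas this version (labeled `global_ssim`, a hypothetical helper) applies it once to the whole plane, so it will not reproduce the numbers in the table. 3SSIM and VQM are not shown.

```python
import math
from typing import Sequence

PEAK = 255  # maximum value of an 8-bit sample


def psnr(ref: Sequence[int], test: Sequence[int]) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized 8-bit planes."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical planes
    return 10 * math.log10(PEAK ** 2 / mse)


def global_ssim(ref: Sequence[int], test: Sequence[int]) -> float:
    """SSIM evaluated once over the whole plane (a simplification: real SSIM
    averages this formula over many small local windows)."""
    n = len(ref)
    c1 = (0.01 * PEAK) ** 2  # standard stabilizing constants
    c2 = (0.03 * PEAK) ** 2
    mx = sum(ref) / n
    my = sum(test) / n
    vx = sum((a - mx) ** 2 for a in ref) / n
    vy = sum((b - my) ** 2 for b in test) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(ref, test)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )


ref = [16, 32, 64, 128, 200, 235]
brighter = [v + 1 for v in ref]  # every sample off by one
print(f"PSNR: {psnr(ref, brighter):.2f} dB, SSIM: {global_ssim(ref, brighter):.5f}")
```

A uniform error of one code value gives an MSE of 1 and therefore a PSNR of about 48.13 dB. SSIM instead separates the brightness shift from structure: the two planes vary identically, so only the luminance term drops below 1.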
Finally, I averaged each metric over all 376,642 frames, stuck everything in a spreadsheet, and color-coded each column from green (most similar to the original input file) to red (least similar):
| Encoder | Notable Configuration Parameters | Average SSIM (Y) | Average 3SSIM (Y) | Average VQM (Y), lower is better | Average PSNR (Y), dB | Average PSNR (UV), dB | Average PSNR (U), dB | Average PSNR (V), dB |
|---|---|---|---|---|---|---|---|---|
| CCE SP3 | CBR 4.7 Mbit/s (intentionally bad) | 0.9562 | 0.95988 | 1.14252 | 34.36616 | 39.05231063 | 38.9542 | 38.53384 |
| Adobe Media Encoder CS5 | Quality 5, max render depth | 0.96163 | 0.96772 | 1.00816 | 36.35463 | 39.09635214 | 39.8727 | 39.42966 |
| CCE SP3 | default settings | 0.96511 | 0.96949 | 1.01146 | 36.45595 | 39.00388083 | 39.41969 | 38.96111 |
| HcEnc 0.26 | 9-bit DC, defaults | 0.96355 | 0.97021 | 0.98358 | 36.63967 | 38.966577 | 39.57915 | 39.15758 |
| CCE SP3 | 10-bit DC, CG1 matrix, no filters, Q16 | 0.96428 | 0.97053 | 1.03151 | 36.72093 | 38.86613 | 39.32893 | 38.8701 |
| CCE SP3 | 9-bit DC, CG1 matrix, no filters, Q16 | 0.96446 | 0.97075 | 1.02869 | 36.75945 | 38.788335 | 39.33256 | 38.87439 |
| TMPGEnc 5 | 9-bit DC, defaults | 0.96324 | 0.97108 | 0.9676 | 36.93709 | 38.630765 | 39.53456 | 39.12228 |
| QuEnc 0.72 | 9-bit DC, all quality settings on (slow) | 0.96679 | 0.97581 | 0.9007 | 38.36719 | 37.93311 | 38.11264 | 37.75358 |
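The green-to-red color-coding can be reproduced by min-max scaling each column so that 0.0 is the worst (red) and 1.0 the best (green) score. The sketch below does this with the luma figures copied from the table; the one subtlety is that VQM measures distortion, so its scale is flipped. The `goodness` helper and its `exclude` parameter are hypothetical, added here to make it easy to set QuEnc aside the way the results discussion below does.

```python
# Luma scores copied from the table: (encoder, SSIM, 3SSIM, VQM, PSNR Y).
ROWS = [
    ("CCE SP3, CBR", 0.9562, 0.95988, 1.14252, 34.36616),
    ("Adobe Media Encoder CS5", 0.96163, 0.96772, 1.00816, 36.35463),
    ("CCE SP3, defaults", 0.96511, 0.96949, 1.01146, 36.45595),
    ("HcEnc 0.26", 0.96355, 0.97021, 0.98358, 36.63967),
    ("CCE SP3, 10-bit DC", 0.96428, 0.97053, 1.03151, 36.72093),
    ("CCE SP3, 9-bit DC", 0.96446, 0.97075, 1.02869, 36.75945),
    ("TMPGEnc 5", 0.96324, 0.97108, 0.9676, 36.93709),
    ("QuEnc 0.72", 0.96679, 0.97581, 0.9007, 38.36719),
]
METRICS = ["SSIM", "3SSIM", "VQM", "PSNR (Y)"]
HIGHER_IS_BETTER = [True, True, False, True]  # VQM is a distortion measure


def goodness(column: int, exclude=frozenset()) -> dict:
    """Min-max scale one metric to 0.0 (red, worst) .. 1.0 (green, best)."""
    rows = [r for r in ROWS if r[0] not in exclude]
    values = [r[column + 1] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero if every value ties
    scores = {}
    for r, v in zip(rows, values):
        t = (v - lo) / span
        scores[r[0]] = t if HIGHER_IS_BETTER[column] else 1.0 - t
    return scores


skip = {"QuEnc 0.72"}
for i, name in enumerate(METRICS):
    s = goodness(i, exclude=skip)
    print(f"{name:8} best: {max(s, key=s.get):24} worst: {min(s, key=s.get)}")
```

With QuEnc excluded, TMPGEnc comes out on top for 3SSIM, VQM, and PSNR (Y), CinemaCraft's default settings have the best SSIM, and the intentionally bad CBR encode is worst on every metric.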
Some interesting things can be noted from these results (keep in mind that my source is a noise-free, digitally clean, computer-generated video):
- Adobe Media Encoder, which uses MainConcept’s engine, clearly uses PSNR as its comparison metric when optimizing 2-pass encodes. Unfortunately, PSNR is not a good metric to optimize for, which is why Adobe's encode does poorly on the metrics that actually matter, like SSIM and 3SSIM, and why it looks worse visually.
- CinemaCraft readily sacrifices color accuracy during a CBR encode, presumably to hit the target bitrate and preserve as much luminance as it can.
- CinemaCraft’s default settings produce its best SSIM score. All of my attempts to improve on them (10-bit DC precision vs. 9-bit, different matrices, different filter settings, etc.) actually made it worse.
The subjective viewing quality of each of these results was mostly in line with the technical results, with one exception: the QuEnc output was noticeably worse than its SSIM score above would suggest. It’s hard to explain without showing individual frames for comparison, but there was just something in QuEnc’s output that made it feel worse to the viewer than the others. Its chroma PSNR scores, the lowest of the group, confirm that somewhat. I think I must have made a mistake somewhere along the way in my testing of QuEnc, so I mentally ignored it when making my comparisons.
So which one did we end up using? To understand that, you should understand my motivation. I have a history of using psychology in most of my projects to gain a slight edge with my target audience wherever I can: I chose nerd-familiar material for 8088 Corruption, I played virt’s rickroll composition during my presentation of MONOTONE, the contribution point reward system in MobyGames was my idea, etc. So while TMPGEnc produced the best overall results in subjective viewing across the entire video, we went with CinemaCraft SP. Why? CinemaCraft SP lets you skew the bitrate for any number of user-defined sections. I used this feature to ensure perfect visual quality for the very first and last demo in the timeline. Start strong and finish strong.
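The idea behind skewing the bitrate for chosen sections can be illustrated with a little bookkeeping: if some sections get more bits, the rest must get fewer for the overall average (and so the disc footprint) to stay the same. This is a minimal sketch of that trade-off, not CinemaCraft's actual algorithm. The `Segment` type, the `skewed_bitrates` helper, and the section durations and weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    seconds: float
    weight: float  # hypothetical relative bitrate multiplier; 1.0 means "no skew"


def skewed_bitrates(segments: list, average_bps: float) -> dict:
    """Give each segment weight * scale bits per second, choosing scale so the
    duration-weighted mean bitrate (and thus the total size) stays at average_bps."""
    total_time = sum(s.seconds for s in segments)
    weighted_time = sum(s.seconds * s.weight for s in segments)
    scale = average_bps * total_time / weighted_time
    return {s.name: s.weight * scale for s in segments}


# Hypothetical split of the 3h29m27s timeline; durations and weights are made up.
timeline = [
    Segment("first demo", 300, 1.5),
    Segment("everything else", 12567 - 600, 1.0),
    Segment("last demo", 300, 1.5),
]
rates = skewed_bitrates(timeline, 4_700_000)
for name, bps in rates.items():
    print(f"{name:16} {bps / 1e6:.2f} Mbit/s")
```

With these made-up numbers the two bookend demos get about 6.89 Mbit/s while everything else drops to about 4.59 Mbit/s. The sketch does not clamp anything; a real DVD encode also has to stay under DVD-Video's 9.8 Mbit/s peak video bitrate.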