Reverse-engineering an old wound
Posted by Trixter on November 8, 2012
Nearly two decades ago on the Usenet newsgroups comp.sys.ibm.pc.demos and comp.sys.ibm.pc.soundcard, there were some accusations flung around that Josh Jensen (Cyberstrike of Renaissance, for those who still remember the PC demoscene) had copied entire chunks of Mark J. Cox’s MODPLAY to use in his own modplayer SuperProPlay (and later the MASI sound system). Just as time has a way of healing old wounds, advances in technology have a way of ripping them open again, and a chance encounter with some familiar assembly code in October got me thinking about the accusations against Jensen all those years ago. I didn’t give it much attention back then, but I’m a different person now, with much more skill than I had 20 years ago. With decades of x86 assembler, reverse-engineering, and programming skills under my belt, I decided to take another look at this issue to see if it could be answered definitively. I armed myself with much better RE tools (IDA) as well as Josh’s released Protracker Playing Source (PPS) v1.10 source code (PPS110.ZIP) and spent about an hour looking at them both.
My verdict: Josh absolutely copied entire chunks of MODPLAY for use in his own code.
When accused, Josh’s paraphrased explanation at the time was “it’s a modplayer, of course some things are going to be the same from player to player”, but that only makes sense at a high level. Yes, the basics of playing a mod are the same across all players, such as interpreting the data structures and effects, mixing four channels into a single output channel, etc. But the devil is in the details, and it is the details that point to copying. The code is not 100% identical at an assembler level, but there are some highly distinctive choices Mark made in the original MODPLAY that mysteriously show up in Josh’s source, such as the internal housekeeping and the inner mixing loop.
The mixing loop, I have found, is a good “fingerprint” for a modplayer: almost every author implements it in a different way. There are bare-metal fastest-possible implementation loops (such as the self-modifying fixed-length code of Carlo’s Galaxy Player), loops optimized for low memory usage (such as MODPLAY’s loop, which uses a MUL in the inner loop), loops that trade memory for speed (such as TANTRAKR, which uses a 128K lookup table to eliminate MUL), and all targets in between. 4 channels or N channels? 32-bit mixing or 16-bit mixing? Logarithmic or linear volume tables? Cubic interpolation or linear interpolation? Just about every x86 mixing modplayer is different. Mark Cox chose to use a MUL in the inner loop and to make heavy use of memory variables because he knew his target was a 286 or later and could handle it. You can also tell from the MODPLAY disassembly that Mark was working in a vacuum, because his performance-sensitive code is nowhere near as optimized as it could be (sorry Mark!). Looking at Jensen’s source, you can see exactly the same methods at play, including the inner loop (although Jensen made a few tiny 1- and 2-opcode optimization changes here and there).
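To make the multiply-versus-table trade-off concrete, here is a minimal Python sketch of just the volume-scaling step of a mixer. It is not MODPLAY's or TANTRAKR's actual code: the function names, the 65-row by 256-column table layout, and the shift by 6 are all illustrative assumptions.

```python
# Sketch only: contrasts a per-sample multiply with a precomputed volume table.
# Names, table layout, and scaling are assumptions made for illustration.

MAX_VOLUME = 64  # Protracker volumes run from 0 to 64


def mix_with_multiply(samples, volume):
    """Scale each signed 8-bit sample with a multiply, as a MUL-based loop would."""
    return [(s * volume) >> 6 for s in samples]


def build_volume_table():
    """Precompute every (volume, sample) product once, trading memory for speed."""
    return [[(s * v) >> 6 for s in range(-128, 128)]
            for v in range(MAX_VOLUME + 1)]


def mix_with_table(samples, volume, table):
    """Scale each sample with a table lookup instead of a multiply."""
    row = table[volume]
    return [row[s + 128] for s in samples]  # +128 maps -128..127 onto 0..255
```

Both functions produce the same output; the difference is where the cost lands. The first pays for a multiply on every sample, while the second pays once up front in memory for the table and then does only lookups in the inner loop.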
As much as I love the inner loop as a fingerprint, the most convincing evidence that copying occurred is actually in the most boring sections of both programs: general housekeeping (things like program startup/initialization, maintaining player state, etc.). Mark does something in MODPLAY that struck me as odd: he calls two tiny procedures to set some variables based on whether a mod has 15 or 31 instruments (labels are mine; I don’t have access to Mark’s source code):
    sub_1E23 proc near
        mov sequence_offset, 1D8h
        mov word_124, 258h
        mov header_size, 258h
        mov num_inst, 0Fh
        retn
    sub_1E23 endp

    sub_1E3C proc near
        mov sequence_offset, 3B8h
        mov word_124, 438h
        mov header_size, 43Ch
        mov num_inst, 1Fh
        retn
    sub_1E3C endp
That’s a weird way to set some vars. You don’t normally call a tiny procedure simply to set a handful of memory variables to fixed values; usually, you just set the values directly. I don’t know Mark’s motivation for doing it this way. This is a very unusual section of code that I wouldn’t expect to see again…
…and yet, Jensen follows the very same odd practice in PPM.ASM:
    proc sd_Set15Ins uses ds
        mov ax,@data
        mov ds,ax
        mov [Word NumberInstruments],15
        mov [Word SequenceOffset],01D8h
        mov [Word HeaderSize],0258h
        ret
    endp sd_Set15Ins

    proc sd_Set31Ins uses ds
        mov ax,@data
        mov ds,ax
        mov [Word NumberInstruments],31
        mov [Word SequenceOffset],03B8h
        mov [Word HeaderSize],043Ch
        ret
    endp sd_Set31Ins
Again, it’s not that the exact instructions or their order were copied; it’s that entire concepts were copied, and because Mark implemented them in a unique way, they stand out in Jensen’s code.
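The constants in the two listings are not themselves the giveaway, since they follow from the standard MOD file layout: a 20-byte title, 30 bytes per instrument header, two bytes before the 128-byte order table, and a 4-byte signature in 31-instrument files. The short Python sketch below reproduces them. The function and constant names are mine, and reading "sequence offset" as the start of the order table and "header size" as the start of pattern data is an assumption based on the values.

```python
# Sketch only: derives the header constants seen in both listings from the
# MOD file layout. Names are hypothetical, not taken from either program.

TITLE_BYTES = 20          # song title
SAMPLE_HEADER_BYTES = 30  # one header per instrument
PRE_ORDER_BYTES = 2       # song length byte plus one more byte
ORDER_TABLE_BYTES = 128   # pattern play order ("sequence")
SIGNATURE_BYTES = 4       # e.g. "M.K.", present only in 31-instrument files


def mod_layout(num_instruments):
    """Return (sequence_offset, header_size) for a 15- or 31-instrument mod."""
    sequence_offset = (TITLE_BYTES
                       + num_instruments * SAMPLE_HEADER_BYTES
                       + PRE_ORDER_BYTES)
    signature = SIGNATURE_BYTES if num_instruments == 31 else 0
    header_size = sequence_offset + ORDER_TABLE_BYTES + signature
    return sequence_offset, header_size


for count in (15, 31):
    seq, size = mod_layout(count)
    print(f"{count} instruments: sequence offset {seq:#x}, header size {size:#x}")
```

Any modplayer needs these numbers, so matching values alone prove nothing. What stands out is that both programs set them through a pair of tiny dedicated procedures.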
As someone who has done a lot of cracking and reverse-engineering of vintage software (including, I’m ashamed to say, outright theft of other people’s code), I sense other subtle touches in Jensen’s released source that indicate large sections of it are not his original work. The most obvious is the switching between hex and decimal as the basic notation from procedure to procedure: the copied chunks favor hexadecimal notation, while the original code favors decimal. Also, throughout the code some lines are indented using 8-character tab stops while other lines use spaces for padding. That is indicative of generating a file using one padding style and then editing it using another, which would not typically happen if you wrote all the code from scratch.
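These stylistic tells can be tallied mechanically. The following Python sketch counts tab-indented versus space-indented lines and hex versus decimal literals in a chunk of assembly source. It is a rough heuristic under stated assumptions, not the method used for this post: it only looks at leading whitespace, treats anything after a semicolon as a comment, and recognizes only `h`-suffixed hex literals.

```python
import re

# Sketch only: a crude style profile for assembly source text.
HEX_LITERAL = re.compile(r"\b[0-9][0-9A-Fa-f]*h\b")  # e.g. 01D8h, 0Fh
DEC_LITERAL = re.compile(r"\b[0-9]+\b")              # e.g. 15, 31


def style_profile(source):
    """Count indentation and numeric-notation styles in assembly source text."""
    profile = {"tab_indent": 0, "space_indent": 0, "hex": 0, "decimal": 0}
    for line in source.splitlines():
        if line.startswith("\t"):
            profile["tab_indent"] += 1
        elif line.startswith(" "):
            profile["space_indent"] += 1
        code = line.split(";", 1)[0]  # ignore assembler comments
        profile["hex"] += len(HEX_LITERAL.findall(code))
        profile["decimal"] += len(DEC_LITERAL.findall(code))
    return profile
```

Running this per procedure rather than per file would show whether the notation and padding style shift from one procedure to the next.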
In the last 20 years, Jensen has remained a professional programmer, and just as my skills and integrity have increased over that time, I have no doubt that his have increased as well. It is not my intention to libel Jensen as a whole; I simply wish to set the record straight regarding only one of his claims.