Oldskooler Ramblings

the unlikely child born of the home computer wars

Archive for the ‘Programming’ Category

Upcoming Trixter Sighting + Retroprogramming

Posted by Trixter on May 15, 2013

June 14-16 you should be able to see me at @party in Boston.  (If anyone is looking to share a hotel room, drop me a line!)  I am scheduled to give a 30-minute version of the PCjr presentation I had worked on for NOTACON, and hopefully enter a compo or two with some oldskool hardware I will be shipping to arrive ahead of me.

Speaking of entering a compo: I really, really dig retroprogramming.  The cool part is, after 10 years of retroprogramming in my spare time, my kung fu is getting advanced enough that I not only know how to do what I want in assembly, but I know the fastest possible method of getting it done on the target hardware. For example, I recently implemented a vertical-retrace interrupt in software because the hardware one wasn’t good enough. It’s sick that I know how to do that, but sicker that I know why I need to do that.

I still get a kick out of impressing Jim-of-20-years-ago.

Posted in Demoscene, Programming | 9 Comments »

LZ4 on the 8088: One small drop

Posted by Trixter on February 9, 2013

I thought I had squeezed every drop of blood from the stone that is LZ4 decompression on the 8088, but with some help from Peter Ferrie and Terje Mathisen, we’ve managed to improve the decompression speed by another 1%.  1% may seem laughable, but believe me, it’s quite an accomplishment if you followed my previous 3-part series on optimizing for the 8088.

In addition to even faster code, I thought it would be interesting to see how small an LZ4 decompressor could get, so with Peter’s help we managed to come up with a version of the code that trades speed for size.  It’s 30% slower on average, but it compiles to only 78 bytes.

The downloads section of the LZ4_8088 website has been updated to contain both versions in the single .zip file.

Posted in Programming, Vintage Computing | Leave a Comment »

Optimizing for the 8088 and 8086 CPU, Part 3: A Case Study In Speed

Posted by Trixter on January 18, 2013

In this final part of 8088 optimization posts, I present a case study for 8088 optimization. Specifically, I’ll cover a problem that I needed to solve, how I solved it, how long it took to optimize my solution for 8088, and what the performance benefit was from that optimization.

For the TL;DR crowd who will likely skip to the end of the article: Through clever 8088 optimization and a thorough understanding of our problem, a decompression routine was created that not only beats all known methods previously created for 8088, but can actually exceed the speed of a memcopy given the right input. But to see how this was achieved, you’ll have to grab a snack and settle down for 10 minutes.

Now, on with our case study.

Posted in Programming, Vintage Computing | 22 Comments »

Optimizing for the 8088 and 8086 CPU: Part 2

Posted by Trixter on January 11, 2013

Welcome back to our little crash course on how to optimize code for maximum speed on the 8088 and 8086 CPU. Let’s jump right back in with a quick way to transmogrify the contents of a buffer.

Posted in Programming, Vintage Computing | 11 Comments »

Optimizing for the 8088 and 8086 CPU: Part 1

Posted by Trixter on January 10, 2013

There is a small but slowly growing hobby around retroprogramming for old PCs and compatibles. This hobby has existed for decades for other platforms, as evidenced by the active demoscenes on each retro platform, but the IBM PC (and other 4.77MHz 8088 compatibles) has only recently started to gain that same sort of attention. As a public service to the 8088 retroprogramming community — “All four of you, huh?” — I’ve decided to write a crash-course on optimizing your code for maximum speed on the 8088. This information is targeted to people who already know either modern x86 assembly or assembly for other CPUs, and are programming for the 8088 or 8086 for the first time (or the first time in a long while).

Posted in Programming, Vintage Computing | 28 Comments »

Maze Generation In Thirteen Bytes

Posted by Trixter on December 17, 2012

Update 12/17/2012 @ 13:46: Peter Ferrie smashed my record by a single byte, so the record is now held by him at 12 bytes.  Congrats, and I feel like a fool for missing it :-)  I’ve tacked on his optimization to the end of my original post.

Update 1/7/2013: herm1t further smashed the record, down to 11 bytes!

Update 1/7/2013 @ 18:00: Peter bounces back and reclaims the record with 10 bytes! It kind-of breaks my target platform (uses an undocumented opcode that only works on Intel processors) but hey, a record’s a record! I’ve updated the article below.

In the past, when I’ve had a democoding breakthrough, I kept quiet and either used my discovery in a production, or just bragged to my demoscene friends privately.  However, my opportunities to achieve democoding “world firsts” are just about gone, and size coding compos seem to be dead, so I’ve decided to just write a blog post about what I’ve done instead: I’ve written a maze generator in only 13 bytes of x86 machine code.

Posted in Demoscene, Programming, Vintage Computing | 29 Comments »

Reverse-engineering an old wound

Posted by Trixter on November 8, 2012

Nearly two decades ago on the Usenet newsgroups comp.sys.ibm.pc.demos and comp.sys.ibm.pc.soundcard, there were some accusations flung around that Josh Jensen (Cyberstrike of Renaissance, for those who still remember the PC demoscene) had copied entire chunks of Mark J. Cox’s MODPLAY to use in his own modplayer SuperProPlay (and later MASI sound system). Just as time has a way of healing old wounds, advances in technology have a way of ripping them open again, and a chance encounter with some familiar assembly code in October got me thinking about the accusations against Jensen all those years ago. I didn’t give it much attention back then, but I’m a different person now, with much more skill than I had 20 years ago. With decades of x86 assembler, reverse-engineering, and programming skills under my belt, I decided to take another look at this issue to see if it could be answered definitively. I armed myself with much better RE tools (IDA) as well as Josh’s released Protracker Playing Source (PPS) v1.10 source code (PPS110.ZIP) and spent about an hour looking at them both.

My verdict: Josh quite absolutely copied entire chunks of MODPLAY for use in his own code.

Posted in Demoscene, Programming | 2 Comments »

Family Computing

Posted by Trixter on November 22, 2011

Today’s post over on Vintage Computing and Gaming’s Retro Scan Of The Week covered the magazine Family Computing, one of the lesser-known personal computing magazines of the 1980s, which brought back a memory that I think is important to share.  Normally I’d write a lot of historical info about Family Computing Magazine itself, but not today.  This post is less about Family Computing and more about how a simple choice my father made shaped my life.

In 1983, having started using the Apple IIs at my school for word processing and simple programming with LOGO, I became quite interested in computers and really wanted one, but our family didn’t have a lot of money at the time and couldn’t afford one, even a C64. My father was sympathetic to how I felt, and as a small consolation bought me a subscription to Family Computing Magazine. It turned out that the magazine subscription was just as valuable a gift as the computer I wanted. Whenever it arrived, I read it cover to cover in 2-3 hours, absorbing everything in that magazine and learning about every system on the market as well as what kinds of software and hardware were available for them.  More importantly, I also learned what other people were using their computers to accomplish, far beyond a simple checkbook balance or playing a game.  And for those specialized tasks, they were often writing their own software in BASIC.

That’s a nice memory, but not a life-changing one.  What changed my life, specifically, was the combination of three things:  My desire to use a computer + not actually owning one + the BASIC listings in every Family Computing magazine.  Every mag had a few BASIC programs that did various things, usually a utility program, a simple game, and some “mystery” program that displayed or printed some graphic or message and you had to run the code to see what it was.  They were written in Applesoft BASIC, with diffs for other computers of the time (usually Atari 8-bit, C64, TRS-80, and TI 99/4A were represented, with later diffs for Spectrum and PCjr’s sound and graphics).  Because we didn’t own a computer, I would spend hours tracing through the BASIC listings in my head to “run” them to see what they did.  Sometimes I had a pad next to me to jot down notes, as I couldn’t juggle more than 5-6 variables at a time. For the “mystery” programs that output graphics, I would plot the output on graph paper.  Each program was a puzzle to solve.  My brain became an emulator.

Dad saw me spend hours reading each magazine, and going over older ones, so he found a way to save monthly for a computer.  A little over a year later, he surprised the family with an AT&T PC 6300, which he was able to get at a discount because he worked at AT&T at the time. I nearly exploded, and barreled through that machine with a purpose.  I used that computer just as long as I read Family Computing, both until roughly 1989.

Today, I program in 8088 assembler for fun.  It calms me down.

Thank you, Joey Latimer, for writing all those BASIC programs, and thank you Dad, for a simple act of empathy.

Posted in Programming, Vintage Computing | 1 Comment »

No keyboard, no monitor, no problem

Posted by Trixter on September 29, 2011

I have a friend named Andrew Jenner.  If you’re intimately familiar with PC retrocomputing, you may remember him as the person who thought it would be a good idea to remaster an old game called Digger so that it could be recompiled for modern machines/languages/operating systems.  Meaning:  He took the original game binary, used DEBUG.COM to dump sections of it out as partially-assembled assembler source, and examined and tweaked it over several months until it could compile back into the original.  Then he translated that into C.  Then he made the C portable.  Then he made the C portable across operating systems.  Then he switched out the graphics for higher-resolution ones.  The end result is that you can now play this ancient game perfectly on any operating system, even in a Java VM.  His actions inspired similar projects by other people, like The Jumpman Project and The Beyond Castle Wolfenstein Project.  So that’s what Andrew does for fun.  At least, that’s one of the things he does for fun, when he’s not building new electronic music toys for his children, or writing a cycle-exact 8088 emulator, or just generally visiting every single hackerspace in a 200-mile radius to kick down the door and show them who’s boss.

He wrote me recently to let me know he had purchased an XT to do some democoding on it, a shared passion of ours.  It came with a monochrome card, but he lacked a suitable monitor; it also lacked a keyboard, and a working disk drive.  Did that stop him from using it?  Hell no, this is Andrew Fucking Jenner!  Step aside, son:

I ordered a CGA card but decided to see if I could jerry-rig something up in the meantime. I programmed my Arduino to pretend to be an XT keyboard and also the “manufacturing test device” that IBM used in their factories to load code onto the machine during early stage POST (it works by returning 65H instead of AAH in response to a keyboard reset). I then used this to reprogram the CRTC of the MDA to CGA frequencies (113 characters of 9 pixels at 16MHz pixel clock for 18 rows (14 displayed) of 14-scanline characters plus an extra 10 scanlines for a total of 262 scanlines). The sources for this are on github.

I had to re-read that a few times to make sure I wasn’t having a seizure.  Let’s confirm what happened:

  1. With no input device or working disk drive, he still managed to load code by reprogramming a microcontroller to emulate a long-forgotten IBM diagnostic protocol, formerly used only in factories by test devices to QA units before they went out the door.
  2. The code he loaded was to force a monochrome card to output NTSC signals, so that it could be connected to a TV.  Not dramatic enough for you?  How about this:  He forced a monochrome card to behave like a color card.
  3. He made the schematic and source code available, because that’s the kind of guy he is.
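
The timing numbers in Andrew's description can be sanity-checked with a little arithmetic. This is only a rough sketch in Python: the exact dot clock of 16.257MHz is my assumption (the quote just says "16MHz"), and the variable names are made up for illustration.

```python
# Rough check of the CRTC numbers quoted above.
PIXEL_CLOCK_HZ = 16_257_000   # assumption: exact MDA dot clock; the quote rounds to 16MHz
CHARS_PER_LINE = 113          # total character cells per scanline, including blanking
PIXELS_PER_CHAR = 9
CHAR_ROWS = 18                # total rows, of which 14 are displayed
SCANLINES_PER_ROW = 14
EXTRA_SCANLINES = 10          # vertical adjust

pixels_per_line = CHARS_PER_LINE * PIXELS_PER_CHAR
total_scanlines = CHAR_ROWS * SCANLINES_PER_ROW + EXTRA_SCANLINES

h_freq_hz = PIXEL_CLOCK_HZ / pixels_per_line   # scanlines per second
v_freq_hz = h_freq_hz / total_scanlines        # frames per second

print(f"pixels per scanline: {pixels_per_line}")
print(f"scanlines per frame: {total_scanlines}")
print(f"horizontal rate:     {h_freq_hz / 1000:.2f} kHz")
print(f"vertical rate:       {v_freq_hz:.2f} Hz")
```

Under that assumption the line rate lands near 16 kHz and the frame rate near 61 Hz, which is in the neighborhood of what a TV expects (roughly 15.7 kHz and 60 Hz), and the scanline count matches the 262 in the quote.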

You don’t mess with Jenner.  You do read his blog, however.

Posted in Programming, Vintage Computing | 3 Comments »

At a disadvantage

Posted by Trixter on June 4, 2011

Quick, without doing any research: What early 1980s computer was faster, the IBM PC or the Commodore 64? The IBM PC ran an 8088 at nearly 5MHz, whereas the C64 ran a 6502 variant at 1MHz. The PC cost thousands of dollars, the C64 hundreds. The PC had a 1 megabyte address space; the C64 only 64K. Is this a trick question?

It is!  The C64 was faster.  The original IBM PC, despite appearances and bias on the part of both consumers and marketing, was actually the slowest popular personal computer on the market at the time of its release, even compared to the Apple II and Atari 400.  Here’s why.

The 8088 holds an uncomfortable position between the realm of 8-bit and 16-bit personal computing; while the internal word size was indeed 16-bit, the 8 in 8088 means that its external data bus was only 8 bits wide.  This means that the 8088 could only access one byte of data in a single bus operation, giving it speeds much more like an 8-bit personal computer than a 16-bit one. Normally this is no big deal; the 6502 used in the C64 had the same limitation.  But unlike the 6502, which could access a byte in a single cycle, the 8088 took 4 cycles to access that same byte.  Another way of looking at this: every time memory is touched, the 8088 wastes 75% of its cycles, effectively turning the IBM PC from a 4.77MHz computer into a 1.1925MHz computer.  This gave it a “lead” of only 0.1695 MHz over the C64.
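
If you want to check that arithmetic yourself, here is a minimal sketch in Python. It assumes an NTSC C64 at 1.023MHz (the figure the 0.1695 MHz difference implies) and pretends every cycle is a memory access, which is a deliberate oversimplification.

```python
# Effective "memory-bound" clock rates, following the reasoning above.
PC_CLOCK_MHZ = 4.77        # IBM PC 8088 clock
PC_CYCLES_PER_BYTE = 4     # the 8088 needs 4 cycles per byte accessed
C64_CLOCK_MHZ = 1.023      # assumption: NTSC C64
C64_CYCLES_PER_BYTE = 1    # the 6502 accesses a byte in a single cycle


def effective_mhz(clock_mhz, cycles_per_byte):
    """Millions of byte accesses per second when every cycle touches memory."""
    return clock_mhz / cycles_per_byte


pc_effective = effective_mhz(PC_CLOCK_MHZ, PC_CYCLES_PER_BYTE)
c64_effective = effective_mhz(C64_CLOCK_MHZ, C64_CYCLES_PER_BYTE)
wasted_fraction = 1 - 1 / PC_CYCLES_PER_BYTE
lead = pc_effective - c64_effective

print(f"PC effective rate:  {pc_effective:.4f} MHz")
print(f"C64 effective rate: {c64_effective:.4f} MHz")
print(f"PC cycles wasted:   {wasted_fraction:.0%}")
print(f"PC lead:            {lead:.4f} MHz")
```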

If it still had a slight lead, then why was it slower?  While the 8088 could indeed operate on 16 bits at a time, the machine instructions were between 2 and 4 bytes long, and only the simplest instructions took 2 cycles to execute.  Contrast that with the 6502, where most instructions are 1 to 3 bytes long, each byte is fetched in a single cycle, and the simplest instructions complete in 2 cycles.

Let’s illustrate this with a fun example:  Rotating a byte of memory once using ROR (rotate right). We’ll keep it fair by treating the PC like it only has a single 64K segment of memory. First, the 6502 version using ROR:

Cycle Operation
1 fetch opcode, increment program counter
2 fetch low byte of address, increment program counter
3 fetch high byte of address, increment program counter
4 read from effective address
5 write value back and do operation
6 write the new value to the effective address

6 cycles. Now the 8088 version:

Cycle Operation
1 ROR BYTE PTR [1234],1 expands to “D0 0E 34 12” so let’s get to fetching the opcode:
2 (still fetching…)
3 (still fetching…)
4 (still fetching…)
5 (still fetching…)
6 (still fetching…)
7 (still fetching…)
8 (still fetching…)
9 Fetch low byte of address
10 (still fetching…)
11 (still fetching…)
12 (still fetching…)
13 Fetch high byte of address
14 (still fetching…)
15 (still fetching…)
16 (still fetching…)
17 Perform operation, which takes 15 cycles + EA calculation (6)
37 Final cycle of calculation, we’re done, yay :-/

What took 6 cycles on the C64 takes 37 cycles on the IBM PC, no thanks to the slow memory access of 4 cycles per byte. Taking both machines’ clock speeds into account, this means the operation takes about 6 microseconds on the C64 and about 8 microseconds on the IBM PC.  It can get much worse than that, especially if you’re foolish enough to access more than a single 64K memory segment.  IBM PC is teh suck! (*)
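
The two tables boil down to a few lines of arithmetic. The sketch below, in Python, just re-adds the cycle counts listed above and converts them to time; it assumes a 1.023MHz NTSC C64 and ignores the prefetch queue, same as the table does.

```python
# Cycle tallies taken from the two tables above.
C64_CLOCK_MHZ = 1.023          # assumption: NTSC C64
C64_CYCLES = 6                 # ROR absolute on the 6502

PC_CLOCK_MHZ = 4.77
PC_CYCLES_PER_BYTE = 4         # bus cycles to fetch each instruction byte
PC_INSTRUCTION_BYTES = 4       # D0 0E 34 12
PC_EXECUTE_CYCLES = 15         # ROR on a memory operand
PC_EA_CYCLES = 6               # effective address calculation

pc_fetch_cycles = PC_INSTRUCTION_BYTES * PC_CYCLES_PER_BYTE
pc_cycles = pc_fetch_cycles + PC_EXECUTE_CYCLES + PC_EA_CYCLES


def microseconds(cycles, clock_mhz):
    """Cycles divided by MHz gives microseconds."""
    return cycles / clock_mhz


c64_us = microseconds(C64_CYCLES, C64_CLOCK_MHZ)
pc_us = microseconds(pc_cycles, PC_CLOCK_MHZ)

print(f"C64: {C64_CYCLES} cycles, {c64_us:.2f} microseconds")
print(f"PC:  {pc_cycles} cycles, {pc_us:.2f} microseconds")
```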

The gap between the IBM PC and the Atari 400 is even wider, if you can believe that, because the Atari 400 ran the 6502 faster (1.79MHz) than the C64 (1.023MHz).  The BBC Micro?  2MHz!  It’s painful to think about!

Ever wonder why there hasn’t been a true demoscene demo on the original IBM PC aside from three scrollers (all Sorcerers releases, btw)? Well, now you know one major reason. (Lack of decent graphics is another; in fact, I’d be willing to argue that only the Apple II had slower graphics.)

(*) Yes, I know the 8088 has a 4-byte prefetch queue that sometimes speeds things up.  That comes in handy, oh, almost never.

Posted in Demoscene, Programming, Uncategorized, Vintage Computing | 37 Comments »