At a disadvantage
Posted by Trixter on June 4, 2011
Quick, without doing any research: Which early-1980s computer was faster, the IBM PC or the Commodore 64? The IBM PC ran an 8088 at nearly 5MHz, whereas the C64 ran a 6502 variant at 1MHz. The PC cost thousands of dollars, the C64 hundreds. The PC had a 1-megabyte address space; the C64 only 64K. Is this a trick question?
It is! The C64 was faster. The original IBM PC, despite appearances and bias on the part of both consumers and marketing, was actually the slowest popular personal computer on the market at the time of its release, even compared to the Apple II and Atari 400. Here’s why.
The 8088 holds an uncomfortable position between 8-bit and 16-bit personal computing: while its internal word size was indeed 16 bits, the 8 in 8088 means that its external data bus was only 8 bits wide. The 8088 could therefore access only one byte of data per bus operation, giving it performance much closer to an 8-bit personal computer than a 16-bit one. Normally this is no big deal; the 6502 used in the C64 had the same limitation. But unlike the 6502, which could access a byte in a single cycle, the 8088 took 4 cycles to access that same byte. Another way of looking at this: every time memory is touched, the 8088 wastes 75% of its cycles, effectively turning the IBM PC from a 4.77MHz computer into a 1.1925MHz computer (4.77 ÷ 4). That leaves it a “lead” of only 0.1695MHz over the C64's 1.023MHz.
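The arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch under the same simplification as the paragraph (every 8088 memory access costs 4 clocks, every 6502 access costs 1), not a cycle-exact model; the constant names and the `effective_mhz` helper are hypothetical.

```python
# Simplified model from the paragraph above (an assumption, not a cycle-exact
# emulation): every 8088 memory access costs 4 clocks, every 6502 access 1.
PC_CLOCK_MHZ = 4.77     # IBM PC 5150 CPU clock, rounded
C64_CLOCK_MHZ = 1.023   # NTSC Commodore 64 CPU clock, rounded
PC_CLOCKS_PER_ACCESS = 4
C64_CLOCKS_PER_ACCESS = 1


def effective_mhz(clock_mhz: float, clocks_per_access: int) -> float:
    """Memory accesses per microsecond, i.e. the 'effective' clock rate."""
    return clock_mhz / clocks_per_access


pc_effective = effective_mhz(PC_CLOCK_MHZ, PC_CLOCKS_PER_ACCESS)
c64_effective = effective_mhz(C64_CLOCK_MHZ, C64_CLOCKS_PER_ACCESS)
lead = pc_effective - c64_effective

print(f"PC effective clock:  {pc_effective:.4f} MHz")
print(f"C64 effective clock: {c64_effective:.4f} MHz")
print(f"PC 'lead':           {lead:.4f} MHz")
```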
If it still had a slight lead, then why was it slower? While the 8088 could indeed operate on 16 bits at a time, its machine instructions were 1 to 6 bytes long (not counting prefixes), every one of those bytes cost 4 cycles to fetch, and even the simplest instructions took 2 cycles to execute. Contrast that with the 6502, whose instructions are 1 to 3 bytes long and whose cycle counts (2 to 7) already include fetching every byte of the instruction.
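To put rough numbers on that fetch overhead, the sketch below multiplies instruction length by the per-byte cost. The length ranges are the documented ones (1 to 6 bytes on the 8088 ignoring prefixes, 1 to 3 on the 6502); the flat 4-clocks-per-byte figure is the same simplification used above and ignores the prefetch queue. `fetch_clocks` is a hypothetical helper.

```python
# Hypothetical helper: clocks spent only on fetching an instruction's bytes,
# under the simplified costs used in this post (4 clocks/byte on the 8088,
# 1 clock/byte on the 6502).
def fetch_clocks(instruction_bytes: int, clocks_per_byte: int) -> int:
    return instruction_bytes * clocks_per_byte


# (shortest, longest) instruction lengths in bytes; 8088 figures ignore prefixes.
CPUS = {
    "8088": ((1, 6), 4),
    "6502": ((1, 3), 1),
}

fetch_range = {}
for cpu, ((shortest, longest), per_byte) in CPUS.items():
    fetch_range[cpu] = (fetch_clocks(shortest, per_byte), fetch_clocks(longest, per_byte))
    lo, hi = fetch_range[cpu]
    print(f"{cpu}: {lo} to {hi} clocks just to fetch one instruction")
```

On the 6502 those fetch clocks are part of each instruction's published total; in this simplified 8088 model they come on top of the execution time.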
Let’s illustrate this with a fun example: Rotating a byte of memory once using ROR (rotate right). We’ll keep it fair by treating the PC like it only has a single 64K segment of memory. First, the 6502 version using ROR:
| Cycle | Action |
|---|---|
| 1 | fetch opcode, increment program counter |
| 2 | fetch low byte of address, increment program counter |
| 3 | fetch high byte of address, increment program counter |
| 4 | read from effective address |
| 5 | write value back and do operation |
| 6 | write the new value to the effective address |
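The six 6502 cycles listed above can be modeled as one bus access per step. This is a sketch, not an emulator: `ror_absolute_6502` is a hypothetical function, memory is a 64K `bytearray`, and only opcode 6E (ROR absolute) is handled. Note that the 6502's ROR rotates through the carry flag: bit 0 goes into carry and the old carry goes into bit 7.

```python
ROR_ABSOLUTE = 0x6E  # 6502 opcode for ROR abs


def ror_absolute_6502(mem: bytearray, pc: int, carry: int):
    """Execute one ROR abs instruction at `pc`, logging one entry per clock."""
    bus_log = []

    opcode = mem[pc]
    pc += 1
    bus_log.append("fetch opcode")
    if opcode != ROR_ABSOLUTE:
        raise ValueError(f"expected opcode 0x6E, got {opcode:#04x}")

    lo = mem[pc]
    pc += 1
    bus_log.append("fetch low byte of address")

    hi = mem[pc]
    pc += 1
    bus_log.append("fetch high byte of address")

    addr = (hi << 8) | lo
    value = mem[addr]
    bus_log.append("read from effective address")

    # NMOS 6502 writes the unmodified value back while the ALU rotates it.
    mem[addr] = value
    bus_log.append("write value back, do operation")

    # ROR rotates through carry: old carry -> bit 7, bit 0 -> new carry.
    result = ((value >> 1) | (carry << 7)) & 0xFF
    carry = value & 1
    mem[addr] = result
    bus_log.append("write new value to effective address")

    return pc, carry, bus_log


mem = bytearray(0x10000)
mem[0x0200:0x0203] = bytes([ROR_ABSOLUTE, 0x34, 0x12])  # ROR $1234
mem[0x1234] = 0x03
pc, carry, log = ror_absolute_6502(mem, 0x0200, carry=0)
for cycle, step in enumerate(log, start=1):
    print(cycle, step)
```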
6 cycles. Now the 8088 version:
| Cycle | Action |
|---|---|
| 1 | ROR BYTE PTR [1234h],1 assembles to "D0 0E 34 12", so let's get to fetching the opcode |
| 5 | fetch ModR/M byte |
| 9 | fetch low byte of address |
| 13 | fetch high byte of address |
| 17 | perform operation, which takes 15 cycles + EA calculation (6) |
| 37 | final cycle of calculation, we're done, yay :-/ |
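The 8088 side can be sketched the same way by decoding the four bytes and adding fetch and execute clocks. Assumptions: the flat 4 clocks per fetched byte used throughout this post, 15 + EA clocks for a byte rotate-by-1 on memory, and EA = 6 for a direct 16-bit address, with the prefetch queue ignored. `decode_d0_direct` and `clocks_8088` are hypothetical helpers that only handle opcode D0 with direct addressing.

```python
CLOCKS_PER_BYTE = 4      # simplified 8088 bus cost per instruction byte
ROTATE_MEM_BY_1 = 15     # base clocks for a shift/rotate of a memory byte by 1
EA_DIRECT = 6            # effective-address clocks for a direct [disp16] operand

# Meaning of the ModR/M reg field for opcode D0 (8-bit shift/rotate by 1).
D0_OPS = ["ROL", "ROR", "RCL", "RCR", "SHL", "SHR", "(undocumented)", "SAR"]


def decode_d0_direct(code: bytes):
    """Return (mnemonic, address) for 'D0 modrm lo hi' with direct addressing."""
    if code[0] != 0xD0:
        raise ValueError("only opcode D0 is handled")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if (mod, rm) != (0b00, 0b110):
        raise ValueError("only direct [disp16] addressing is handled")
    address = code[2] | (code[3] << 8)  # displacement is little-endian
    return D0_OPS[reg], address


def clocks_8088(code: bytes) -> int:
    """Fetch clocks for every byte plus execute clocks (15 + EA)."""
    fetch = len(code) * CLOCKS_PER_BYTE
    execute = ROTATE_MEM_BY_1 + EA_DIRECT
    return fetch + execute


ROR_1234 = bytes.fromhex("D00E3412")
mnemonic, address = decode_d0_direct(ROR_1234)
print(f"{mnemonic} BYTE PTR [{address:04X}h],1 takes {clocks_8088(ROR_1234)} clocks")
```

The 16 fetch clocks plus 21 execute clocks give the 37 in the table.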
What took 6 cycles on the C64 takes 37 cycles on the IBM PC, no thanks to the slow memory access of 4 cycles per byte. Taking both machines' clock speeds into account, the operation takes about 6 microseconds on the C64 and about 8 microseconds on the IBM PC. It can get much worse than that, especially if you're foolish enough to access more than a single 64K memory segment. IBM PC is teh suck! (*)
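Converting cycles to time is just cycles divided by the clock rate in MHz. A quick check, using the rounded clocks 1.023MHz (NTSC C64) and 4.77MHz (IBM PC); `microseconds` is a hypothetical helper.

```python
def microseconds(cycles: int, clock_mhz: float) -> float:
    """A clock of N MHz completes N cycles per microsecond."""
    return cycles / clock_mhz


c64_us = microseconds(6, 1.023)    # 6502 ROR abs
pc_us = microseconds(37, 4.77)     # 8088 ROR BYTE PTR [1234h],1

print(f"C64: {c64_us:.2f} us")
print(f"PC:  {pc_us:.2f} us")
print(f"PC takes {pc_us / c64_us:.2f}x as long")
```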
The gap between the IBM PC and the Atari 400 is even wider, if you can believe that, because the Atari 400 ran its 6502 faster (1.79MHz) than the C64 did (1.023MHz). The BBC Micro? 2MHz! It's painful to think about!
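Applying the same conversion to the other 6502 machines mentioned here, this sketch times the identical 6-cycle ROR abs at each machine's nominal clock. It deliberately ignores cycles stolen by video hardware (ANTIC DMA on the Atari, VIC-II bad lines on the C64), so treat the numbers as best cases; the clock figures are rounded, NTSC where relevant.

```python
ROR_ABS_CYCLES = 6                # same 6502 instruction everywhere
PC_US = 37 / 4.77                 # 8088 version on the IBM PC, in microseconds

CLOCKS_MHZ = {                    # nominal 6502 clocks, rounded
    "Commodore 64 (NTSC)": 1.023,
    "Atari 400 (NTSC)": 1.79,
    "BBC Micro": 2.0,
}

# Ignores video DMA cycle stealing, so these are best-case times.
times_us = {name: ROR_ABS_CYCLES / mhz for name, mhz in CLOCKS_MHZ.items()}

for name, us in times_us.items():
    print(f"{name}: {us:.2f} us, {PC_US / us:.1f}x faster than the IBM PC")
```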
Ever wonder why there hasn’t been a true demoscene demo on the original IBM PC aside from three scrollers (all Sorcerers releases, btw)? Well, now you know one major reason. (Lack of decent graphics is another; in fact, I’d be willing to argue that only the Apple II had slower graphics.)
(*) Yes, I know the 8088 has a 4-byte prefetch queue that sometimes speeds things up. That comes in handy, oh, almost never.