At a disadvantage

Posted by Trixter on June 4, 2011

Quick, without doing any research: What early 1980s computer was faster, the IBM PC or the Commodore 64? The IBM PC ran an 8088 at nearly 5MHz, whereas the C64 ran a 6502 variant at 1MHz. The PC cost thousands of dollars, the C64 hundreds. The PC had a 1 megabyte address space; the C64 only 64K. Is this a trick question?

It is! The C64 was faster. The original IBM PC, despite appearances and bias on the part of both consumers and marketing, was actually the slowest popular personal computer on the market at the time of its release, even compared to the Apple II and Atari 400. Here’s why.

The 8088 holds an uncomfortable position between the realm of 8-bit and 16-bit personal computing; while the internal word size was indeed 16-bit, the 8 in 8088 means that its external data bus was only 8 bits wide. This means that the 8088 could only access one byte of data in a single bus operation, giving it speeds much more like an 8-bit personal computer than a 16-bit one. Normally this is no big deal; the 6502 used in the C64 had the same limitation. But unlike the 6502, which could access a byte in a single cycle, the 8088 took 4 cycles to access that same byte. Another way of looking at this: every time memory is touched, the 8088 wastes 75% of its cycles, effectively turning the IBM PC from a 4.77MHz computer into a 1.1925MHz computer. This gave it a “lead” of only 0.1695 MHz over the C64.
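The arithmetic in that paragraph can be checked with a short sketch. The 1.023 MHz figure for the C64 is an assumption here (the NTSC clock, which is what the 0.1695 MHz "lead" implies), and the function name is only illustrative.

```python
# Peak byte-access rate implied by the paragraph above.
PC_CLOCK_MHZ = 4.77        # IBM PC 8088 clock, from the text
C64_CLOCK_MHZ = 1.023      # assumption: NTSC C64 clock, implied by the 0.1695 figure
CYCLES_PER_BYTE_8088 = 4   # the 8088 needs 4 cycles per byte on the bus
CYCLES_PER_BYTE_6502 = 1   # the 6502 touches one byte per cycle


def byte_accesses_per_microsecond(clock_mhz, cycles_per_byte):
    """Peak memory accesses per microsecond for a clock and a per-byte bus cost."""
    return clock_mhz / cycles_per_byte


pc_rate = byte_accesses_per_microsecond(PC_CLOCK_MHZ, CYCLES_PER_BYTE_8088)
c64_rate = byte_accesses_per_microsecond(C64_CLOCK_MHZ, CYCLES_PER_BYTE_6502)
lead = pc_rate - c64_rate

print(f"PC effective rate:  {pc_rate:.4f} MHz")
print(f"C64 effective rate: {c64_rate:.4f} MHz")
print(f"PC lead:            {lead:.4f} MHz")
```

This only models raw bus accesses; it says nothing about how much work each instruction does once the bytes have arrived.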

If it still had a slight lead, then why was it slower? While the 8088 could indeed operate on 16 bits at a time, most of its machine instructions were 2 to 4 bytes long, every one of those bytes cost 4 cycles to fetch, and only the simplest instructions executed in 2 cycles. Contrast that with the 6502, where instructions are 1 to 3 bytes long, each byte is fetched in a single cycle, and the simplest instructions complete in 2 cycles.

Let’s illustrate this with a fun example: Rotating a byte of memory once using ROR (rotate right). We’ll keep it fair by treating the PC like it only has a single 64K segment of memory. First, the 6502 version using ROR:

Cycle	Operation
1	fetch opcode, increment program counter
2	fetch low byte of address, increment program counter
3	fetch high byte of address, increment program counter
4	read from effective address
5	write value back and do operation
6	write the new value to the effective address

6 cycles. Now the 8088 version:

Cycle	Operation
1	ROR BYTE PTR [1234],1 expands to “D0 0E 34 12” so let’s get to fetching the opcode:
2	(still fetching…)
3	(still fetching…)
4	(still fetching…)
5	(still fetching…)
6	(still fetching…)
7	(still fetching…)
8	(still fetching…)
9	Fetch low byte of address
10	(still fetching…)
11	(still fetching…)
12	(still fetching…)
13	Fetch high byte of address
14	(still fetching…)
15	(still fetching…)
16	(still fetching…)
17	Perform operation, which takes 15 cycles + EA calculation (6)
…	…
37	Final cycle of calculation, we’re done, yay :-/

What took 6 cycles on the C64 takes 37 cycles on the IBM PC, no thanks to the slow memory access of 4 cycles per byte. Taking both machines’ clock speeds into account, this means the operation takes about 6 microseconds on the C64 and about 8 microseconds on the IBM PC. It can get much worse than that, especially if you’re foolish enough to access more than a single 64K memory segment. IBM PC is teh suck! (*)
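As a rough sketch, the two cycle tables can be turned into wall-clock times as follows. It follows the post's simplified model (no prefetch overlap on the 8088), and the 1.023 MHz C64 clock is an assumption; the helper names are illustrative.

```python
# Cycle tallies for rotating one byte in memory, following the tables above.
C64_CLOCK_MHZ = 1.023   # assumption: NTSC C64 clock
PC_CLOCK_MHZ = 4.77     # IBM PC clock, from the text


def fetch_cycles(instruction_bytes, cycles_per_byte):
    """Cycles spent just pulling the instruction bytes over the bus."""
    return instruction_bytes * cycles_per_byte


def microseconds(cycles, clock_mhz):
    """Convert a cycle count to microseconds at the given clock."""
    return cycles / clock_mhz


# 6502: ROR absolute, 6 cycles in total as tabulated.
cycles_6502 = 6

# 8088: "D0 0E 34 12" is 4 bytes at 4 cycles each, then 15 cycles for the
# operation plus 6 for the effective-address calculation.
cycles_8088 = fetch_cycles(4, 4) + 15 + 6

us_c64 = microseconds(cycles_6502, C64_CLOCK_MHZ)
us_pc = microseconds(cycles_8088, PC_CLOCK_MHZ)

print(f"C64: {cycles_6502} cycles, {us_c64:.2f} us")
print(f"PC:  {cycles_8088} cycles, {us_pc:.2f} us")
```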

The gap between the IBM PC and the Atari 400 is even wider, if you can believe that, because the Atari 400 ran the 6502 faster (1.78MHz) than the C64 (1.023 MHz). The BBC Micro? 2MHz! It’s painful to think about!

Ever wonder why there hasn’t been a true demoscene demo on the original IBM PC aside from three scrollers (all Sorcerers releases, btw)? Well, now you know one major reason. (Lack of decent graphics is another; in fact, I’d be willing to argue that only the Apple II had slower graphics.)

(*)Yes, I know the 8088 has a 4-byte prefetch queue that sometimes speeds things up. That comes in handy, oh, almost never.

This entry was posted on June 4, 2011 at 9:34 pm and is filed under Demoscene, Programming, Uncategorized, Vintage Computing.

43 Responses to “At a disadvantage”

Andrew Jenner said

June 5, 2011 at 8:48 am
I wouldn’t say that the prefetch queue is almost never useful – it does speed up most code by a factor of almost 2: based on the counts I’ve done, the bus interface unit stalls about as much as the execution unit does. Although the 8088’s execution unit takes longer to do most things, it works in parallel to the bus so I think overall it’s actually faster than a 6502 running at 1/4 the clock speed.

The 8088 does also have some other advantages over the 6502: it has more registers, a more compact instruction encoding, multiplies and divides are faster, and once you’ve got 16-bit quantities into registers, they can be manipulated faster.

We should have a race with some non-trivial code. In fact, I will take it as a challenge – if you give me a 6502 inner loop that doesn’t rely on other C64-specific hardware like the VIC-II or SID, I bet I can beat it with 8088 code running at 14/3 times the clock speed.

Reply
- Trixter said
  
  June 5, 2011 at 11:45 am
  My 6502 knowledge is tenuous at best, so I will leave it to a 6502 hacker to come up with such a code race.
  
  (I knew this post would get your attention ;-)
  
  Reply
  - Tomer Gabel said
    
    June 6, 2011 at 9:48 am
    It might be worthwhile to get in touch with modern-day demosceners who write demos for older platforms. They probably have some seriously tight code for certain effects which can be rewritten for another architecture.
    
    Reply
- landondyer said
  
  June 5, 2011 at 7:05 pm
  You can view the 6502 as having three really fast registers, and 256 somewhat slower ones (in page zero).
  
  On the other hand, you could write a better compiler for the x86 (16-bit registers, 16-bit stack, etc.), and the x86 had direct access to much more memory. The 6502 eventually lost out from a standpoint of architecture. The 65816 was a late comer that didn’t gain much market traction (all due respect to the Apple IIgs, the writing was on the wall inside Apple after the Mac became successful).
  
  The “world record” clock speed for a 6502 is about 25Mhz (told to me by Leonard Tramiel, who tried this in a lab at Commodore in the early 80s).
  
  Reply
  - VƎЯREVグレェgreyREVVƎЯ (@rhymebyter) said
    
    June 18, 2014 at 2:46 am
    The 65816 (or more specifically, the 5A22, based upon a derivation from the 65c816) actually saw a rather bright history in the スーパーファミコン or Super Nintendo, as it was known in most countries outside of Japan, running at 3.58 MHz.
    
    That said, compared to the slightly older PC-Engine, which had a different derived 6502 variant (namely, the HuC6280 running at 1.79 MHz or 7.16 MHz), it is remarkable to see how long such derivations lasted, well into the mid 1990s in console gaming.
    
    The PC-Engine, in particular, was a marvel of simplicity both in terms of physical size, no moving parts (or even LED for power) it had a very low power consumption, played games at high frame rates, with very faithful arcade ports (on some occasions, even better), and it was the harbinger of the CD-ROM.
    
    That said, today, flashcard devkit type devices for it may be procured for around $70, which can address up to 32GB of flash on a uSD; not only is it possible to load every title ever created for the system with plenty of room left over, I posit that it should be possible to create a boot loader for the various CD-ROM games to access such data from flash, rather than requiring a CD-ROM attachment at all. Albeit, legalities may be questionable, but undoubtedly the speed and economics of such a platform may seem vastly more appealing today than in an era where the CD-ROM was more of a novelty, and a means to thwart piracy. (That said, this vintage of CD-ROM more or less side steps things like ISO9660 and even duplicating such discs is a bit of a chore to understate it).
    
    There is one outfit even producing a *new* “HuCard” facsimile type production run of a new game. http://www.aetherbyte.com
    
    I know that if I were going to target *any* 6502 variant these days, that would be the platform of my choice, the console itself is about the size of three or four CD jewel cases, the HuCard slot is comparable to the size of a credit card. It makes the PS4 and XBOX360 look like giant noisy beasts.
    
    And with merely 8 KiB of main RAM and 64 KiB of VRAM, it takes a bit of coding nuance to contend with such constraints in this era; even with a conceptual ability to load in a devcard with 32GB of memory, good luck figuring out how to address all of that sanely with an 8bit CPU! ^_-
    
    I would like to think that while there are some retro sceners providing awareness to pushing old hardware to its limits, we may even see some sort of die-shrink SuperGrafx on a HuCard implementation (the SuperGrafx was perhaps the biggest flop in console history, with fewer than 10 titles ever officially released for it: it had 32KiB of main memory and 128KiB of VRAM; and arguably the best port of Ghouls ‘n Ghosts short of the X68000 [which was, not coincidentally, Capcom’s CPS-1 dev platform]).
    
    I see strong potential for creating a vintage perpetuation of software for that platform, in large part not just because it was so small, and power efficient, but also constrained, making it an interesting challenge for contemporary programmers who are presently accustomed to *gigabytes* of RAM and terabytes of disk, in a laptop no less!
    
    Not to mention, the PC-Engine has been emulated well enough to see commercial implementations of virtualized hardware officially running on platforms like the Wii, indicating that there is a strong likelihood for it being a useful target for even selling things through electronic distribution channels.
    
    That said, one could probably write some universal ROM loader to burn to a CD-R for those who do have the CD-ROM attachment for the unit; ostensibly for owners with an Arcade Card (which raised the total RAM of the system up to about 2MB) one could load just about every title ever released for the platform on HuCard into memory off of a CD.
    
    Not that anyone would necessarily condone such actions, but then, I doubt anyone would complain either. ;)
    
    Compared to looking at running old 8088 software on a contemporary console? Even games that would conceivably be easily emulated, often expected QWERTY keyboard access, there is the matter of the cumbersome BIOS, and ridiculous interrupts. There were many reasons *other* than budget that scene people gravitated towards platforms that were not the PC, they were simply better. Simpler, more powerful architectures, and that is still evidenced, decades later, despite Trixter’s fantastic demos showcasing the hardware in ways that IBM and clone manufacturers never conceived of.
    
    It’s a shame really, in 1984 there was the Amiga Boing Ball demo, and by 1986 Eric Graham’s “Juggler” ray traced HAM rendered animation was lighting up RGB CRTs in store displays, *nothing* on an Apple, IBM, or clone compared. And you wonder why sceners gravitated that way?
    
    As others mentioned later in this thread, it was not until 1994 or later that the PC, usually a 486 class system equipped with a GUS even began to approach the level of graphic or audio fidelity that tickled the fancies of such individuals, and by then, some of the more fortunate (or fortuned) were already dabbling with SGIs and other beasts of inordinate price.
    
    But for vintage? If you want to go with a 6502 lineage, a PC-Engine (or revisions like the Core Grafx) is about as sweet as it gets, I am pretty sure if I read the pinouts correctly, most models would even allow one to bypass the CPU altogether and just tap the video bus, y’know, if you wanted to say, release a HuCard SoC that had some stupendously fast contemporary system onboard and just use the system as a glorified docking station for the A/V outs and the serial interface for a controller (or maybe keyboard and mouse).
    
    I have considered such things, but was disappointed to find that Galileo’s SoC is a bit too large for such a project; give it another 5-10 years and a few more die shrinks, and who knows what can be done. For the time being, a similar vibe of project could probably be accomplished with an NES form factor and a Galileo, but I admit, I have not bothered to look at the pinouts for that system, it is far too chop shop, has a horrid region lock out chip that causes more video errors on legitimate titles than it ever managed to thwart non-licensed software; and Nintendo is still too big of a gorilla or Donkey Kong in the world to really lock horns with in my opinion.
    
    NEC by comparison, exited the game market altogether; but then, keeping in mind that they were a company of 50 in that era, considering what outfits like Google and VMWare do with tens of thousands of employees, I would say that they did far more, with far less, than we seem to manage today with far more.
    
    Long live vintage computing!
    
    Well, some of it at least, much of it is better left forgotten; even today, there is far too much contemporary code which is beyond bloated and unnecessary. I have made the analogy that if people got rid of instruments as quickly as we have abandoned other technology, guitars would need to be excavated in archeological expeditions.
    
    I would much rather see a longer life for *certain* technologies which are far more sophisticated, even if perhaps slower, than an instrument which *requires* a player. After all, such platforms may be 20-30 years old, but they are still Turing complete. Something a drum or rattle can never claim.
    
    Reply
- kevinm said
  
  June 7, 2011 at 10:08 am
  The 6502 memory access was also well thought out. I designed several products that took advantage of the fact that the CPU did not access the RAM for 50% of each cycle of the Phi2 signal. That allowed two CPUs to share access to the same RAM which resulted in some very powerful little computers at very low cost.
  
  Reply
Wer war schneller? — Retro said

June 5, 2011 at 10:46 am
[…] here is the explanation […]

Reply
John | Retro Programming said

June 5, 2011 at 4:07 pm
The penalty to access memory makes little difference when you consider there are 8 16-bit registers available, more than adequate for most calculations. Try writing a simple loop to calculate factorials on the 6502 and 8088. I’m willing to bet the 8088 will be considerably faster :-)

Reply
- Trixter said
  
  June 5, 2011 at 8:26 pm
  There may be eight 16-bit registers, but that doesn’t mean they’re all general-purpose and magical. Only four of them can be accessed by their low- and high-bytes (AX,BX,CX,DX), only some can be used as index pointers (BX,BP), only some can be used as segment registers, etc. A former kernel developer referred to the x86 register set as Larry, Moe, and Curly because you could only really count on AX, BX, and DX to be useful (CX used as the counting register).
  
  The point of the post was to generate discussion, and I’m most glad it has :-D
  
  Reply
  - Covoxer said
    
    August 14, 2011 at 12:54 pm
    No matter what purpose they are, they are all useful, and 6502 has none. What does the 6502 have for a loop counter? For source and destination addresses in memory copying? You only have 8 bit indexes, and if you want to reach any location in memory with them, you have to use indirect 0 page with offset. Which means 6502 fetches both 16 bit addresses and their 8 bit addresses in 0 page, every time you copy one byte. You also have to fetch the counter location (8 bits) and codes for every single operation in this not so small loop. In contrast, 8088 does not fetch anything except the source data and then stores it. It doesn’t even fetch any operation codes if you use rep movs.
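To make the "indirect 0 page with offset" point concrete, here is a small simulation of how such a copy has to be structured when the only index is 8 bits wide. It is a sketch of the addressing pattern only, not cycle-accurate; the function name and the flat 64K `bytearray` are assumptions for illustration.

```python
def copy_indirect_indexed(memory, src, dest, length):
    """Copy bytes the way a 6502 (zp),Y loop must: a 16-bit base pointer
    (held in zero page on real hardware) plus an 8-bit Y index, with the
    pointer's high byte bumped every time Y wraps past 255."""
    src_ptr, dest_ptr = src, dest   # stand-ins for two zero-page pointer pairs
    y = 0                           # the 8-bit index register
    for _ in range(length):
        memory[(dest_ptr + y) & 0xFFFF] = memory[(src_ptr + y) & 0xFFFF]
        y = (y + 1) & 0xFF          # INY wraps at 256
        if y == 0:                  # crossed a page: increment both high bytes
            src_ptr = (src_ptr + 0x100) & 0xFFFF
            dest_ptr = (dest_ptr + 0x100) & 0xFFFF
    return memory


# Usage: copy 600 bytes (more than one page) inside a 64K address space.
ram = bytearray(0x10000)
payload = bytes(i % 251 for i in range(600))
ram[0x2000:0x2000 + len(payload)] = payload
copy_indirect_indexed(ram, 0x2000, 0x4000, len(payload))
print(ram[0x4000:0x4000 + 600] == payload)
```

On the 8088 the equivalent bookkeeping lives in SI, DI and CX, which is why a single `rep movsb` can replace the whole loop.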
    
    Reply
george obien said

June 5, 2011 at 6:04 pm
Fortunately, in a fairly short time, Compaq came out with an 8MHz 8086 (16bit bus) that was much faster.

Reply
- Trixter said
  
  June 5, 2011 at 8:28 pm
  I agree; the 8086 could access two bytes in 4 cycles, which levels the playing field a bit.
  
  Reply
morgan said

June 6, 2011 at 4:53 am
That explains why my Amiga 500 from 1986 ran faster than the PC I owned a decade later.

There is also the fact that Commodore could write software that (usually) worked, unlike one bastardised US monopolist company.

Reply
MichaelEdits said

June 6, 2011 at 6:03 am
Is THAT why I owned a Commodore 64? I thought it was because I was broke. :-)

Reply
Steve said

June 6, 2011 at 9:33 am
Ah, nice to see a mention of the good old BBC Micro! It surprised me how there was no demo scene built up around this platform though.

I guess it was related to its status as an education machine, and the relatively high price tag at the time compared to other 8-bit computers. I do remember a few intros on the occasional piece of cracked software but nothing like on the C64.

Reply
Top Posts — WordPress.com said

June 6, 2011 at 6:02 pm
[…] At a disadvantage Quick, without doing any research: What early 1980s computer was faster, the IBM PC or the Commodore 64? The IBM PC […] […]

Reply
Trixter said

June 6, 2011 at 8:43 pm
So the people over at reddit are roasting me alive, but it’s awesome because it’s chock full of the information I was hoping people would write:

Reply
Show #6 (July 2011): User groups, Beautiful Boot, KFest memories, and game tournaments | Open Apple said

July 15, 2011 at 7:11 am
[…] 6502 vs. 8088 […]

Reply
The 555 footstool | Apple II Bits said

July 18, 2011 at 9:12 am
[…] Apple II popularized many processors and chips, most notably the 6502. But as a games machine, the Apple II relied heavily on an unsung hero: the 555, a timer IC that […]

Reply
Covoxer said

August 14, 2011 at 3:00 am
This “theory” is flawed. Try to implement more practical thing. For example copying parts of memory or more complicated heavy loops and you’ll see the difference.
It’s hard to say how much faster 8088 actually was because all this depends greatly on what your code is doing. Working with few bytes is one thing, working with large array of floating point values is quite different.

Nevertheless, since processor power was (and still is) most important for gaming, why not compare the gaming results? Look at the game Elite. It was originally made for the 2 MHz 6502 and later ported both to C64 and PC (8088 with CGA). The video hardware of the C64 was not helpful with accelerating Elite’s graphics, so the C64 was mostly relying on the CPU for drawing the game scene. On the 2 MHz 6502 BBC Micro we have flickering wireframe graphics. On the “crappy” 8088 we have non-flickering filled polygons. Now, in your theory, the 2 MHz 6502 would tear the 8088 apart and bury it alive. Especially since this game does not have any floating point and few 16-bit calculations, being almost completely 8-bit. But somehow, the 8088 manages to draw about 10 times more pixels while redrawing the entire screen instead of separate objects on every frame. How can this be possible? ;) The 6502 was too slow to be able to draw filled polygons in real time. In fact, taking this game as a benchmark, we can see that the 6502 was about as fast as the 3 MHz Z80 in the ZX Spectrum and about 5-10 times slower than the 4.77 MHz 8088 in the IBM PC.

Reply
- Trixter said
  
  August 14, 2011 at 11:13 am
  I don’t think you’ve worked with an actual 4.77MHz 8088; it sounds like your results are based on an emulator or a 286. I just ran Elite right now on my 8088 and the opening rotating screen (shaded, to be consistent with your claim) was 4.1 fps. The line-drawn version’s opening screen updates at 6.2 fps. Both the C64 and BBC’s versions are drastically faster than that (I’m using http://www.youtube.com/watch?v=4lKKy3l_5YI and http://www.youtube.com/watch?v=y3xHj0plhDU as references).
  
  I also don’t get where you think the 8088 is drawing “10 times more pixels”. If you mean because it is updating the entire screen, that’s because it draws to a hidden buffer and then moves that entire buffer to video memory. In retrospect, it might have been faster if it drew directly to video memory, which would give PC Elite the same flickering as other versions (bad) but would have improved the speed (good).
  
  Reply
  - Covoxer said
    
    August 14, 2011 at 12:44 pm
    I was playing Elite on 8088 (XT clone) in late 80-s. So it’s from my experience. ;) Not sure about FPS as I was not measuring it (obviously). But it was quite playable. And 4.1 doesn’t sound like it is. But even if you are right (we had lower FPS expectations back then) the fact that we have filled polygons instead of wireframe drawings is still the same.
    
    >> I also don’t get where you think the 8088 is drawing “10 times more pixels”.
    That’s because it draws filled polygons instead of wireframes. Count the number of pixels in the wireframe rectangle and filled rectangle. For example, 50x50px wireframe rectangle has only 200 pixels. While filled one has 2500.
    
    >>If you mean because it is updated the entire screen, that’s because it draws to a hidden buffer
    Yes, this is true as well. In C64 version entire frame is never redrawn or copied (which is a time consuming operation). Instead it erases the lines it had drawn on previous frame by drawing them again with the black color. This way, to redraw entire frame you only need to redraw as many pixels as there are in all the lines drawn. And that is very few. While PC version redraws (erases, copies, fills etc.) every pixel in the frame.
    
    So, basically, what I’m saying is that wireframe rendering is many times faster. And drawing directly to the frame buffer avoids copying entire frame every time. But 8088 was able to do this, and 6502 was not. Also keep in mind that Elite was originally written for 6502. If there was a slightest possibility to have filled polygons on BBC Micro, they would most certainly implement it.
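The 50x50 rectangle comparison from this comment can be worked through in a few lines. This is a sketch with illustrative function names; note that an exact outline count is 196 rather than 200, because the four corners are shared between edges.

```python
def outline_pixels(width, height):
    """Pixels on the border of a width x height rectangle, corners counted once."""
    if width <= 0 or height <= 0:
        return 0
    if width == 1 or height == 1:
        return width * height
    return 2 * (width + height) - 4


def filled_pixels(width, height):
    """Pixels covered by a solid width x height rectangle."""
    return width * height


w = h = 50
wire = outline_pixels(w, h)
solid = filled_pixels(w, h)
print(f"wireframe: {wire} px, filled: {solid} px, ratio: {solid / wire:.1f}x")
```

The ratio grows with the size of the shape, so "10 times more pixels" is only a ballpark that depends on how large the on-screen objects are.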
    
    Reply
Covoxer said

August 15, 2011 at 2:28 am
Ok, let’s take your ROR example to the more realistic conditions. Let’s say we have to rotate an array of bytes somewhere in the memory. The address and length is provided.

6502
LDA ($addr), Y ; 5
ROR A ; 2
STA ($addr), Y ; 6
INY ; 2
DEX ; 2
BNE .loop ; 3
Total 20 clocks per loop. At 1 MHz the loop takes 20 microseconds.

8088
LODSB ; 16
ROR AL ; 2
STOSB ; 11
LOOP .loop ; 17
Total 46 clocks. At 4.77 Mhz the loop takes 9.6 microseconds.
So 8088 is over twice faster. Isn’t it? ;)

Now some more useful and common example. Copying an array of bytes (may be a string). Not using 16 bit advantage here.

6502
LDA ($src), Y ; 5
STA ($dest), Y ; 6
INY ; 2
DEX ; 2
BNE .loop ; 3
Total 18 clocks per loop. Which is 18 microseconds at 1 MHz.

8088
rep movsb ; 17
Total 17 clocks per loop. At 4.77 MHz the loop takes 3.5 microseconds.
In this case 8088 is over 5 times faster than 6502. :)
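The per-loop totals quoted for the rotate and copy loops can be re-derived by summing the per-instruction clock counts as listed. This sketch takes the commenter's cycle figures at face value (they are not re-measured here); the dictionary layout and names are illustrative.

```python
# Per-instruction clocks exactly as listed in the comment, plus clock speed in MHz.
LOOPS = {
    "rotate": {
        "6502": ([5, 2, 6, 2, 2, 3], 1.0),   # LDA, ROR, STA, INY, DEX, BNE
        "8088": ([16, 2, 11, 17], 4.77),     # LODSB, ROR, STOSB, LOOP
    },
    "copy": {
        "6502": ([5, 6, 2, 2, 3], 1.0),      # LDA, STA, INY, DEX, BNE
        "8088": ([17], 4.77),                # REP MOVSB, per byte
    },
}


def loop_time_us(clocks, mhz):
    """Time for one loop iteration in microseconds."""
    return sum(clocks) / mhz


results = {}
for loop_name, cpus in LOOPS.items():
    t_6502 = loop_time_us(*cpus["6502"])
    t_8088 = loop_time_us(*cpus["8088"])
    results[loop_name] = (t_6502, t_8088, t_6502 / t_8088)
    print(f"{loop_name}: 6502 {t_6502:.1f} us, 8088 {t_8088:.2f} us, "
          f"8088 is {t_6502 / t_8088:.2f}x faster")
```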

Now another common time consuming operation. The 32 bit multiplication. Quite useful for floating point mantissa calculations for example (assuming there’s no 8087 plugged in ;) ).

6502 (multiplication loop alone)
ASL ($R1) ; 5
ROL ($R2) ; 5
ROL ($R3) ; 5
ROL ($R4) ; 5
ASL ($X1) ; 5
ROL ($X2) ; 5
ROL ($X3) ; 5
ROL ($X4) ; 5
BCC .skip ; 3
CLC ; 2
LDA ($R1) ; 5
ADC ($Y1) ; 3
STA ($R1) ; 6
LDA ($R2) ; 5
ADC ($Y2) ; 3
STA ($R2) ; 6
LDA ($R3) ; 5
ADC ($Y3) ; 3
STA ($R3) ; 6
LDA ($R4) ; 5
ADC ($Y4) ; 3
STA ($R4) ; 6
.skip:
DEX ; 2
BNE .loop ; 3
Assuming that we have the same number of 1’s and 0’s in our arguments, the average execution time will be:
32*(43+(56/2)+5) = 2432
Which is 2432 microseconds at 1 MHz.
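The averaging formula above is an expected-value estimate: the add block runs only when the shifted-out bit is 1. A sketch of that calculation, using the commenter's cycle figures as given and a hypothetical `p_add` parameter for the fraction of 1 bits:

```python
def expected_multiply_cycles(bits=32, always=43, add_block=56,
                             loop_overhead=5, p_add=0.5):
    """Expected cycles for the shift-and-add loop.

    always        -- shifts plus the branch, paid on every iteration
    add_block     -- the 32-bit add, paid only when the multiplier bit is 1
    loop_overhead -- DEX + BNE
    p_add         -- assumed probability that a given multiplier bit is 1
    """
    return bits * (always + p_add * add_block + loop_overhead)


average = expected_multiply_cycles()
best = expected_multiply_cycles(p_add=0.0)    # multiplier is all zero bits
worst = expected_multiply_cycles(p_add=1.0)   # multiplier is all one bits
print(f"average {average:.0f}, best {best:.0f}, worst {worst:.0f} cycles")
```

At 1 MHz the cycle counts are also the times in microseconds, so the spread between best and worst case gives a feel for how data-dependent the 6502 routine is.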

8088
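MOV AX, [$L2]
MOV DI, AX ; DI = L2
MOV AX, [$L1]
MOV SI, AX ; SI = L1
MUL DI ; DX:AX = L1 * L2 (MUL takes one operand; AX is implied)
MOV [$L3], AX ; low word of the result
MOV BX, DX ; carry the upper half into the high word
MOV AX, [$H2]
MUL SI ; AX = low word of H2 * L1
ADD BX, AX
MOV AX, [$H1]
MUL DI ; AX = low word of H1 * L2
ADD BX, AX
MOV [$H3], BX ; high word of the result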
With all instruction fetch stalls, this code takes up to 540 clocks to execute. Which is about 110 microseconds at 4.77 MHz.
So it’s about 22 times quicker than 6502. :)
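For readers who want to see why three 16-bit multiplies are enough, here is a sketch of what the 8088 listing is evidently meant to compute: the low 32 bits of a 32x32 product, built from the partial products L1*L2, H2*L1 and H1*L2. The Python names mirror the comment's labels; the function itself is an illustration, not the commenter's code.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32_low(x, y):
    """Low 32 bits of x * y using only 16x16 multiplies, as on the 8088."""
    l1, h1 = x & MASK16, (x >> 16) & MASK16
    l2, h2 = y & MASK16, (y >> 16) & MASK16

    low_product = l1 * l2                        # MUL: DX:AX = L1 * L2
    l3 = low_product & MASK16                    # MOV [L3], AX
    bx = low_product >> 16                       # MOV BX, DX
    bx = (bx + ((h2 * l1) & MASK16)) & MASK16    # ADD BX, AX after H2 * L1
    bx = (bx + ((h1 * l2) & MASK16)) & MASK16    # ADD BX, AX after H1 * L2
    return (bx << 16) | l3                       # H3:L3


# Quick self-check against Python's arbitrary-precision multiply.
for a, b in [(0x12345678, 0x9ABCDEF0), (0xFFFFFFFF, 0xFFFFFFFF), (1000, 70000)]:
    assert mul32_low(a, b) == (a * b) & MASK32
print(hex(mul32_low(0x12345678, 0x9ABCDEF0)))
```

The H1*H2 term never appears because it only affects bits 32 and up, which a 32-bit result discards.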

As you can see, in realistic heavy loops, 8088 is usually many times quicker. I can provide more examples if you want. ;)

Reply
- Trixter said
  
  August 16, 2011 at 8:45 pm
  Well, it’s not *quite* accurate because you’re making the same mistake I wrote the entire post about: Most 808x instruction opcodes are 2-4 bytes large, and they themselves take 4 cycles per byte to read on 8088. So maybe you’d like to adjust your timings? :-)
  
  You are correct in stating that it depends on workload. There are some operations that the 8088 can definitely do faster than 6502 at comparable clock speeds. I just wanted to correct the misconception that it was always 5x faster or some other silly notion.
  
  Reply
  - Covoxer said
    
    August 17, 2011 at 12:51 am
    No, it is quite accurate and I’m not making any mistakes here. I was accounting for the instruction fetch stalls in all three estimations. Your mistake here is that you assume that “most” instructions are long and instruction queue is never used. Now go ahead, and analyze any of these samples step by step if you don’t believe me.
    For example, in the first (bytes shifting) example, we have LODSB and STOSB instructions, each 1 byte long. So after the first LODSB there’s enough data in the queue for the 2 byte ROR AL, 1 to be executed without a stall. Following STOSB refills the queue.
    The REP MOVSB does run in 17 clocks per loop rate. No problems here since it only needs 8 clocks for memory transfer.
    Now, I shall not analyze entire multiplication code (it’s a bit large ;) ), but as I said in the original message, that is an estimated timing including instruction fetch stalls. Besides, being 20 times faster, a dozen of clocks makes really no difference here. The real advantage comes from the microcoded multiplication.
    
    Re: 2nd paragraph.
    Yes, except that 8088 goes faster not in “some”, but in most, especially in processor heavy, operations. There’s nothing heavy in altering a dozen of variables once per frame for example. Most of the processor time is usually spent in repeating operations over the large data arrays. For example, graphics in games. 8088 is much better suited for these as you can see (the copying is not an exception here). Another thing that can bring a CPU to its knees is heavy math with complicated floating point calculations. That’s why my third example. Without any doubt, 8088 will do many times better here. So what operations are there, that can make 8088 feel sluggish while 6502 smooth? Actually I can’t think of any. Can you?
    
    Now, about 5x times faster. Surely one can’t make such comparisons with absolutely different architectures. But I see you claiming explicitly that 8088 at 4.77 MHz is “slower” than 6502 at 1 MHz. Which is not what you are saying now. ;)
    
    Reply
Covoxer said

August 15, 2011 at 11:49 am
Now let’s compare 6502 to the proper competitor – a presumably sluggish i8080.

First code example with shifting array of bytes looks like this:
MOV A, M ; 7
RRC ; 4
MOV M, A ; 7
INX HL ; 5
DCR C ; 5
JNZ .loop ; 10
Total 38 clocks. At 2.5 MHz this loop takes 15.2 microseconds. Surprise, surprise! It’s about 25% quicker than 1 Mhz 6502. :)

Let’s check the string copying:
8080
MOV A, M ; 7
STAX DE ; 7
INX HL ; 5
INX DE ; 5
DCR C ; 5
JNZ .loop ; 10
This takes 39 clocks. Which is 15.6 microseconds at 2.5 MHz. Again quicker than 18 microseconds for 1 MHz 6502.
So, it looks that 8080 at 2.5 MHz is doing these operations faster than 6502. And you say 6502 is faster than 8088 at 4.77 MHz. So does it mean that 2.5 MHz 8080 is much faster than 4.77 MHz 8088? ;)

The third example is quite long so I’ll skip the code. But it takes about 3500 microseconds for 2.5 Mhz 8080. Which is quite slower than 6502.
So in average, 1 MHz 6502 has about the same performance as 2.5 MHz 8080, being slower in some operations and faster in others.
But neither is a match for 8088. :)

Reply
- Trixter said
  
  August 16, 2011 at 8:50 pm
  Funny you should mention the 8080 vs. 6502 debate: I directed Ian Oliver’s attention to this post and its comments (he assisted in the porting of Elite to DOS when he worked at Realtime Games) and he had this to say:
  
  (quote begin)
  Wow, that’s a really old argument, but we usually did the “1MHz 6502 versus 3.5MHz Z80” version.
  
  The 6502’s real ace in the hole was the zero page, which could make up for the sucky lack of registers, however, the Z80 and 8086 (and even 8088) had some 16-bit registers, with 16-bit register ops, which really helped for a lot of things. The 8088 machines also had more memory, which meant you could use larger maths/lookup tables, and unroll loops more.
  
  Overall, I’d put the 6502 at the back of the pack for the stuff we were doing, but at least it had an excuse, whereas the 8088 was a cost-cutting exercise too far.
  (quote end)
  
  Reply
  - Covoxer said
    
    August 17, 2011 at 2:48 am
    Thanks for Ian’s opinion! :)
    Actually, I can hardly imagine Elite running on 2.5 MHz 8080. That’s for sure. :D
    
    But I’m not so sure about the cost cutting taken too far in PC. You see, it was made first of all for business applications and was to compete with CP/M business machines of the time. With 16 bit memory bus, it would be more expensive (or less profitable) than competition. And 8088 was without a doubt much more powerful solution than Z80 for business applications. Yet it resulted in comparable price hardware. So it was quite good choice it seems. Please note, that business applications do have different requirements to CPU than games of the era. Floating point performance was important (unlike Elite), and not only 8088 was better at this, there was an option for the 8087 (who would need it in a home gaming machine?). They also require lots of RAM and quick RAM management (copying etc), and no other 8 bit bus CPU of the era (except for 68008 ;) ) could allow efficient access to over 64 Kb (again not a problem for Elite and other games). Yet another important thing was an efficient implementation of the high level language compilers (6502 was especially bad at this, being really efficient only while programmed on assembly). And powerful operating systems (again 6502 was bad at this as it was not easy to share 0 page among multiple tasks for example). So, the cost cutting was taken far enough to provide minicomputer/workstation-like functionality to the desktop CP/M computers price level. Would it be just a bit faster but twice more expensive, it could lose the game… Perhaps that would be better in the historical perspective, but as any design it was made to win and in this respect it was quite successful. ;) And cost cutting was a deciding factor here.
    
    Reply
Covoxer said

August 16, 2011 at 2:37 am
Another interesting observation is that 1MHz 6502 was about as quick as 1.78 MHz Z80 used in the 1977 TRS-80.

Z80 code for the shifts example:
RRC (HL) ; 15
INC HL ; 6
DJNZ .loop ; 13
34 clocks per loop. At 1.78 MHz this takes about 19 microseconds. Just a bit quicker than 20 for 6502.

Z80 string copy is just a single instruction:
LDIR ; 21
It’s 21 clocks per loop. Which is about 12 microseconds at 1.78 MHz. About 1.5 times faster than 18 microseconds for 1 MHz 6502.

32 bit multiplication on Z80 may look as follows. Please note that all data is held within CPU registers during the loop.
ADD IY, IY ; 15
ADD IX, IX ; 15
JNC .s2 ; 10
INC IY ; 10
.s2:
ADD HL, HL ; 11
EX DE, HL ; 4
ADC HL, HL ; 15
EX DE, HL ; 4
JNC .s3 ; 10
EXX ; 4
ADD IX, BC ; 15
JNC .s1 ; 10
INC IY ; 10
.s1:
ADD IY, DE ; 15
EXX ; 4
.s3:
DJNZ .loop ; 13
This takes about 4100 clocks per operation on average. That is about 2300 microseconds at 1.78 MHz, just a little slower than the 2100 for the 1 MHz 6502.
Overall, though, the 1 MHz 6502 is about as quick as a 1.78 MHz Z80, which is about as quick as a 2.5 MHz i8080.
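
As a rough cross-check of the "about as quick" claim, the sketch below takes the three pairs of timings quoted in this comment and computes a geometric mean of the Z80/6502 time ratios. Choosing a geometric mean, and all of the names used, are my own assumptions; three micro-benchmarks are a very small sample.

```python
from math import prod

# (6502 time in us, Z80 time in us) per benchmark. The 6502 figures and the
# Z80 clock counts are those quoted in the comment; 1.78 is the Z80 clock in MHz.
timings_us = {
    "shift": (20, 34 / 1.78),
    "copy": (18, 21 / 1.78),
    "mul32": (2100, 4100 / 1.78),
}

# A ratio below 1.0 means the Z80 finished sooner than the 6502.
ratios = {name: z80 / m6502 for name, (m6502, z80) in timings_us.items()}

# The geometric mean is a common way to average ratios (an assumption here).
geo_mean = prod(ratios.values()) ** (1 / len(ratios))

for name, ratio in ratios.items():
    print(f"{name}: Z80 takes {ratio:.2f}x the 6502 time")
print(f"geometric mean: {geo_mean:.2f}")
```

On these numbers the Z80 comes out slightly ahead on average, mostly thanks to LDIR, which fits the "about as quick" reading.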

So, returning to the trick question about the fastest personal computer of the early 80’s, it seems that your choice is merely as fast as the 1977 TRS-80. :)

Reply
Dag said

September 5, 2011 at 10:51 pm
But which one of these CPUs/computers still has a demoscene in 2011? ;-)

greets Bug

Reply
- Trixter said
  
  September 5, 2011 at 11:42 pm
  Ouch! Touche!
  
  The IBM PC (the original PC, or any clone based on the 8088/8086) never had a true demoscene. There were two Sourcerers demos that ran on a stock PC, a third (Atom) that needed a “turbo” (7.16MHz or better) 8086 to run, and maybe GR8 by Future Crew, but that’s about it. Pretty much every other PC demo has needed a 286 or higher.
  
  I hope to remedy this someday.
  
  Reply
  - Scali said
    
    May 4, 2015 at 4:52 am
    “I hope to remedy this someday.”
    
    Apparently :)
    
    Reply
- Covoxer said
  
  September 8, 2011 at 5:37 am
  That’s because the PC evolved quickly, while there was never a more powerful computer compatible with the C64. So the whole C64 demoscene stuck with it, while PC hackers moved on to new models. ;)
  
  Reply
  - Scali said
    
    December 19, 2011 at 9:21 am
    I don’t agree with that.
    The first PC I got was still an 8088 with CGA, albeit at 9.54 MHz, and that was in 1988 or so.
    PCs were horribly expensive compared to C64, Amiga etc (that PC was still 4 times as expensive as an Amiga 500!).
    As a result, people kept buying the older, more low-end models.
    So the 8088 remained popular for many years, and as a result, most games were aimed at this platform.
    By 1990, we finally saw games requiring a faster 286, and slowly but surely the 386 (mostly the cheaper 386SX) started to gain traction, and some games started to use 32-bit mode.
    The 486 wasn’t really commonplace until 1993 or so (it was introduced in 1989). Around 1992/1993 you saw a big boom in the demoscene, as more and more sceners started using 486es (also ex-C64/Amiga/etc).
    But prior to 1992, the PC demoscene was virtually non-existent. I don’t think there were any demos at all before 1988 or so, and only a handful released up to the 486-boom.
    
    As Trixter says: there WAS no demoscene.
    
    Reply
    - benjamc72842db224 said
      
      April 11, 2024 at 9:08 am
      I realise that I am replying to a 13-year-old post, but as this blog post gets referenced quite a bit in the age-old ’80s CPU debate, I thought it was worthwhile adding some information here.
      
      First of all, PCs aimed at the home market in the 80s and 90s were far cheaper than people seem to realise. Late 1986/early 1987 issues of PC and Byte magazines have advertisements for 8 MHz 8088 turbo XT clones selling for as little as $425 (256K RAM and CGA) or $465 with 640K RAM. For comparison, the 1987 Sears catalogue lists the C64c with a 1541c disk drive at $419.98 and the C128 with a 1571 disk drive at $599.98. Turbo XT clones were very competitively priced, especially considering the RAM, CPU speeds and cheap storage options they offered. Keep in mind the Amiga 500 only really started to sell in 1989, by which time clone 286s were selling for similar prices.
      
      Secondly, the home market for PCs only really started around 1984, and it took a couple of years before the PC was taken seriously as a games platform. While games during the CGA era (1981-1986) were all developed for the 4.77 MHz 8088, during the EGA era (1986-1990) more and more games were aimed at 286s. A fast turbo XT could still run most games, but especially towards the end of the 80s a 286 was needed to run all of them at full speed. Soon after, the VGA era (1990-1993) saw the 486 become an affordable home option, with 286s and 386s becoming the minimum requirement, while the SVGA era (1993-1998) saw Pentiums everywhere, with 486s moving into the budget-machine position. Basically, once there was a legitimate home market, PCs evolved far too quickly for a demoscene to form. It took 3 years for the first C64 demos to appear, and on the PC a new generation of tech would already be showing up in that time.
      
      Reply
The five-year upgrade plan | frans goes blog said

December 28, 2011 at 9:28 am
[…] the original 8088 based PC was actually slower than the C64, even while the PC run at nearly 5 MHz: https://trixter.oldskool.org/2011/06/04/at-a-disadvantage/.) Yes, the PC games are not necessarily “better” today (but they look fancier, and are […]

Reply
8088 MPH: How it came about | Scali's OpenBlog™ said

April 12, 2015 at 12:12 pm
[…] 4.77 MHz CPU, and the C64 has a 1 MHz CPU. This is a case of the MHz myth, and Trixter has already covered that in an earlier blog. The short version is that the DRAM modules used in all early 1980s microcomputers are more or less […]

Reply
After Effects & Performance. Part 6: Begun, the core wars have… said

December 22, 2019 at 8:43 am
[…] to use several instructions to do the same thing. On a broader level, the way different chips interfaced with memory and other parts of the computer also influenced how fast a computer would work as a whole. […]

Reply

Oldskooler Ramblings

the unlikely child born of the home computer wars
