Oldskooler Ramblings

the unlikely child born of the home computer wars

8088 MPH: We Break All Your Emulators

Posted by Trixter on April 7, 2015

One of my bucket list items since I read my first party report back in 1991 was to attend a European demoparty and compete in a compo.  I competed at NAID ’96 and placed there, which was awesome, but to compete with the best of the best, and win, has always been a dream of mine.  I’m happy to announce that after six months of hard work with good friends and extremely talented people, we achieved just that.  Our demo, 8088 MPH, won the Revision 2015 oldskool demo compo.  (A personal victory for me was having it shown last in the compo, which is a sign of respect that the organizers think it’s the best high to end a compo on.)

As of April 7th 2015, there are no IBM PC emulators in the world that can run the demo properly; they hang or crash before the demo is finished, and the colors are wrong.  Same goes for anything that isn’t the target hardware (see below).  To see what 8088 MPH looks like, I direct you to the video+audio capture of the demo running on real hardware:

[Video: 8088 MPH running on real hardware]

Because there are so many technological world-firsts in the demo, and because we’re bending the hardware in ways that people have never thought to, it’s only fair that we try to explain exactly how this was achieved.  One of my roles was “organizer” for the demo, so I’ll break it down scene by scene, covering the basics of each trick.  For parts that I wrote, I go into some detail, but for a deep technical dive into certain areas, I’ll keep this blog entry updated to point to reenigne’s, VileR’s, and Scali’s blog posts about their parts.  It is our hope that these discussions will foster revived “old-school” interest in software development for the platform.

After you read this summary post, please visit the following links by us that break down, in-depth, specific sections of the demo:

And for more general info:

Target Hardware Specifications

Before going into each part, let’s define what the target system was for the demo:  A 1981 IBM 5150 (aka the very first “IBM PC”) with 640 KB RAM, a floppy drive, IBM CGA card, and internal speaker.  That setup consists of:

  • 4.77 MHz 8088 CPU.  5 MHz seems like a lot compared to other 8-bit micros, but it takes the CPU 4 cycles to read a single byte.  So, compared to other 8-bit CPUs like the 6502 or 6809, which can read a byte in one clock cycle, the effective clock speed of the 8088 is more like (4.77 / 4) = 1.19 MHz.
  • Video adapter that has a 9-pin RGBI interface and an RCA NTSC composite video interface.  Driven by a Motorola 6845 character generator.  No facilities for changing text characters; font is fixed.
  • Internal speaker; no sound card like the Sound Blaster, or special sound hardware like the C64 SID.  Speaker can be driven by a timer pin to produce a square wave, or can be bit-banged directly via a port or a one-shot timer.

The 640KB RAM requirement seems steep, but not only was it possible to add that to the first IBM PCs, by 1985 it was quite common.  If you still want to cry foul, then please note the only effect that uses just about all of that RAM is the kefrens bars, so that the pattern would take longer to repeat and be more pleasing to the eye.  We could have reduced it, but then you might have noticed the pattern repeating sooner.  With the kefrens bars effect, the demo uses 507 KB RAM; without it, the demo uses 349 KB.  Most effects use much less, and some are tiny, like the plasma which uses only 6KB (which includes the banner graphics) and the picture of the girl which uses 18K (2K more than the size of the raw picture data itself).  We deliberately traded size for speed so that we could fit as many effects as possible into 8 minutes running time, the compo limit.  If we had a few more minutes running time, we probably could have fit the entire demo into 256 KB or even less, but you would have waited longer between effects.

I should also note here that there were two different versions of IBM CGA produced, which differ mainly in how composite colors are generated.  We had equal numbers of both “old” and “new” style IBM CGA cards, so we chose to compose graphics for the “old” style.  If you have the “new” style CGA card, the demo will still run, but the colors will be off slightly.

Technical Breakdown

Development tools used

  • Turbo C
  • Turbo Pascal
  • Turbo Assembler
  • Turbo Debugger
  • Visual C++
  • OpenWatcom
  • NASM (and YASM)
  • DOSBox
  • A few real IBM 5160s (hardware equivalent to the 5150, but easier to find in the real world)

Any data files were directly included in the .exe/.com files themselves.  This kept everything together in the same binary, which means the data could benefit from compression (see below).

Most development cycles involved designing in wetware, coding on a modern system (or DOSBox running on a modern system), testing/debugging in DOSBox, and then transferring over to real hardware for a final test.  Once an effect grew so sophisticated it couldn’t run in an emulator any more, this cycle slowed down as testing could only be done on real hardware.

Various transfer methods were used to get code to real hardware:  Scali used a serial cable; I used an ethernet card running a packet driver and mTCP; at the party we used an 8-bit IDE ISA adapter (Silicon Valley ADP-50) connected to a CF-to-IDE adapter to make a CF card the hard drive, then used a USB CF card reader to move stuff back and forth.  The most intriguing method of all was reenigne’s: he used a custom controller connected to the keyboard port that used the IBM BIOS factory test mode as a poor-man’s serial port.  (I hope Andrew writes up some details on that!)

Loader, API, and general structure

We all had different preferred development languages and environments, so it was decided early on to create an overseeing “loader” that would execute .EXE and .COM files, and then people could develop effects in whatever environment they wanted to.  This is not a new concept; the famous Second Reality demo did this for exactly the same reasons, and the same technique was used even earlier than that on numerous demos on other platforms.  (Before you ask: No, the Second Reality code was not copied; in fact, it wasn’t even consulted, as we had to write extremely tight code to minimize memory usage, and also have it work on an 8088 (the Second Reality code uses 80186 opcodes).)  The loader API services assemble to about 450 bytes of code.

The loader, as designed, would be responsible for:

  • Playing music in the background
  • Masking the load times and precalc times of various effects using “megademo”-style text
  • Providing synchronization services (such as providing a vertical-retrace interrupt in software, and a user-definable countdown timer)

Running effects with the loader consisted of this workflow:

  1. Print text on the screen and animate it using an interrupt and the 6845 start address register
  2. Execute the effect
  3. The effect would decompress, perform precalc, etc. and then signal the loader it is ready to start
  4. The loader cleans up the moving onscreen text, then signals the effect it can start
  5. Effect starts, magic occurs
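The five steps above amount to a two-way handshake between the loader and each effect. As a rough illustration only (the real loader launches DOS .EXE/.COM files and animates its text from an interrupt; the names `example_effect` and `run_with_loader` are hypothetical and this sketch is strictly sequential), the ordering can be modelled with a Python generator:

```python
def example_effect(log):
    """Hypothetical effect: prepares, reports ready, then waits to be resumed."""
    log.append("effect: decompress and precalc")    # step 3
    yield "ready"                                   # step 3: signal the loader
    log.append("effect: running")                   # step 5

def run_with_loader(effect_factory):
    """Hypothetical loader: follows the workflow order and returns an event log."""
    log = ["loader: print and animate text"]        # step 1
    log.append("loader: execute effect")            # step 2
    effect = effect_factory(log)
    signal = next(effect)                           # runs the effect up to "ready"
    if signal != "ready":
        raise RuntimeError("effect did not report ready")
    log.append("loader: clean up text, signal start")  # step 4
    for _ in effect:                                # resume the effect until it ends
        pass
    return log

events = run_with_loader(example_effect)
for line in events:
    print(line)
```

The point of the ordering is that the loader's text stays on screen for the whole of the effect's decompression and precalc, and is only removed once the effect can start immediately.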

Designing this correctly was extremely important, as any bugs would derail the entire thing.  It was designed fully before even a single line of code was written.  I’ve shared the design doc online for the curious.  (I wrote the loader.)

The background music playback had to be as simple as possible so as to not interfere with any effects.  A single PC beep, changing (or silencing) once every frame, was the only thing that was practical, so 60Hz beeping is what the background music consists of.  The composition program used for generating the speaker timer values was MONOTONE.  Even though the code for playback is only 18 lines of assembler, it takes up two scanlines onscreen, so you can see how anything even slightly more complicated would have sucked much more CPU out of the system and some of the full-screen 60Hz effects simply would not have been possible.
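To make "speaker timer values" concrete: the speaker's square wave comes from the timer chip dividing its roughly 1.19 MHz input clock by a 16-bit value, so a one-note-per-frame tune boils down to one divisor per frame. The sketch below is an illustration under stated assumptions, not MONOTONE's actual data format; `pit_divisor`, `frames_for_notes`, and the use of 0 to mean silence are all hypothetical.

```python
PIT_HZ = 1_193_182  # timer input clock: 14.31818 MHz / 12

def pit_divisor(freq_hz):
    """Return a 16-bit timer reload value giving a square wave near freq_hz."""
    if freq_hz <= 0:
        return 0  # assumption of this sketch: 0 means "speaker off this frame"
    divisor = round(PIT_HZ / freq_hz)
    return max(1, min(divisor, 0xFFFF))

def frames_for_notes(notes, frame_rate=60):
    """Expand (frequency_hz, seconds) pairs into one divisor per video frame."""
    frames = []
    for freq_hz, seconds in notes:
        frames.extend([pit_divisor(freq_hz)] * round(seconds * frame_rate))
    return frames

# Half a second of A4, a quarter second of silence, a quarter second of A5.
tune = [(440.0, 0.5), (0, 0.25), (880.0, 0.25)]
frames = frames_for_notes(tune)
print(len(frames), frames[0], frames[30], frames[45])
```

With the values precomputed like this, the per-frame work on the real machine is reduced to fetching one value and writing it to the timer, which is consistent with the very small playback routine described above.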

Executable compression

Another decision early on was to see if executable compression was feasible, which means the following:

  • Does it actually compress things small enough to be worthwhile?
  • Is the decompression quick enough to avoid long pauses in the demo?
  • Does the decompression routine affect the system while it decompresses? (ie. does it disable interrupts or something else equally horrible while it decompresses, which would trash the demo?)

I gathered most classic and modern executable compressors and ran tests against old programs that were representative of what we would be producing.  The results were welcome surprises.  The compression ratios were good enough that we could afford to include precalc’d data instead of calculating it on the fly, and the decompression was fast enough that the total end-to-end time loading a program from diskette was actually slightly faster than if it were to load uncompressed.  In the end, pklite emerged as the winner.  I’ve shared the comparison data online.  (If I missed any packers that hold substantial advantages over the ones in the data, please let me know what they are.  There were nearly 100 packers made for DOS, but unless they compress smaller than apack or upx, or decompress faster than pklite or lzexe, all while remaining compatible with 8088, I don’t want to hear about them.)

Scene-by-scene breakdown

What follows is a screen-by-screen explanation of each effect.  As previously stated, I’ll only describe scenes in detail if I wrote them; it will be up to the others if they want to write a technical breakdown for their parts.  The explanation for each effect follows after the effect’s screenshot.

[Screenshot: mph_screenhots.avi.Still001]

The introduction was meant to serve two purposes:  To educate the audience on the system and explain just how much of a disadvantage we were at in trying to make a world-class demo on such hardware, and also simultaneously shatter their expectations :-)  The text mode is obviously simulated; I essentially duplicated the basic BIOS text-mode functions, but in graphics mode.  The cursor blinking and text blinking are handled identically to how the 6845 does it, adding to the illusion.

It is (nearly) impossible to change the display start address of graphics mode such that every single scanline comes from a different place, so the title screen unrolling was done brute force, by copying new scanlines into memory hidden by retrace.  The title screen goes away with a “fade” on the top edge by ANDing a mask on successive lines of the screen data.

[Screenshot: mph_screenhots.avi.Still002]

A lot of people think the title screen is the same picture demonstrated by VileR a few years ago.  It’s not!  He recomposed it for 16-color composite specifically for this demo, and changed it subtly as well.

[Screenshot: mph_screenhots.avi.Still003]

The bobbing was achieved by creating a software vertical retrace interrupt that fired at the same place onscreen every time (just after the last displayed line) and then hooking it with a 6845 display start address change routine.  Flags were used to communicate to the interrupt if it was time to erase the letters, which was done by simply using REP STOSW to fill screen memory with black lines.
Because the 6845 displays two onscreen scanlines per “row”, the text could only move to even lines, which is why the movement isn’t as smooth as it could be.  Well, to be fair, it could be made to move to any line we wanted, but doing so would be CPU intensive, and the whole point of the loader is to use as little CPU as possible, so this was the compromise.

The simulated vertical retrace interrupt was provided through loader API services for the rest of the effects to use as well.  Effects could disable it, re-initialize it, and hook/unhook their own routines to it.

[Screenshot: mph_screenhots.avi.Still004]

The moire (interference pattern) effect was achieved using a base of 40×25 text mode, the half-char block extended ASCII characters, and lots of unrolled code.  The circles were chosen to represent the classic effect, but in reality the effect can combine any two images.  reenigne’s effect.

[Screenshot: mph_screenhots.avi.Still005]

The rotozoomer is the same tired old routine I first rolled out in 1996 in the 8086 compo, but optimized to the hilt and sped up by only drawing every other line.  A miscommunication between me and VileR resulted in probably not the best texture to demonstrate the effect, but it still runs well enough.  There were plans to include a 60 Hz version of this effect, but we ran out of time.

[Screenshot: mph_screenhots.avi.Still006]

The core concept of the 1024-color mode is a serious abuse of 80×25 text mode with the NTSC colorburst turned on.  VileR made the first discovery with 512 colors, and reenigne was able to double this to 1024 with CRTC trickery.

Some people thought the entire demo was in this mode.  It was not, because 80-column text mode suffers from the famous CGA “snow” defect when you write directly to CGA RAM in this mode.  This is unfortunately visible in the plasma effect (see below).

BTW, when I saw this picture in 2013, that’s when I knew I had to get all these people together to make a demo.  I mean, geezus, look at it!  My jaw dropped when I saw it.
Had I never seen VileR’s collaboration with reenigne to make the above, 8088 MPH might never have existed.

[Screenshot: mph_screenhots.avi.Still007]

These stars were actually the result of unrolled code and a precalc’d table that, together, take a byte from one location and move it to another position in video RAM.  While we had other patterns ready, such as a swirling display, we felt the starfield was most appropriate for a typical “oldskool” demo.  reenigne’s effect.

[Screenshot: mph_screenhots.avi.Still008]

The sprite part seems like black magic, but is the combination of using a sprite compiler written by Scali, and adjusting the screen vertically using the 6845 start address register.  CGA only has one screen’s worth of video memory, so moving the address down scrolls the screen up, with the data repeating across the boundary.  The data doesn’t repeat evenly across the boundary, however, requiring handling.  The timer was monitored to know when the screen line containing the last pixel of the sprite had been drawn, which prompted redrawing the sprite.  (In other words, re-drawing the sprite was an exercise in racing the beam.)  Timing was very tight to avoid screen/sprite tearing effects.

[Screenshot: mph_screenhots.avi.Still009]

Also part of the compiled sprite effect, this displays 30 vectorballs at 30 Hz.  We had an earlier display that used fewer balls to achieve 60 Hz, but Scali had the idea at the last minute to make them spell out something like “8088”, “IBM”, etc. and coded up the change at the party.  The update is done using double-buffering; the sprites only take up a small rectangular area onscreen, so the screen mode’s CRTC settings were reprogrammed to provide a video mode with a small area in the middle of the physical screen, using only half of available video memory.  This provided a true hidden page to draw/erase vectorballs to, which was then flipped to be visible using the 6845 display start address register.
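Several of the effects so far (the bobbing text, the sprite scroll, and the vectorball page flip) lean on the same primitive: rewriting the 6845's display start address. The sketch below only computes the port writes involved and touches no hardware. It assumes the standard CGA port addresses (0x3D4/0x3D5), 6845 registers R12/R13, and the stock graphics-mode layout of 40 words per CRTC row; the helper names are hypothetical, and the demo's reprogrammed modes will use different geometry.

```python
CRTC_INDEX_PORT = 0x3D4   # CGA 6845 index register
CRTC_DATA_PORT = 0x3D5    # CGA 6845 data register
START_ADDR_HIGH = 12      # 6845 register R12 (upper 6 bits)
START_ADDR_LOW = 13       # 6845 register R13 (lower 8 bits)

def start_address_writes(word_offset):
    """Return the (port, value) writes that set the display start address."""
    word_offset &= 0x3FFF  # the 6845 start address is 14 bits wide
    return [
        (CRTC_INDEX_PORT, START_ADDR_HIGH),
        (CRTC_DATA_PORT, word_offset >> 8),
        (CRTC_INDEX_PORT, START_ADDR_LOW),
        (CRTC_DATA_PORT, word_offset & 0xFF),
    ]

def scroll_offset(rows, words_per_row=40):
    """Start address (in words) that scrolls the display up by whole CRTC rows.

    In the stock CGA graphics modes one CRTC row covers two scanlines,
    which is why start-address scrolling alone moves in two-line steps.
    """
    return rows * words_per_row

writes = start_address_writes(scroll_offset(3))
print(writes)
```

Because the address is counted in CRTC rows rather than scanlines, this also illustrates why the loader text can only land on even lines without extra CPU work.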
[Screenshot: mph_screenhots.avi.Still010]

Using a 1024-color variant screen mode that could be updated using only the attribute byte (thereby limiting the number of colors to 256), this plasma had to perform writes only when the CRT beam was retracing horizontally or vertically.  Unfortunately, the timing required to get this right stopped working at the party for some reason (probably happened as we were rearranging effect order), and as a result you can see a line of noise along the left side of the screen, and a little bit of noise at the top.  This was my fault, as I wrote the effect using a somewhat lazy polling routine.  It’s a shame CGA snow exists, because without all the retrace handling to avoid it, this effect runs at 60fps.  In the demo with snow avoidance, it runs at only 20fps.  VileR may write more about how this screen mode and color system are constructed, and if so, I’ll update the links at the top of this article to point to the method.

If we come out with a final version of the demo, fixing this is at the top of the priority list.  In fact, I’m betting reenigne could change this from a polling effect to a cycle-counting effect, which would not only fix the snow, but speed it up.

[Screenshot: mph_screenhots.avi.Still011]

The 1024-color mode reprograms the start address every two lines.  I took advantage of this behavior to create a simple “drip” effect for VileR’s amazing artwork.  Already you can posit that much more complicated effects are possible (thinking of the Copper demo here) but I ran out of time to make it more awesome.

[Screenshot: mph_screenhots.avi.Still012]

This classic Kefrens bars effect was done by reenigne in 320x200x4 mode.  It’s a cycle-counting effect, as there is simply no time to monitor for horizontal retrace.  To ensure the cycle counting was consistent, several things were done, including changing the system DRAM refresh interval from its default of 18 to 19, to get the DRAM refresh periods to line up with CRTC accesses.
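The 18-to-19 change makes more sense with the arithmetic written out. The refresh interval is counted in timer ticks, and the sketch below checks how many ticks fit in one scanline. It assumes a CGA scanline is 912 ticks of the 14.318 MHz master clock, a figure that is not stated in this article, and `refresh_drift` is a hypothetical helper.

```python
HDOTS_PER_SCANLINE = 912   # assumed CGA scanline length in 14.318 MHz ticks
HDOTS_PER_CPU_CYCLE = 3    # CPU clock = 14.318 MHz / 3  (about 4.77 MHz)
HDOTS_PER_PIT_TICK = 12    # timer clock = 14.318 MHz / 12 (about 1.19 MHz)

cpu_cycles_per_scanline = HDOTS_PER_SCANLINE // HDOTS_PER_CPU_CYCLE
pit_ticks_per_scanline = HDOTS_PER_SCANLINE // HDOTS_PER_PIT_TICK

def refresh_drift(refresh_interval):
    """Timer ticks by which the refresh pattern shifts from one scanline to the next."""
    return pit_ticks_per_scanline % refresh_interval

print(cpu_cycles_per_scanline, pit_ticks_per_scanline)
print(refresh_drift(18), refresh_drift(19))
```

Under that assumption, an interval of 19 divides the scanline into exactly four refresh periods, so every scanline sees refresh at the same positions and cycle-counted code behaves identically from line to line. With the default of 18 the pattern shifts by four ticks on every line.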
[Screenshot: mph_screenhots.avi.Still013]

This was Scali’s effect, inspired by his 1991 demo which also featured a large torus.  There are several things going on here:

  • Only changed portions of the screen are calculated and drawn, to minimize the amount of bandwidth needed to update the screen (this is the same “delta drawing” idea used in XDC).  This was done because CGA video memory has a wait state, so the less you need to write to it, the better.
  • 320x200x4 mode is used with a background and palette combination that gives this specific composite color palette, which included many shades of blue.
  • To help with the shading, dithering is applied during rasterization.
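The "delta drawing" idea in the first bullet can be sketched in a few lines. This is a generic illustration rather than Scali's rasterizer or XDC's encoder: it compares two frame buffers byte by byte and emits only the writes that differ. The function names and the 16000-byte buffer size are assumptions for the example.

```python
def delta_writes(previous, current):
    """Return (offset, value) pairs for every byte that changed between frames."""
    if len(previous) != len(current):
        raise ValueError("frames must be the same size")
    return [
        (offset, new)
        for offset, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]

def apply_writes(video_memory, writes):
    """Apply a list of (offset, value) writes to a mutable buffer."""
    for offset, value in writes:
        video_memory[offset] = value

# Two 16000-byte frames that differ in only two bytes.
shown = bytearray(16000)
wanted = bytearray(shown)
wanted[5] = 0xAA
wanted[100] = 0x55

writes = delta_writes(shown, wanted)
apply_writes(shown, writes)
print(len(writes), shown == wanted)
```

Here only two writes reach "video memory" instead of 16000, which is the whole appeal when every write to CGA RAM pays a wait state.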

[Screenshot: mph_screenhots.avi.Still014]

At the party, reenigne posited that it should be possible to restart the CRTC start address every single scanline.  This would result in a video mode that was only 100 lines high, and would give an 80×100 resolution 1024-color mode.  The above is the result of that coding, plus really extensive work done on a CGA NTSC composite signal modeling program done by reenigne months earlier to perform the image conversion.  (No, you can’t have it.  And before you ask, the “girl” and “CGA 1k” pictures were not stock conversions, but were hand-pixeled by VileR in Photoshop, and the 4-colors/16-colors/“Until Now” screens in a customized version of Pablodraw he created.)

We didn’t have time to put text into this picture, so the people you see above are in the same order as the credits:  Trixter, reenigne, Scali, VileR, Phoenix, and virt.  Apologies to coda and puppeh, but as you can see, any more squishing and the faces would have been unrecognizable.  Sorry!

[Screenshot: mph_screenhots.avi.Still015]

Finally, the coup de grâce:  A multichannel music engine for the PC speaker.  We didn’t want to just copy a ZX Spectrum engine, nor other engines such as the one used in Music Construction Set, but rather set the bar impossibly high by playing a protracker mod through the speaker.  Other modplayers for the speaker already exist, but they require a 10 MHz 80286, and can barely manage output at a 6KHz sampling rate.  Ours faithfully reproduces all protracker effects, mixing and outputting to the speaker realtime at 16.5 KHz, all on a 4.77 MHz CPU.

This was reenigne’s baby, and is a truly stunning technical achievement that required unconventional thinking and considerable 8088 knowledge to pull off.  I’m sure he will write up a more detailed post on how it was done.  Until then, I can mention the following details:

  • Preconversion of the module was necessary to align data structures and sample data to be favorable to how the 8088 indexes memory.  Sample data is also converted.
  • Each sample must take exactly 288 cycles to calculate and output or else the sound goes completely pants.  This was very difficult to achieve.  4.77 MHz / 288 = 16572 Hz sample output.
  • Audio output was done using traditional Pulse-Width Modulation (PWM) techniques, such as the kind made popular by Access’s RealSound.  PC speaker PWM is performed by tying the PC speaker input pin to the programmable interval timer’s (PIT) channel 2, then programming PIT 2 for byte value one-shot mode.  Any value sent to PIT 2 with the system configured like this will set the speaker HIGH and start a count, and when the count expires (ie. the sent value is reached), the speaker goes LOW again.  This results in an audible carrier wave “whine”, which was why the output needed to be fast (16.5 KHz) so that the carrier wave was above the range of human hearing.
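The numbers in the last two bullets fit together as follows. The sketch reproduces the sample-rate arithmetic and then shows one plausible way to turn an 8-bit sample into a one-shot count. The mapping in `pwm_count` is an assumption made for illustration, not reenigne's mixer; the only hard fact it relies on is that the PIT runs at a quarter of the CPU clock, so a 288-cycle sample period is 72 timer ticks long.

```python
CPU_HZ = 14_318_180 / 3          # about 4.77 MHz
CYCLES_PER_SAMPLE = 288          # from the list above
CPU_CYCLES_PER_PIT_TICK = 4      # the PIT clock is one quarter of the CPU clock

sample_rate_hz = CPU_HZ / CYCLES_PER_SAMPLE
pit_ticks_per_sample = CYCLES_PER_SAMPLE // CPU_CYCLES_PER_PIT_TICK

def pwm_count(sample, max_count=pit_ticks_per_sample):
    """Map an unsigned 8-bit sample (0..255) to a one-shot count in 1..max_count.

    Assumption of this sketch: the pulse must end within one sample period,
    so louder samples simply hold the speaker HIGH for more of the period.
    """
    if not 0 <= sample <= 255:
        raise ValueError("sample must be an unsigned 8-bit value")
    return 1 + (sample * (max_count - 1)) // 255

print(round(sample_rate_hz), pit_ticks_per_sample)
print(pwm_count(0), pwm_count(128), pwm_count(255))
```

One consequence worth noticing: with only 72 ticks per sample period, the pulse width can take roughly 72 distinct values, so the effective output resolution is a little over 6 bits regardless of the source sample depth.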

Fun fact:  After preconversion of the song and being turned into a self-playing .exe, the final result is smaller after compression than the size of the original source module.

Party Sprint

We came to the party with something that was 90% finished.  Prior to arriving, we created what we thought was a decent entry, and created two “failsafe” videos, one that was a capture for the bigscreen and another that showed the demo running on real hardware as verification for the judges.  We were worried that the hardware we were bringing would get damaged in transit, so this was a precaution so that we could enter something if that happened.  Thankfully, reenigne’s and Scali’s IBM 5160s arrived unharmed (which was especially remarkable since reenigne had to bring his from the UK to Germany on a series of trains!).  We also brought two CGA cards, two capture devices, and three different methods of exchanging new software bits from our laptops to the old hardware.  You can never be too prepared!

Most of the coding time at the party was spent adding the kefrens and ending portrait picture, eliminating bugs from each part where possible, adding nice transitions where possible, shaving seconds off of each part to stay within the compo limit, and rearranging parts so that virt’s BTTF-inspired tune’s intro lined up with the sprite part.  We spent pretty much all our time before the compo coding, eating, or visiting the bathroom, and only had time to socialize after that.

While we came mostly prepared with something that was worthy of entering the compo, the time spent at the party was invaluable for turning a rough draft into something that could really compete for first place.  Having all four of us at the same table meant we could collaborate instantly.  So, lesson learned:  There are rarely substitutes for working together in person!  One of the biggest improvements of “party collaborating” was the decision to change the credits from a variable-speed, text-only scrolling to a more evenly-paced, ANSI-style scrolling, which I think was the best implementation change compared to the bits we brought from home.
To help save time (and to ensure the video was converted well — sorry, but most people don’t know how to deal with interlaced video properly), I offered to provide Gasman with a 720@60p video.  The NTSC output of CGA is slightly off; instead of 262.5 lines per field, it generates 262.  This means it generates 59.92 fields (29.96 frames) per second instead of the NTSC broadcast standard of 59.94 (29.97 fps).  This throws off most modern capture devices; Scali had access to a high-quality Blackmagic Intensity Shuttle, for example, but it couldn’t lock onto the signal.  I knew from experience that some cheap video capture devices, such as the Terratec Grabby or the Dazzle DVC100, have extra tolerance built into them as they were designed to be used with VCR sources, so I bought a few and sent one to reenigne for testing.  For the capture, we used a DVC100 with some slight proc amp adjustments so that the capture looked as close to the CRT monitor output as possible.  To further ensure better video capturing, we used VirtualDub for the capture software, which has an option to dynamically resample the input audio source to fit the capture framerate you are aiming for in case it’s slightly off, and the software and hardware combination worked very well.  For grabbing the audio, we initially tapped the speaker with alligator clips, but Scali brought his Sound Blaster which had a real PC speaker tap you could hook up with an internal cable, so we used that for the final capture.
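The 59.92 versus 59.94 figures can be reproduced from the line counts. The sketch below assumes the NTSC master clock of 315/22 MHz and, for CGA, a scanline of 912 master-clock ticks against 910 for broadcast NTSC. Those per-line figures are not stated in this article and are an assumption of the example.

```python
MASTER_HZ = 315e6 / 22          # 14.31818... MHz, four times the NTSC color subcarrier

def field_rate(ticks_per_line, lines_per_field):
    """Fields per second for a given line length and field height."""
    return MASTER_HZ / (ticks_per_line * lines_per_field)

cga_field_hz = field_rate(912, 262)        # CGA: assumed 912 ticks per line, 262 lines
ntsc_field_hz = field_rate(910, 262.5)     # broadcast NTSC: 910 ticks per line, 262.5 lines

print(f"CGA:  {cga_field_hz:.2f} fields/s, {cga_field_hz / 2:.2f} frames/s")
print(f"NTSC: {ntsc_field_hz:.2f} fields/s, {ntsc_field_hz / 2:.2f} frames/s")
```

The difference is tiny, but a capture device that expects exactly the broadcast timing has no obligation to tolerate it, which is consistent with the VCR-oriented devices being the ones that locked on.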

Looking to the future

After watching the demo and reading the above, you may be wondering if there is actually room for improvement.  Believe it or not, there is:  Alternative methods of sound generation and additional cycle-exact trickery are definitely possible.  We had more effects to put into the demo, but ran out of time in both senses:  We ran out of development time, and we also ran out of execution time, as the Revision compo limit was 8 minutes or less.

Collectively, I’ve known everyone who worked on the demo for over 60 years.  It was an honor and a privilege to work with them all to produce this demo.  Will we work together again?  I’d say it’s definitely possible; the day after the compo, we threw around some ideas, such as making a game next instead of a demo.  Me personally, I’m burnt out and will be spending the next few weeks playing some games I’ve always wanted to finish, and working on my health.  I also have some other large projects I want to get kickstarted this summer, such as something the PC software preservation movement desperately needs, and an online sound card museum.  But hey, who knows.

Blogosphere Coverage and Discussions

75 Responses to “8088 MPH: We Break All Your Emulators”

  1. Thank you for taking the time to write this. I really look forward to my next opportunity to sit down with you and ask questions in person. OUTSTANDING scene spirit, man!

  2. […] https://trixter.oldskool.org/2015/04/07/8088-mph-we-break-all-your-emulators/ […]

  4. […] 8088 MPH: We Break All Your Emulators « Oldskooler Ramblings. […]

  5. […] Der Artikel auf Oldskool.org […]

  6. Astounding! this made me recall the reason why I started coding in the first place way back in 1983…

  7. reenigne said

    > At the party, reenigne posited that it should be possible to restart the CRTC start address every single
    > scanline. This would result in a video mode that was only 100 lines high, and would give a 80×100
    > resolution 1024-color mode.

    I did, but that’s not how the 100-line mode works. I’ll try to get my writeup finished and posted today.

    > The above is the result of that coding, plus really extensive work done on a CGA NTSC composite
    > signal modeling program done by reenigne months earlier to perform the image conversion. (No, you
    > can’t have it.

    Oh, can’t they have it? I was going to let them have it!

    > (more are they discovered/updated)

    Here are some others I found:
    Reddit: http://www.reddit.com/r/dosgaming/comments/31jsrq/ and http://www.reddit.com/r/hackernews/comments/31ule6/8088_mph_we_break_all_your_emulators/
    Hacker news: https://hn.algolia.com/?query=8088%20mph&sort=byPopularity&prefix&page=0&dateRange=all&type=story

  8. […] I will discuss some of it in more detail at a later time. Trixter has already done a global write-up of the demo. […]

  9. Reblogged this on Virgilio Leonardo Ruilova Castillo.

  10. Superb text. Superb effort. Thanks!

  11. […] 8088 MPH: We Break All Your Emulators […]

  12. mk2k said

    Congratulations for the first place and thanks for this great read!

  13. Congratulations, Trixter! Hope your health is better, long time I don’t hear from you. Greetings from Brazil!

  14. Many thanks for this write-up. Now that I know about your beeper technology, I understand why you did it this way; there is no time to shoot beeps out on ZX Spectrum, so for us simple replayers or complex replayers are much more similar in terms of the cycles needed. Lovely teamwork, huge research, massive demo. Personally, I would love to know more about your modplayer (there are some modplayers on ZX Spectrum too, but all of them are limited in one way or another, to the best of my knowledge).

  15. […] Leonard (AKA Trixter) has a great blog post about how he helped create the amazing 8088 MPH classic IBM PC demo that won the 2015 Revision demoparty’s ‘oldschool’ category.  Jim has a long […]

  16. […] Bolide. Der hätte den Ur-PC von 1981 locker an die Wand gerechnet. Um so erstaunlicher finde ich diese Demo: 8088 MPH. Unglaublich, was die aus so einer alten Kiste herauskitzeln! Klar, auf C64 wäre das ein alter […]

  17. […] at once. On Saturday, a team of people including myself, Trixter, Scali and VileR released a demo ("8088 MPH") which smashed this limit and won first place in the "Oldskool Demo" compo at the Revision 2015 […]

  18. Great work! Your next challenge is to write an emulator for it…

  19. accsoleh said

    great, oldskool..

  20. […] Gewinner der Revision 2015 Demoparty in Saarbrücken, 8088 MPH von Hornet + CRTC + DESiRE auf einem 8088-CPU in einem IBM5150 (dem ersten „IBM PC“). What kind of sorcery is […]

  21. […] last 100 seconds of 8088 MPH sound very different to the rest of the demo. The end tune is actually a 4-channel Amiga MOD file […]

  22. […] 8088 MPH by [Hornet], [CRTC], and [DESire], the winner of the recent 2015 Revision Demo compo just turned conventional wisdom on its head. It ran on a 4.77 MHz 8088 CPU – the same found in the original IBM PC. Graphics were provided via composite output by a particular IBM CGA card, and sound was a PC speaker beeper, beeping sixty times a second. Here’s a capture of the video. […]

  23. You mentioned comparison with 65C10 (6502) RISC used on 1Mhz 8bit computers and 3ish Mhz z80 as if the ~5Mhz 8088 was somehow equivalent. Firstly, more clocks or not, what you got done in a simple instruction like “MUL BX” which took a few clock counts would take umteen instructions in the RISC CPUs. The z80 was also CISC (and the 6809 I believe) and both also took about 4 clock counts for anything, even a NOP. So this comparison was invalid. Your CPU speed, and the capability of the CPU and available RAM were _significant_ advantages.

    I’m still impressed this is amazing. However, you should look at what was done on a 1K ZX81 http://www.pouet.net/prod.php?which=19210
    This computer didn’t even have a gfx chip, it literally had to use the CPU to wobble the TV modulator on and off!

    • reenigne said

      The hardware multiply in the 8088 is microcoded and takes between 69 and 154 cycles if the operands are both initially in registers. It’s actually often faster to do it the 6502 way! Our platform does have the advantage of some 16-bit registers and operations, and RAM as you said (but inferior graphics and sound hardware, at least to the C64). So the tricks you can do (and have to do) are very different but the end results are comparable.

      That ZX81 demo is awesome though!

      • Scali said

        Another disadvantage we have with the 8088 is that because it is a 16-bit CPU/instructionset, a lot of our instructions are 2 bytes or more, which means we spent a lot more cycles fetching our instructions from memory.
        It was designed as a 16-bit CPU (the 8086), and putting it on an 8-bit bus severely hampers performance. True 8-bit designs such as the 6502 or the Z80 are more efficient.

    • Trixter said

      For argument’s sake, I removed Z80 from the article text. But the Z80 could read memory faster than the 8088 could.

  25. […] “Because there are so many technological world-firsts in the demo, and because we’re bending the hardware in ways that people have never thought to do so, it’s only fair that we try to explain exactly how this was achieved.” […]

  30. […] 8088 MPH: We Break All Your Emulators.  (via) […]

  31. […] And Trixter and reenigne have already covered most of the technical details in these articles: https://trixter.oldskool.org/2015/04/07/8088-mph-we-break-all-your-emulators/ http://www.reenigne.org/blog/1k-colours-on-cga-how-its-done/ […]

  32. Really amazing work!!!! Thank you very much for that! I’ve been in the demo scene in the early 90s but would have never thought to see 1K colors on CGA.

  33. catweazle666 said

    Wonderful stuff, thanks!

    Takes me back to the days of messing about with my Spectrum, stuffing stuff into the video ram during the retrace, making the Schmitt trigger audio play polyphonic tunes and other assembly language black magic.

    Happy days!

  34. […] is one for the old-timers among you: an old-school demo. How old school? For the 8088. The original IBM PC. With CGA. From 1981. In emulators it runs […]

  35. bleuge said

    Coppers! ;)
    If I remember right, the first vertical coppers I saw on a PC (I am not saying they were the first, just the first I watched) were in the demos from Majic12/PC, like this one: https://www.youtube.com/watch?v=TLRTWg7vQR0
    Funny to see they had to cut down the music because, for them, the full CPU time was used ;)
    You did the same, but on a much, much, much less powerful machine!

    • Scali said

      We didn’t cut music though :)
      Also, I’m a bit miffed about people calling them copperbars or rasterbars. Yes, there are rasterbars, but they are actually Kefrens bars, since you also have the sinewave thing. Which is a lot more difficult to do than ‘just’ rasterbars.

      • Trixter said

        What about the “Alcatraz bars” reference above?

        • Scali said

          Yes, technically Alcatraz were the first to do the effect, not Kefrens. But this was the pre-internet era, so apparently the Alcatraz demo was not widely known, and the Kefrens one was, hence the effect got named after them.

          But what I meant is: some people seem to only see the rasterbars effect, while the Kefrens/Alcatraz bars are the ‘main’ effect, and the rasterbars are just tacked on there, because there were enough free cycles to do so (as is the music).

          Rasterbars are not that difficult on CGA, and have been done before by Codeblasters in CGADEMO. This is a completely different effect, and works in a completely different way.

  36. […] Aske (25 min) “1024 farver med CGA“ […]

  37. […] before our release of 8088 MPH at Revision 2015, another 8088+CGA production surfaced at Gubbdata […]

  38. There were nearly 100 packers made for DOS, but unless they compress smaller than apack or upx, or decompress faster than pklite or lzexe

  39. […] amazing demo called “8088 MPH” which requires real IBM 5160 PC with CGA. Now I have CGA at home but still not PC/XT […]

  40. […] 2015 timeline… on 8088 MPH: We Break All Your… […]

  41. […] The techniques used in this video still cannot be faithfully reproduced under DosBOX, so playing it breaks the emulator, and so it is only on YouTube […]

  42. […] Reviving classic hardware with modern techniques: using the IBM PC… on 8088 MPH: We Break All Your… […]

  43. […] 8088 MPH: We Break All Your Emulators 3 by quietcuriosity | 0 comments on Hacker News. […]

  46. […] Article URL: https://trixter.oldskool.org/2015/04/07/8088-mph-we-break-all-your-emulators/ […]

  47. you should try KA9Q NOS for DOS networking with the packet driver

  48. […] 8088 MPH: We Break All Your Emulators (2015) 118 by quietcuriosity | 8 comments on Hacker News. […]

  49. […] Read Extra […]

  50. Boris Borisov said

    The demo looks amazing. I have a question or two. I’ve run the demo on a bastard PC/XT compatible: a Packard Bell PB88 with a multidisplay video adaptor from IBM, used because of its composite output. When the big-letter text scrolls up and down between scenes, instead of a black screen under the text I get what looks like memory contents being displayed, all random bytes and bits.
    Second, my default color palette is changed when I play CGA games. Also, the game Digger won’t display a picture at all.

    Which registers in the video adaptor does the demo access? Maybe I can manage to set them back to their defaults.

    • Trixter said

      What is the exact video adapter you’re using? You wrote “multidisplay video adaptor from IBM” but without knowing exactly what you have, I can’t advise. Based on what you wrote, it sounds like you’re using a VGA adapter, perhaps? The demo was written to only support real IBM CGA, not clones, so this is likely your issue.

      • Boris Borisov said

        It seems to be made by a company I cannot find any information about on the web, “Universal Research”. It is two full-length PCBs connected together as a sandwich, with two 9-pin monitor connectors, a composite output, and maybe a parallel port. I think one stamped number suggests it was made around 1986. If someone has heard of that company, please respond.

  51. Alex Tidmarsh said

    Just to pull you up short on a factually incorrect statement made in the article: fitting 640K to the very first IBM PCs wasn’t possible, for the following reasons:
    1. There were only five slots on the 5150 – which were only enough to fit 512K with the IBM adapters available at the time. It took a short while before the third-party builders created single card solutions to get around this. That 512K limit, by the way, was only achievable if you opted not to have any other adapters aside from the FDD controller and a graphics card. Do the math: 64+64+64+64+64+64+64+64 is 7 ISA slots plus the memory on the system board. That’s two slots more than the 5150 had – and that’s not even adding the FDD adapter and graphics card yet. This was the case until 1982/1983, a whole year after the 5150 was released.
    2. Even with a third party card, there was still a limit of 544K. This is because prior to the 1982 BIOS it was assumed the slot-limit would prevent more than 512K being fitted. Thus the BIOS (and indeed PC1.0) would only expect 512K. The BIOS could not allow more because it didn’t read the last DIP switch needed. DOS does what the BIOS tells it, but flakes out at 544K because, again, it wasn’t envisaged as happening.
    Thus it was only in late 1982 that a new BIOS, and new RAM cards for the ISA slots, and a new DOS (PCDOS1.1) made 640K a reality.
    3. Finally, there was at first no need. Most people were happy that 128K could be fitted, and that it was expandable beyond 256K. Trouble was, memory was freakishly expensive! So most units in that first year would only ever have seen between 64K and 256K fitted. In fact IBM sold a lot of units with just 16K, 32K or 48K on them – 48K being the minimum required to run PC-DOS. So that should put things a little more in perspective.

    • reenigne said

      Thank you for keeping us honest! This is actually something that we were aware of (and talked about) in the planning stages of 8088 MPH. However, we decided to stick with the 640kB limit for several reasons. One: it let us do some things that we otherwise could not have done (I think the Kefrens Bars effect is the most memory-hungry effect – not sure offhand what the limit is for the 3D effect). Two: technically it is possible without modifying the base hardware (albeit not using stock IBM parts) – in theory, someone could have made a 192kB board (an 8×12 array of 4116s). Three of those is 576kB, plus 64kB on the system board gives you 640kB. Expensive but completely possible with the hardware of the time. True, not all of that would be recognised by the BIOS and DOS… but: 8088 MPH will use the RAM even if the BIOS and DOS don’t know about it!

    • Scali said

      I think it’s a bit of arguing about semantics though… The article merely states that it was possible to add 640k to the original IBM PC 5150. As you say, external manufacturers did build these cards, and BIOS and OS updates became available to make this a reality. It was not a possibility at release time, but the article never claimed this.

      By that same logic, you could argue that 8088 MPH has two other deficiencies: the code is only compatible with DOS 2.0 or higher, and it requires a 360k floppy drive. Neither of these was available at release time either, but both could be added to a 5150 later.

      However, I suppose the message is really: any 5150 is physically capable of all the audio and video effects in 8088 MPH.
      Firmware, software, memory and storage requirements are things that could have been worked around in software if required.

  52. Trixter said

    To add to reenigne’s reply: The only truly memory-hungry effect is the Kefrens bars; the rest work in much less. The plasma effect in particular only uses 15K of RAM including its own code and all graphics. We could have gotten 8088 MPH running in way less memory — we just chose not to for maximum impact.
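
The memory arithmetic in reenigne’s reply above can be checked with a few lines of Python. This is only a sketch of the sums quoted in the comment. The constant names are illustrative assumptions, and parity chips, if any, are not counted, matching the comment’s 8×12 figure.

```python
CHIP_BITS = 16 * 1024        # a 4116 DRAM chip stores 16 Kbit
CHIPS_PER_BOARD = 8 * 12     # the hypothetical 8x12 array from the comment
BOARDS = 3                   # three expansion boards
SYSTEM_BOARD_KB = 64         # RAM on the system board, per the comment

# Capacity of one board in kB: total bits / 8 bits per byte / 1024 bytes per kB
BOARD_KB = CHIP_BITS * CHIPS_PER_BOARD // 8 // 1024

EXPANSION_KB = BOARDS * BOARD_KB
TOTAL_KB = EXPANSION_KB + SYSTEM_BOARD_KB

print(BOARD_KB, EXPANSION_KB, TOTAL_KB)
```

Under those assumptions each board holds 192 kB, three boards hold 576 kB, and the system board brings the total to 640 kB.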

  53. […] 8088 MPH: We Break All Your Emulators […]

  54. […] curious, moreover (ourselves included, we can’t help it!), the group has posted a more than interesting message on its official page where they explain the details of the demo’s design and programming, even dwelling on the […]

  55. A'Stanislav Georgiev said

    As I read this:

    4.77 MHz 8088 CPU. 5 MHz seems like a lot compared to other 8-bit micros, but it takes the CPU 4 cycles to read a single byte. So, compared to other 8-bit CPUs like the 6502 or 6809, which can read a byte in one clock cycle, the effective clock speed of the 8088 is more like (4.77 / 4) = 1.19 MHz.

    Can you write MOD player for 6502 and DOS/Prodos (Apple ][, //e). Or using Z80 Softcard – CP/M-80 and Apple ][?

    • reenigne said

      Probably not, at least not at anywhere near that number of channels and sample rate. There are a couple of reasons for this. One is that the 8088 has a much richer and denser instruction set than the 6502 (which is what having more than 8 times as many transistors buys you). In particular, it has 16-bit registers and a 16-bit ALU, which makes manipulating the 16-bit phases for the MOD player massively faster. Using 8-bit phases might be interesting except that 8-bit phases imply 8-bit frequencies, which would give something more like the Atari 2600’s incomplete scales. It also has complex instructions like “pop [bx]” which (in just two bytes) loads a 16-bit value from the address in SP, stores it to the address in BX, and increments SP by 2. The other reason is that the PC speaker has a PWM timer attached to it which allows output of a “sample” in a single OUT instruction as long as that OUT occurs at a very regular rate. The Apple ][ doesn’t – its speaker is under direct CPU control. PWM is still possible but it requires the CPU doing a loop to keep the speaker pulse high for the correct number of cycles. While it is doing that it can’t be doing anything else, which means that the actual mixing has to be done while the speaker is silent. That further reduces the sample rate as well as the maximum amplitude. That’s not to say that one can’t make some pretty nice music with this sort of sound hardware and such limited CPUs – have a listen to some of the 1-bit music that has been made for the ZX Spectrum.

      • A'Stanislav Georgiev said

        I agree with you. However, there were some music samples recorded via the cassette port and played via the speaker (or cassette port). The quality was far from perfect, but it worked as a simple DAC.
        What about the Z80 CPU, which some Apple clones have onboard or via the Z80 Softcard (2/4 MHz)? It is a very capable CPU; I wrote many assembler programs in the 80s, even though I cannot pretend that I am half as good as you :)
        BTW, there are WAV players and MIDI players (with a converter, like your MOD player, using the Mockingboard). And there is the Apple IIGS, which is a very decent machine, capable of almost everything that low-end Macs from the same era (and even newer ones) can do. BTW, there is at least one MOD player – https://resources.openmpt.org/modfaq/3-8.html

        • reenigne said

          It’s a long stretch from being able to play pre-recorded waveforms at a single speed, to being able to mix 4 channels at variable speeds while also outputting audio.
          The Softcard would put the Apple II on par with the ZX Spectrum I think (3.5MHz Z80, 1-bit CPU-controlled speaker output). The 1-bit music made for that platform is extremely well optimised – there’s probably not much room for improvement there (I’ve looked at assembly listings for some of it).
          The Apple IIGS is, to my understanding, a very different and much more capable machine than the Apple II, with a 16-bit CPU and capable sound hardware, so I’m not terribly surprised that MOD players have been written for it. Good to know though!

          • A'Stanislav Georgiev said

            Well, there is an Apple II program named Electric Duet, which basically mixes two channels and plays them simultaneously. But they aren’t samples, just beeps; you can check it and play some music with an emulator like Applewin.
            The Apple IIGS is different: it uses a 16-bit CPU, which is still almost perfectly compatible with the 6502. Usually 90-98% of Apple II or //e software is natively compatible with the GS. GSOS is actually ProDOS-16 with a GUI somewhat based on the Lisa GUI (historically). MacOS Classic is very close to it, and there is something like that for ProDOS-8, named MouseDesk (Apple Desktop), so it is not impossible to port nearly all ProDOS-16 programs to ProDOS-8, especially with CPU accelerator cards working at 8, 16 or even more MHz.
            I am not very familiar with the ZX Spectrum, Atari, C64 or Oric Atmos – I used emulators to play with them, because these machines weren’t available in my country, except for an Oric Atmos clone (Pravetz-8D). Some of these machines are more capable for some tasks than the original Apple II.

            • reenigne said

              I suspect Electric Duet uses a pulse engine which is also what many of the ZX Spectrum engines use. There is also a technique to play multiple very simple waves (usually square waves) at once by time-multiplexing them. But in both cases the individual channels are just “beeps” (1-bit square or rectangle waves).
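
The “16-bit phases” mentioned earlier in this thread can be illustrated with a phase-accumulator mixer. This is a minimal Python sketch, not the demo’s actual player. It assumes an 8.8 fixed-point phase (high byte selects the sample, low byte is the fraction) and 256-entry looping waveforms, and the name `mix_channels` is hypothetical.

```python
def mix_channels(waves, steps, n_out):
    """Mix several looping 256-entry waveforms with 16-bit phase accumulators.

    waves: list of 256-element lists of sample values (assumed layout)
    steps: per-channel 16-bit phase increments in 8.8 fixed point (assumed format)
    n_out: number of output samples to produce
    """
    phases = [0] * len(waves)
    out = []
    for _ in range(n_out):
        total = 0
        for ch, wave in enumerate(waves):
            total += wave[phases[ch] >> 8]                   # integer part picks the sample
            phases[ch] = (phases[ch] + steps[ch]) & 0xFFFF   # 16-bit wraparound
        out.append(total)
    return out


# Usage: one ramp channel at normal speed mixed with a constant channel
ramp = list(range(256))
flat = [10] * 256
print(mix_channels([ramp, flat], [0x0100, 0x0040], 4))
```

With an 8.8 step, a channel can advance by fractions of a sample per output tick, which is what gives fine pitch control. Dropping to 8-bit phases would remove that fractional part, in line with the comment about incomplete scales.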

  56. […] was done by me, graphics by VileR of 8088 MPH fame, and the music was done by […]

  57. […] The techniques used in this video still cannot be faithfully reproduced under DOSBox, so playing it breaks the emulator, and so it is only on YouTube […]

  58. […] simple: One should strive to be able to run all software. I have seen various emulator devs dismiss 8088 MPH, because it is the only software of its kind, in how it uses the CGA hardware to generate extra […]

  59. […] perhaps the most impressive piece of IBM PC CGA artifact color out there, the 2015 DOS scenedemo, 8088 MPH by Hornet, CRTC, & Desire, which uses a number of previously unused techniques to achieve 1024 […]
