Maze Generation In Thirteen Bytes
Posted by Trixter on December 17, 2012
Update 12/7/2012 @ 13:46: Peter Ferrie smashed my record by a single byte, so the record is now held by him at 12 bytes. Congrats, and I feel like a fool for missing it :-) I’ve tacked on his optimization to the end of my original post.
Update 1/7/2013: herm1t further smashed the record, down to 11 bytes!
Update 1/7/2013 @ 18:00: Peter bounces back and reclaims the record with 10 bytes! It kind-of breaks my target platform (uses an undocumented opcode that only works on Intel processors) but hey, a record’s a record! I’ve updated the article below.
In the past, when I’ve had a democoding breakthrough, I kept quiet and either used my discovery in a production, or just bragged to my demoscene friends privately. However, my opportunities to achieve democoding “world firsts” are just about gone, and size coding compos seem to be dead, so I’ve decided to just write a blog post about what I’ve done instead: I’ve written a maze generator in only 13 bytes of x86 machine code.
To understand my 13b achievement, you need to be familiar with 10 PRINT. 10 PRINT is the name given to a single short line of Commodore 64 BASIC code that, when run, appears to generate a never-ending complicated maze. The output isn’t really a true maze — it has no proper beginning and end, and some avenues lead nowhere — but it looks like one because our brains are the most powerful pattern recognition engines on the planet. The appreciation for 10 PRINT comes from the seemingly infinite complexity produced from a very simple single line of code. For you non-technical people, consider 10 PRINT the programming equivalent of a haiku.
10 PRINT is also the name of a 324-page book dedicated to the code, its output, and 300 other pages that explore code, art, perception, culture, randomness, mazes, and almost anything else the authors thought of when pondering the code. (The book is legally free for download, although you can purchase a copy to benefit the non-profit organizations PLAYPOWER and The MIT Press, so if the first hundred pages interest you, buy the book to help them out.) It is hundreds of pages more than I would ever write about fascinating minutiae and borders on stonergazing, but the authors have done such a thorough job and made the subject material accessible to so many people that it’s really hard for me to criticize them. I recommend you give it a read, as there’s something in there for everyone and the presentation of the material is top-notch.
Anyway, back to my code. I was reading the 10 PRINT book and got to page 242 where it explains that the BASIC code can be represented even smaller as C64 machine code, and challenges the reader to improve on it. Sure enough, checking 4-Mat’s and Wisdom’s attempts, the smallest possible version clocks in at 15 bytes, which is pretty amazing. For kicks, I wondered what an x86 equivalent would look like, so I wrote up an x86 version of 10 PRINT. Here’s the easy-to-follow, friendly attempt:
; Simple program to duplicate "10 PRINT" output, also known as DataGlyphs. ; This version "plays nice" by grabbing from a "true" random data source ; and not making any assumptions about register contents. ; We also change to a low-resolution mode to have a friendlier aspect ratio ; and exit cleanly once the "maze" is done printing. init: mov ax,0001h int 10h ;force 40x25 color xor bh,bh ;we just set video, so set vidpage=0 mov cx,(40*23) ;number of chars to write getrnd: xor ax,ax ;to get al=0 pushf cli ;disable interrupts, timer read must be atomic out 43h,al ;latch count register in al,40h ;read lobyte of count mov ah,al in al,40h ;read hibyte of count popf ;enable interrupts xchg ah,al ;fixup; now contains 8253 raw 16-bit word pickch: shr ax,1 ;BIOSes init in count-by-2, 1st bit always lo shr ax,1 ;carry = "random" bit mov al,'\' jc writec ;if bit set, write immediately mov al,'/' ;otherwise, choose other character writec: mov ah,0eh ;write char in teletype mode int 10h loop getrnd int 20h ;exit back to DOS
This code is well-behaved; it doesn’t make assumptions about the state of the processor, and it exits cleanly (even though the original 10 PRINT code loops forever). It also manages to come up with a halfway decent random number, taken from the hardware timer which counts from 65535 down to 0 every second. Compiled with a86, it results in a 42-byte .COM file. The size suffers compared to the C64 version in two major places: The random number generation, and picking the character. In picking the character, the C64 has an advantage in that both slashes are right next to each other in PETSCII values, so if you have a random bit, you just add it to the first slash and you’ll get either no change, or the second slash. No such luck on the PC; the ASCII values for both slashes are far apart, so I use a conditional branch that picks one or the other based on the random bit.
Because I’m used to optimizing for speed, I decided to eliminate the branch where the slash character is chosen to see what it looked like (branches are a speed penalty on most CPUs). Despite being far away in value, the bit patterns for both slashes were only different by a single bit once you shifted one of them:
"/"=2f and "\"=5c 2f=00101111 5c=01011100
This led to replacing the pick-a-character chunk “pickch” with this:
pickch: shr ax,1 ;BIOSes init in count-by-2, 1st bit always low push cx ;Save our loop counter for later mov cl,al ;copy random value to cl and cl,1 ;cl is now either 0 or 1 mov al,'\' ;seed character as '\' shr al,cl or al,cl ;if cl=1, '\' is now '/', otherwise no change
While pleased with myself, this increased the size to 48 bytes, so I discarded the idea.
Another optimization thought was to change the character print routine (“writec”) to something that just stuck bytes directly into video memory. That shaved two bytes off the print routine, but it took 4 bytes to set up (5 bytes if I wanted to stay compatible with the 8086), so it was a net loss that also got discarded.
I then turned my attention to the biggest chunk in the code, the random number generator. Running the program a few times, it was clear that the quality of randomness didn’t need to be very high; just enough to avoid obvious runs or patterns. This meant I could get my random data from almost anywhere, so I replaced the entire “getrnd” block with a single instruction: LODSB. This retrieves a byte from memory and then advances position, so if you call it repeatedly you get a sequence of bytes from memory, one after the other. The sequence I got back was from wherever DS:SI points to on program startup; for a DOS .COM file, it points to the current program code. So my “random” stream of bytes was actually the compiled code itself (ha!) plus whatever remnants of prior programs were still left in memory. This led to a massive reduction down to 25 bytes.
When I saw 25 bytes, I got excited because it was now in the realm of possibility to catch up to the C64 version. There was still a lot of fluff in the program, so I took advantage of more “safe” startup assumptions and jettisoned nice-but-unnecessary things like changing the video mode, a clean exit (it loops forever, just like the original), and initializing some registers. In the end, it looked like this:
init: mov ah,0eh ;prime 10h for "write char in teletype mode" getrnd: lodsb ;read a byte from wherever DS:SI points at pickch: shr al,1 ;carry = "random" bit mov al,'\' jc writec ;if bit set, write immediately mov al,'/' ;otherwise, choose other character writec: int 10h ;write the character jmp getrnd ;loop endlessly
This compiled to 15 bytes. Success!
I’d matched the C64 version, but looking at the “pickch” block I kept feeling like I could improve it. That block starts by setting the carry flag to a random value, but the 8086 has a parity flag that is automatically set by some operations, and you can jump based on its state. Unfortunately, LODSB does not set any flags. Any math operation will set the parity flag, but most math operations would take up additional space and rob me of space savings. If only there was a single-byte instruction that could do what LODSB does and set the parity flag based on the input…
Turns out, there is! SCAS is an instruction that loads a byte and then compares it with AL, and then sets flags based on the comparison. Like LODSB, it advances through memory, and it was meant to be used in a REPeat loop — but there’s nothing preventing you from using it outside of a loop. So I did, and here is the final result:
; Simple program to duplicate "10 PRINT" output, sometimes called DataGlyphs. ; Written by Trixter / Hornet, firstname.lastname@example.org, 20121214 ; For more information about "10 PRINT", consult http://10print.org init: mov ah,0eh ;prime 10h for "write char in teletype mode" getrnd: scasb ;read from ES:DI points and compare to AL ;this sets flags similar to a subtraction pickch: mov al,'\' ;choose a character jp writec ;if parity bit set, write immediately mov al,'/' ;otherwise, choose other character writec: int 10h ;write the character jmp getrnd ;loop endlessly
And there you have it: 13 bytes. As an added bonus, it works on an old IBM PC too, since it doesn’t use any 80186+ instructions. Sample output below (if system is booted in 40-column mode, just like a C64): (click on the picture directly if your browser isn’t displaying it properly)
I believe this is as small as it can get on x86, courtesy of Intel CISC. I also think this is a 10 PRINT size world record for any platform, but if I’m wrong, please let me know.
As the update above shows, Peter Ferrie was able to improve my design by a single byte. His code is as follows, with my annotations:
init: mov ax,0e5ch ;Load AH with "write char" cmd and AL with '\' scasb ;read from where ES:DI points and compare to AL ;this sets flags similar to a subtraction pickch: jp writec ;If parity set, jump to writing char in AL mov al,’/’ ;otherwise, choose other slash char writec: int 10h ;write char in AL jmp init ;loop endlessly
Bravo, sir. But even further than that, using DOS’s “write character” int 29h, herm1t got it down to:
init: mov al, '\' scasb jp writec mov al, '/' writec: int 29h jmp init
Clever. At the time I wrote the original 13-byte version, I’d been working on self-booting code with no OS, so I wasn’t considering DOS at any point. My loss is herm1t’s gain!
But wait, that’s not all:
init: scasb salc and al,'\'-'/' add al,'/' int 29h jmp init
Peter bounces back with another byte shaved! (if you run Intel CPUs — trying to run this on a non-intel CPU such as an NEC V20 or V30 will hang the machine).