Maze Generation In Thirteen Bytes
Posted by Trixter on December 17, 2012
Update 12/7/2012 @ 13:46: Peter Ferrie smashed my record by a single byte, so the record is now held by him at 12 bytes. Congrats, and I feel like a fool for missing it :-) I’ve tacked on his optimization to the end of my original post.
Update 1/7/2013: herm1t further smashed the record, down to 11 bytes!
Update 1/7/2013 @ 18:00: Peter bounces back and reclaims the record with 10 bytes! It kind-of breaks my target platform (uses an undocumented opcode that only works on Intel processors) but hey, a record’s a record! I’ve updated the article below.
In the past, when I’ve had a democoding breakthrough, I kept quiet and either used my discovery in a production, or just bragged to my demoscene friends privately. However, my opportunities to achieve democoding “world firsts” are just about gone, and size coding compos seem to be dead, so I’ve decided to just write a blog post about what I’ve done instead: I’ve written a maze generator in only 13 bytes of x86 machine code.
To understand my 13b achievement, you need to be familiar with 10 PRINT. 10 PRINT is the name given to a single short line of Commodore 64 BASIC code that, when run, appears to generate a never-ending complicated maze. The output isn’t really a true maze — it has no proper beginning and end, and some avenues lead nowhere — but it looks like one because our brains are the most powerful pattern recognition engines on the planet. The appreciation for 10 PRINT comes from the seemingly infinite complexity produced from a very simple single line of code. For you non-technical people, consider 10 PRINT the programming equivalent of a haiku.
10 PRINT is also the name of a 324-page book dedicated to the code, its output, and 300 other pages that explore code, art, perception, culture, randomness, mazes, and almost anything else the authors thought of when pondering the code. (The book is legally free for download, although you can purchase a copy to benefit the non-profit organizations PLAYPOWER and The MIT Press, so if the first hundred pages interest you, buy the book to help them out.) It is hundreds of pages more than I would ever write about fascinating minutiae and borders on stonergazing, but the authors have done such a thorough job and made the subject material accessible to so many people that it’s really hard for me to criticize them. I recommend you give it a read, as there’s something in there for everyone and the presentation of the material is top-notch.
42 bytes
Anyway, back to my code. I was reading the 10 PRINT book and got to page 242 where it explains that the BASIC code can be represented even smaller as C64 machine code, and challenges the reader to improve on it. Sure enough, checking 4-Mat’s and Wisdom’s attempts, the smallest possible version clocks in at 15 bytes, which is pretty amazing. For kicks, I wondered what an x86 equivalent would look like, so I wrote up an x86 version of 10 PRINT. Here’s the easy-to-follow, friendly attempt:
; Simple program to duplicate "10 PRINT" output, also known as DataGlyphs. ; This version "plays nice" by grabbing from a "true" random data source ; and not making any assumptions about register contents. ; We also change to a low-resolution mode to have a friendlier aspect ratio ; and exit cleanly once the "maze" is done printing. init: mov ax,0001h int 10h ;force 40x25 color xor bh,bh ;we just set video, so set vidpage=0 mov cx,(40*23) ;number of chars to write getrnd: xor ax,ax ;to get al=0 pushf cli ;disable interrupts, timer read must be atomic out 43h,al ;latch count register in al,40h ;read lobyte of count mov ah,al in al,40h ;read hibyte of count popf ;enable interrupts xchg ah,al ;fixup; now contains 8253 raw 16-bit word pickch: shr ax,1 ;BIOSes init in count-by-2, 1st bit always lo shr ax,1 ;carry = "random" bit mov al,'\' jc writec ;if bit set, write immediately mov al,'/' ;otherwise, choose other character writec: mov ah,0eh ;write char in teletype mode int 10h loop getrnd int 20h ;exit back to DOS
This code is well-behaved; it doesn’t make assumptions about the state of the processor, and it exits cleanly (even though the original 10 PRINT code loops forever). It also manages to come up with a halfway decent random number, taken from the hardware timer which counts from 65535 down to 0 every second. Compiled with a86, it results in a 42-byte .COM file. The size suffers compared to the C64 version in two major places: The random number generation, and picking the character. In picking the character, the C64 has an advantage in that both slashes are right next to each other in PETSCII values, so if you have a random bit, you just add it to the first slash and you’ll get either no change, or the second slash. No such luck on the PC; the ASCII values for both slashes are far apart, so I use a conditional branch that picks one or the other based on the random bit.
Because I’m used to optimizing for speed, I decided to eliminate the branch where the slash character is chosen to see what it looked like (branches are a speed penalty on most CPUs). Despite being far away in value, the bit patterns for both slashes were only different by a single bit once you shifted one of them:
"/"=2f and "\"=5c 2f=00101111 5c=01011100
This led to replacing the pick-a-character chunk “pickch” with this:
pickch: shr ax,1 ;BIOSes init in count-by-2, 1st bit always low push cx ;Save our loop counter for later mov cl,al ;copy random value to cl and cl,1 ;cl is now either 0 or 1 mov al,'\' ;seed character as '\' shr al,cl or al,cl ;if cl=1, '\' is now '/', otherwise no change
While pleased with myself, this increased the size to 48 bytes, so I discarded the idea.
Another optimization thought was to change the character print routine (“writec”) to something that just stuck bytes directly into video memory. That shaved two bytes off the print routine, but it took 4 bytes to set up (5 bytes if I wanted to stay compatible with the 8086), so it was a net loss that also got discarded.
25 bytes
I then turned my attention to the biggest chunk in the code, the random number generator. Running the program a few times, it was clear that the quality of randomness didn’t need to be very high; just enough to avoid obvious runs or patterns. This meant I could get my random data from almost anywhere, so I replaced the entire “getrnd” block with a single instruction: LODSB. This retrieves a byte from memory and then advances position, so if you call it repeatedly you get a sequence of bytes from memory, one after the other. The sequence I got back was from wherever DS:SI points to on program startup; for a DOS .COM file, it points to the current program code. So my “random” stream of bytes was actually the compiled code itself (ha!) plus whatever remnants of prior programs were still left in memory. This led to a massive reduction down to 25 bytes.
15 bytes
When I saw 25 bytes, I got excited because it was now in the realm of possibility to catch up to the C64 version. There was still a lot of fluff in the program, so I took advantage of more “safe” startup assumptions and jettisoned nice-but-unnecessary things like changing the video mode, a clean exit (it loops forever, just like the original), and initializing some registers. In the end, it looked like this:
init: mov ah,0eh ;prime 10h for "write char in teletype mode" getrnd: lodsb ;read a byte from wherever DS:SI points at pickch: shr al,1 ;carry = "random" bit mov al,'\' jc writec ;if bit set, write immediately mov al,'/' ;otherwise, choose other character writec: int 10h ;write the character jmp getrnd ;loop endlessly
This compiled to 15 bytes. Success!
13 bytes
I’d matched the C64 version, but looking at the “pickch” block I kept feeling like I could improve it. That block starts by setting the carry flag to a random value, but the 8086 has a parity flag that is automatically set by some operations, and you can jump based on its state. Unfortunately, LODSB does not set any flags. Any math operation will set the parity flag, but most math operations would take up additional space and rob me of space savings. If only there was a single-byte instruction that could do what LODSB does and set the parity flag based on the input…
Turns out, there is! SCAS is an instruction that loads a byte and then compares it with AL, and then sets flags based on the comparison. Like LODSB, it advances through memory, and it was meant to be used in a REPeat loop — but there’s nothing preventing you from using it outside of a loop. So I did, and here is the final result:
; Simple program to duplicate "10 PRINT" output, sometimes called DataGlyphs. ; Written by Trixter / Hornet, trixter@oldskool.org, 20121214 ; For more information about "10 PRINT", consult http://10print.org init: mov ah,0eh ;prime 10h for "write char in teletype mode" getrnd: scasb ;read from ES:DI points and compare to AL ;this sets flags similar to a subtraction pickch: mov al,'\' ;choose a character jp writec ;if parity bit set, write immediately mov al,'/' ;otherwise, choose other character writec: int 10h ;write the character jmp getrnd ;loop endlessly
And there you have it: 13 bytes. As an added bonus, it works on an old IBM PC too, since it doesn’t use any 80186+ instructions. Sample output below (if system is booted in 40-column mode, just like a C64): (click on the picture directly if your browser isn’t displaying it properly)
I believe this is as small as it can get on x86, courtesy of Intel CISC. I also think this is a 10 PRINT size world record for any platform, but if I’m wrong, please let me know.
12 bytes
As the update above shows, Peter Ferrie was able to improve my design by a single byte. His code is as follows, with my annotations:
init: mov ax,0e5ch ;Load AH with "write char" cmd and AL with '\' scasb ;read from where ES:DI points and compare to AL ;this sets flags similar to a subtraction pickch: jp writec ;If parity set, jump to writing char in AL mov al,’/’ ;otherwise, choose other slash char writec: int 10h ;write char in AL jmp init ;loop endlessly
Bravo, sir. But even further than that, using DOS’s “write character” int 29h, herm1t got it down to:
11 bytes
init: mov al, '\' scasb jp writec mov al, '/' writec: int 29h jmp init
Clever. At the time I wrote the original 13-byte version, I’d been working on self-booting code with no OS, so I wasn’t considering DOS at any point. My loss is herm1t’s gain!
But wait, that’s not all:
10 bytes
init: scasb salc and al,'\'-'/' add al,'/' int 29h jmp init
Peter bounces back with another byte shaved! (if you run Intel CPUs — trying to run this on a non-intel CPU such as an NEC V20 or V30 will hang the machine).
Peter Ferrie said
12.
init: mov ax,0e5ch
scasb
pickch: jp writec
mov al,’/’
writec: int 10h
jmp init
Trixter said
Of course!!! (smacks head repeatedly)
You have the new record! Congratulations sir!
phoenix said
If there’s anything to be learned about size coding, never ever say “I believe this is as small as it can get”. :)
phoenix said
Does this work better on a real DOS PC than elsewhere? In a Win XP box, it gives mostly \ chars with flashes of random. In DOSBox, the random flashes are even less frequent.
Trixter said
Because the code segment is used as the random data source, it does not look great if you boot and then immediately run the program. Run any other program first, preferably a large one, to fill the first 128K of RAM with junk.
Peter Ferrie said
“read from DS:SI points” -> “read from where ES:DI points”
Trixter said
I added the “where” but I left DS:SI alone — SCASB doesn’t read from ES:DI.
Peter Ferrie said
Intel docs: “SCAS m8: Compare AL with byte at ES:(E)DI or RDI, then set status flags”.
Trixter said
I’m Doing It Wrong! Shows you how much I use SCAS in my code. Fixed.
holograph said
Can’t blame you, semantically speaking you’d definitely expect SCAS to use DS:SI… surprising!
Bruno said
I haven’t had this much fun reading a post in a long, long time :)
ange albertini said
herm1t chopped another byte from the “AH = 0xE + Int 10h” part by using the ‘undocumented’ int 29h.
cf http://lj.rossia.org/users/herm1t/66490.html
Peter Ferrie said
“10 Print” in 10 bytes!!
init:scasb
salc
and al,’\’-‘/’
add al,’/’
int 29h
jmp init
Trixter said
AMAZIN– hey, wait a minute, that uses an undocumented opcode that doesn’t work on all processors.
So I guess all three of us have our own record — mine for working on any machine, herm1t for working on any machine running DOS, and you for working on any Intel CPU. We all winnerz! \o/
Trixter said
The more I look at what you did, the more I am a fan :-) I updated the article with your code.
Andrey Baranovich said
Cool! :-)
Peter Ferrie said
Thanks.
It should work on all Intel, AMD, and Cyrix processors.
Intel introduced the instruction, AMD documented the instruction, and Cyrix copied the instruction.
Trixter said
I guarantee you it does not work on NEC V20 and V30 CPUs. D6 on those CPUs is a duplicate of XLATB which is D7. More background and tests: http://www.vintage-computer.com/vcforum/showthread.php?28949-The-D6-instruction-on-the-NEC-V-series-(V20-V30)-chips
Good to know it works on everything else. BTW I don’t consider this a “bug” in the NEC chips; it wasn’t documented when they were being designed.
Post Position » 10 PRINT Marches on said
[…] Trixter provides the full story (so far!) on his blog, Oldskooler Ramblings. […]
Gloriosas inutilidades: gerando um labirinto em 10 (sim, DEZ) bytes! | Retrocomputaria Plus said
[…] os recordes vão se sucedendo: 25 bytes, 15,13, 12. 11… dez! A essa altura o povo já estava apelando para truques que não são coisa de Jesus, tais como […]
huseyink said
There is a 13 bytes C64 version, that was released on 4th of December, 2012:
http://csdb.dk/release/?id=113300
In the comments section, there is also a 12 bytes version, which was created cooperatively after exchanging some ideas with other C64 fellows.
As these versions were released before this article was written, only 11 and 10 bytes versions have actually broken the record. :-)
As for the PC versions, they are pretty impressive, congratulations guys!
kenorb said
Bash implementation: http://codegolf.stackexchange.com/a/26011/11164
sedatk said
Here is my CPC version (11 bytes): http://www.pouet.net/prod.php?which=65273
Source code: https://github.com/ssg/mazecpc
Hell Mood said
I just created an 8 byte version =)
http://www.pouet.net/prod.php?which=65290
Trixter said
Lol, nice use of adjoining characters :)
Hell Mood said
Yep, as i posted internally for our group : “Didn’t really take much asm magic to get to 8 byte, was more like ‘finding’ something which all of them must have overlooked ;) ” I am kind of tempted to pull of some “real magic”, like i did here http://www.pouet.net/prod.php?which=63126 , but currently i’m not in the right mood … some of the intrinsic energy seems to be lost if a 2^n border is reached ;)
Trixter said
To be fair, while it does follow the “spirit” of the challenge, it does not follow it to the “letter”. Meaning, / and \ are supposed to be the only characters used. But, I still applaud it (8 bytes!!) as it was clever to find chars 90 degrees apart visually while only 1 char apart in ASCII. Nice job!
www.theknot.com said
http://www.theknot.com
Maze Generation In Thirteen Bytes « Oldskooler Ramblings
Kaproncai Tamas said
Here it is… another 10 bytes version
by TomCat/Abaddon:
start:
LODSB
CBW
MOV AL,’/’
AAD 256-‘\’+’/’
INT 29H
LOOP start