Oldskooler Ramblings

the unlikely child born of the home computer wars

Maze Generation In Thirteen Bytes

Posted by Jim Leonard on December 17, 2012


Update 12/7/2012 @ 13:46: Peter Ferrie smashed my record by a single byte, so the record is now held by him at 12 bytes.  Congrats, and I feel like a fool for missing it :-)  I’ve tacked on his optimization to the end of my original post.

Update 1/7/2013: herm1t further smashed the record, down to 11 bytes!

Update 1/7/2013 @ 18:00: Peter bounces back and reclaims the record with 10 bytes! It kind-of breaks my target platform (uses an undocumented opcode that only works on Intel processors) but hey, a record’s a record! I’ve updated the article below.

In the past, when I’ve had a democoding breakthrough, I kept quiet and either used my discovery in a production, or just bragged to my demoscene friends privately.  However, my opportunities to achieve democoding “world firsts” are just about gone, and size coding compos seem to be dead, so I’ve decided to just write a blog post about what I’ve done instead: I’ve written a maze generator in only 13 bytes of x86 machine code.

To understand my 13b achievement, you need to be familiar with 10 PRINT.  10 PRINT is the name given to a single short line of Commodore 64 BASIC code that, when run, appears to generate a never-ending complicated maze.  The output isn’t really a true maze — it has no proper beginning and end, and some avenues lead nowhere — but it looks like one because our brains are the most powerful pattern recognition engines on the planet.  The appreciation for 10 PRINT comes from the seemingly infinite complexity produced from a very simple single line of code.  For you non-technical people, consider 10 PRINT the programming equivalent of a haiku.

10 PRINT is also the name of a 324-page book dedicated to the code, its output, and 300 other pages that explore code, art, perception, culture, randomness, mazes, and almost anything else the authors thought of when pondering the code.  (The book is legally free for download, although you can purchase a copy to benefit the non-profit organizations PLAYPOWER and The MIT Press, so if the first hundred pages interest you, buy the book to help them out.)  It is hundreds of pages more than I would ever write about fascinating minutiae and borders on stonergazing, but the authors have done such a thorough job and made the subject material accessible to so many people that it’s really hard for me to criticize them.  I recommend you give it a read, as there’s something in there for everyone and the presentation of the material is top-notch.

42 bytes

Anyway, back to my code.  I was reading the 10 PRINT book and got to page 242 where it explains that the BASIC code can be represented even smaller as C64 machine code, and challenges the reader to improve on it.  Sure enough, checking 4-Mat’s and Wisdom’s attempts, the smallest possible version clocks in at 15 bytes, which is pretty amazing.  For kicks, I wondered what an x86 equivalent would look like, so I wrote up an x86 version of 10 PRINT.  Here’s the easy-to-follow, friendly attempt:

; Simple program to duplicate "10 PRINT" output, also known as DataGlyphs.
; This version "plays nice" by grabbing from a "true" random data source
; and not making any assumptions about register contents.
; We also change to a low-resolution mode to have a friendlier aspect ratio
; and exit cleanly once the "maze" is done printing.

init:   mov     ax,0001h
        int     10h             ;force 40x25 color
        xor     bh,bh           ;we just set video, so set vidpage=0
        mov     cx,(40*23)      ;number of chars to write

getrnd: xor     ax,ax           ;to get al=0
        pushf
        cli                     ;disable interrupts, timer read must be atomic
        out     43h,al          ;latch count register
        in      al,40h          ;read lobyte of count
        mov     ah,al
        in      al,40h          ;read hibyte of count
        popf                    ;enable interrupts
        xchg    ah,al           ;fixup; now contains 8253 raw 16-bit word

pickch: shr     ax,1            ;BIOSes init in count-by-2, 1st bit always lo
        shr     ax,1            ;carry = "random" bit
        mov     al,'\'
        jc      writec          ;if bit set, write immediately
        mov     al,'/'          ;otherwise, choose other character

writec: mov     ah,0eh          ;write char in teletype mode
        int     10h
        loop    getrnd

        int     20h             ;exit back to DOS

This code is well-behaved; it doesn’t make assumptions about the state of the processor, and it exits cleanly (even though the original 10 PRINT code loops forever).  It also manages to come up with a halfway decent random number, taken from the hardware timer which counts from 65535 down to 0 every second.  Compiled with a86, it results in a 42-byte .COM file.  The size suffers compared to the C64 version in two major places: The random number generation, and picking the character.  In picking the character, the C64 has an advantage in that both slashes are right next to each other in PETSCII values, so if you have a random bit, you just add it to the first slash and you’ll get either no change, or the second slash.  No such luck on the PC; the ASCII values for both slashes are far apart, so I use a conditional branch that picks one or the other based on the random bit.

Because I’m used to optimizing for speed, I decided to eliminate the branch where the slash character is chosen to see what it looked like (branches are a speed penalty on most CPUs).  Despite being far away in value, the bit patterns for both slashes were only different by a single bit once you shifted one of them:

"/"=2f and "\"=5c
2f=00101111
5c=01011100

This led to replacing the pick-a-character chunk “pickch” with this:

pickch: shr     ax,1            ;BIOSes init in count-by-2, 1st bit always low
        push    cx              ;Save our loop counter for later
        mov     cl,al           ;copy random value to cl
        and     cl,1            ;cl is now either 0 or 1
        mov     al,'\'          ;seed character as '\'
        shr     al,cl
        or      al,cl           ;if cl=1, '\' is now '/', otherwise no change

While pleased with myself, this increased the size to 48 bytes, so I discarded the idea.

Another optimization thought was to change the character print routine (“writec”) to something that just stuck bytes directly into video memory.  That shaved two bytes off the print routine, but it took 4 bytes to set up (5 bytes if I wanted to stay compatible with the 8086), so it was a net loss that also got discarded.

25 bytes

I then turned my attention to the biggest chunk in the code, the random number generator.  Running the program a few times, it was clear that the quality of randomness didn’t need to be very high; just enough to avoid obvious runs or patterns.  This meant I could get my random data from almost anywhere, so I replaced the entire “getrnd” block with a single instruction: LODSB.  This retrieves a byte from memory and then advances position, so if you call it repeatedly you get a sequence of bytes from memory, one after the other.  The sequence I got back was from wherever DS:SI points to on program startup; for a DOS .COM file, it points to the current program code.  So my “random” stream of bytes was actually the compiled code itself (ha!) plus whatever remnants of prior programs were still left in memory.  This led to a massive reduction down to 25 bytes.

15 bytes

When I saw 25 bytes, I got excited because it was now in the realm of possibility to catch up to the C64 version.  There was still a lot of fluff in the program, so I took advantage of more “safe” startup assumptions and jettisoned nice-but-unnecessary things like changing the video mode, a clean exit (it loops forever, just like the original), and initializing some registers.  In the end, it looked like this:

init:   mov     ah,0eh          ;prime 10h for "write char in teletype mode"

getrnd: lodsb                   ;read a byte from wherever DS:SI points at

pickch: shr     al,1            ;carry = "random" bit
        mov     al,'\'
        jc      writec          ;if bit set, write immediately
        mov     al,'/'          ;otherwise, choose other character

writec: int     10h             ;write the character
        jmp     getrnd          ;loop endlessly

This compiled to 15 bytes.  Success!

13 bytes

I’d matched the C64 version, but looking at the “pickch” block I kept feeling like I could improve it.  That block starts by setting the carry flag to a random value, but the 8086 has a parity flag that is automatically set by some operations, and you can jump based on its state.  Unfortunately, LODSB does not set any flags.  Any math operation will set the parity flag, but most math operations would take up additional space and rob me of space savings.  If only there was a single-byte instruction that could do what LODSB does and set the parity flag based on the input…

Turns out, there is!  SCAS is an instruction that loads a byte and then compares it with AL, and then sets flags based on the comparison.  Like LODSB, it advances through memory, and it was meant to be used in a REPeat loop — but there’s nothing preventing you from using it outside of a loop.  So I did, and here is the final result:

; Simple program to duplicate "10 PRINT" output, sometimes called DataGlyphs.
; Written by Trixter / Hornet, trixter@oldskool.org, 20121214
; For more information about "10 PRINT", consult http://10print.org

init:   mov     ah,0eh          ;prime 10h for "write char in teletype mode"

getrnd: scasb                   ;read from ES:DI points and compare to AL
                                ;this sets flags similar to a subtraction

pickch: mov     al,'\'          ;choose a character
        jp      writec          ;if parity bit set, write immediately
        mov     al,'/'          ;otherwise, choose other character

writec: int     10h             ;write the character
        jmp     getrnd          ;loop endlessly

And there you have it:  13 bytes.  As an added bonus, it works on an old IBM PC too, since it doesn’t use any 80186+ instructions.  Sample output below (if system is booted in 40-column mode, just like a C64):  (click on the picture directly if your browser isn’t displaying it properly)

10print6_002

I believe this is as small as it can get on x86, courtesy of Intel CISC.  I also think this is a 10 PRINT size world record for any platform, but if I’m wrong, please let me know.

12 bytes

As the update above shows, Peter Ferrie was able to improve my design by a single byte.  His code is as follows, with my annotations:

init:   mov     ax,0e5ch        ;Load AH with "write char" cmd and AL with '\'
        scasb                   ;read from where ES:DI points and compare to AL
                                ;this sets flags similar to a subtraction
pickch: jp      writec          ;If parity set, jump to writing char in AL
        mov     al,’/’          ;otherwise, choose other slash char
writec: int     10h             ;write char in AL
        jmp     init            ;loop endlessly

Bravo, sir. But even further than that, using DOS’s “write character” int 29h, herm1t got it down to:

11 bytes

init:   mov     al, '\'
        scasb
        jp      writec
        mov     al, '/'
writec:	int     29h
        jmp     init

Clever. At the time I wrote the original 13-byte version, I’d been working on self-booting code with no OS, so I wasn’t considering DOS at any point. My loss is herm1t’s gain!

But wait, that’s not all:

10 bytes

init:   scasb
        salc
        and al,'\'-'/'
        add al,'/'
        int 29h
        jmp init

Peter bounces back with another byte shaved! (if you run Intel CPUs — trying to run this on a non-intel CPU such as an NEC V20 or V30 will hang the machine).

About these ads

20 Responses to “Maze Generation In Thirteen Bytes”

  1. 12.
    init: mov ax,0e5ch
    scasb
    pickch: jp writec
    mov al,’/’
    writec: int 10h
    jmp init

  2. phoenix said

    Does this work better on a real DOS PC than elsewhere? In a Win XP box, it gives mostly \ chars with flashes of random. In DOSBox, the random flashes are even less frequent.

    • Trixter said

      Because the code segment is used as the random data source, it does not look great if you boot and then immediately run the program. Run any other program first, preferably a large one, to fill the first 128K of RAM with junk.

  3. “read from DS:SI points” -> “read from where ES:DI points”

  4. Bruno said

    I haven’t had this much fun reading a post in a long, long time :)

  5. herm1t chopped another byte from the “AH = 0xE + Int 10h” part by using the ‘undocumented’ int 29h.

    cf http://lj.rossia.org/users/herm1t/66490.html

  6. “10 Print” in 10 bytes!!

    init:scasb
    salc
    and al,’\’-‘/’
    add al,’/’
    int 29h
    jmp init

  7. [...] Trixter provides the full story (so far!) on his blog, Oldskooler Ramblings. [...]

  8. [...] os recordes vão se sucedendo: 25 bytes, 15,13, 12. 11… dez! A essa altura o povo já estava apelando para truques que não são coisa de Jesus, tais como [...]

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 335 other followers

%d bloggers like this: