December 2012
S	M	T	W	T	F	S
	1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Maze Generation In Thirteen Bytes

Posted by Trixter on December 17, 2012

Update 12/7/2012 @ 13:46: Peter Ferrie smashed my record by a single byte, so the record is now held by him at 12 bytes. Congrats, and I feel like a fool for missing it :-) I’ve tacked on his optimization to the end of my original post.

Update 1/7/2013: herm1t further smashed the record, down to 11 bytes!

Update 1/7/2013 @ 18:00: Peter bounces back and reclaims the record with 10 bytes! It kind-of breaks my target platform (uses an undocumented opcode that only works on Intel processors) but hey, a record’s a record! I’ve updated the article below.

In the past, when I’ve had a democoding breakthrough, I kept quiet and either used my discovery in a production, or just bragged to my demoscene friends privately. However, my opportunities to achieve democoding “world firsts” are just about gone, and size coding compos seem to be dead, so I’ve decided to just write a blog post about what I’ve done instead: I’ve written a maze generator in only 13 bytes of x86 machine code.

To understand my 13b achievement, you need to be familiar with 10 PRINT. 10 PRINT is the name given to a single short line of Commodore 64 BASIC code that, when run, appears to generate a never-ending complicated maze. The output isn’t really a true maze — it has no proper beginning and end, and some avenues lead nowhere — but it looks like one because our brains are the most powerful pattern recognition engines on the planet. The appreciation for 10 PRINT comes from the seemingly infinite complexity produced from a very simple single line of code. For you non-technical people, consider 10 PRINT the programming equivalent of a haiku.

10 PRINT is also the name of a 324-page book dedicated to the code, its output, and 300 other pages that explore code, art, perception, culture, randomness, mazes, and almost anything else the authors thought of when pondering the code. (The book is legally free for download, although you can purchase a copy to benefit the non-profit organizations PLAYPOWER and The MIT Press, so if the first hundred pages interest you, buy the book to help them out.) It is hundreds of pages more than I would ever write about fascinating minutiae and borders on stonergazing, but the authors have done such a thorough job and made the subject material accessible to so many people that it’s really hard for me to criticize them. I recommend you give it a read, as there’s something in there for everyone and the presentation of the material is top-notch.

42 bytes

Anyway, back to my code. I was reading the 10 PRINT book and got to page 242 where it explains that the BASIC code can be represented even smaller as C64 machine code, and challenges the reader to improve on it. Sure enough, checking 4-Mat’s and Wisdom’s attempts, the smallest possible version clocks in at 15 bytes, which is pretty amazing. For kicks, I wondered what an x86 equivalent would look like, so I wrote up an x86 version of 10 PRINT. Here’s the easy-to-follow, friendly attempt:

; Simple program to duplicate "10 PRINT" output, also known as DataGlyphs.
; This version "plays nice" by grabbing from a "true" random data source
; and not making any assumptions about register contents.
; We also change to a low-resolution mode to have a friendlier aspect ratio
; and exit cleanly once the "maze" is done printing.

init:   mov     ax,0001h
        int     10h             ;force 40x25 color
        xor     bh,bh           ;we just set video, so set vidpage=0
        mov     cx,(40*23)      ;number of chars to write

getrnd: xor     ax,ax           ;to get al=0
        pushf
        cli                     ;disable interrupts, timer read must be atomic
        out     43h,al          ;latch count register
        in      al,40h          ;read lobyte of count
        mov     ah,al
        in      al,40h          ;read hibyte of count
        popf                    ;enable interrupts
        xchg    ah,al           ;fixup; now contains 8253 raw 16-bit word

pickch: shr     ax,1            ;BIOSes init in count-by-2, 1st bit always lo
        shr     ax,1            ;carry = "random" bit
        mov     al,'\'
        jc      writec          ;if bit set, write immediately
        mov     al,'/'          ;otherwise, choose other character

writec: mov     ah,0eh          ;write char in teletype mode
        int     10h
        loop    getrnd

        int     20h             ;exit back to DOS

This code is well-behaved; it doesn’t make assumptions about the state of the processor, and it exits cleanly (even though the original 10 PRINT code loops forever). It also manages to come up with a halfway decent random number, taken from the hardware timer which counts from 65535 down to 0 every second. Compiled with a86, it results in a 42-byte .COM file. The size suffers compared to the C64 version in two major places: The random number generation, and picking the character. In picking the character, the C64 has an advantage in that both slashes are right next to each other in PETSCII values, so if you have a random bit, you just add it to the first slash and you’ll get either no change, or the second slash. No such luck on the PC; the ASCII values for both slashes are far apart, so I use a conditional branch that picks one or the other based on the random bit.

Because I’m used to optimizing for speed, I decided to eliminate the branch where the slash character is chosen to see what it looked like (branches are a speed penalty on most CPUs). Despite being far away in value, the bit patterns for both slashes were only different by a single bit once you shifted one of them:

"/"=2f and "\"=5c
2f=00101111
5c=01011100

This led to replacing the pick-a-character chunk “pickch” with this:

pickch: shr     ax,1            ;BIOSes init in count-by-2, 1st bit always low
        push    cx              ;Save our loop counter for later
        mov     cl,al           ;copy random value to cl
        and     cl,1            ;cl is now either 0 or 1
        mov     al,'\'          ;seed character as '\'
        shr     al,cl
        or      al,cl           ;if cl=1, '\' is now '/', otherwise no change

While pleased with myself, this increased the size to 48 bytes, so I discarded the idea.

Another optimization thought was to change the character print routine (“writec”) to something that just stuck bytes directly into video memory. That shaved two bytes off the print routine, but it took 4 bytes to set up (5 bytes if I wanted to stay compatible with the 8086), so it was a net loss that also got discarded.

25 bytes

I then turned my attention to the biggest chunk in the code, the random number generator. Running the program a few times, it was clear that the quality of randomness didn’t need to be very high; just enough to avoid obvious runs or patterns. This meant I could get my random data from almost anywhere, so I replaced the entire “getrnd” block with a single instruction: LODSB. This retrieves a byte from memory and then advances position, so if you call it repeatedly you get a sequence of bytes from memory, one after the other. The sequence I got back was from wherever DS:SI points to on program startup; for a DOS .COM file, it points to the current program code. So my “random” stream of bytes was actually the compiled code itself (ha!) plus whatever remnants of prior programs were still left in memory. This led to a massive reduction down to 25 bytes.

15 bytes

When I saw 25 bytes, I got excited because it was now in the realm of possibility to catch up to the C64 version. There was still a lot of fluff in the program, so I took advantage of more “safe” startup assumptions and jettisoned nice-but-unnecessary things like changing the video mode, a clean exit (it loops forever, just like the original), and initializing some registers. In the end, it looked like this:

init:   mov     ah,0eh          ;prime 10h for "write char in teletype mode"

getrnd: lodsb                   ;read a byte from wherever DS:SI points at

pickch: shr     al,1            ;carry = "random" bit
        mov     al,'\'
        jc      writec          ;if bit set, write immediately
        mov     al,'/'          ;otherwise, choose other character

writec: int     10h             ;write the character
        jmp     getrnd          ;loop endlessly

This compiled to 15 bytes. Success!

13 bytes

I’d matched the C64 version, but looking at the “pickch” block I kept feeling like I could improve it. That block starts by setting the carry flag to a random value, but the 8086 has a parity flag that is automatically set by some operations, and you can jump based on its state. Unfortunately, LODSB does not set any flags. Any math operation will set the parity flag, but most math operations would take up additional space and rob me of space savings. If only there was a single-byte instruction that could do what LODSB does and set the parity flag based on the input…

Turns out, there is! SCAS is an instruction that loads a byte and then compares it with AL, and then sets flags based on the comparison. Like LODSB, it advances through memory, and it was meant to be used in a REPeat loop — but there’s nothing preventing you from using it outside of a loop. So I did, and here is the final result:

; Simple program to duplicate "10 PRINT" output, sometimes called DataGlyphs.
; Written by Trixter / Hornet, trixter@oldskool.org, 20121214
; For more information about "10 PRINT", consult http://10print.org

init:   mov     ah,0eh          ;prime 10h for "write char in teletype mode"

getrnd: scasb                   ;read from ES:DI points and compare to AL
                                ;this sets flags similar to a subtraction

pickch: mov     al,'\'          ;choose a character
        jp      writec          ;if parity bit set, write immediately
        mov     al,'/'          ;otherwise, choose other character

writec: int     10h             ;write the character
        jmp     getrnd          ;loop endlessly

And there you have it: 13 bytes. As an added bonus, it works on an old IBM PC too, since it doesn’t use any 80186+ instructions. Sample output below (if system is booted in 40-column mode, just like a C64): (click on the picture directly if your browser isn’t displaying it properly)

~~I believe this is as small as it can get on x86, courtesy of Intel CISC. I also think this is a 10 PRINT size world record for any platform, but if I’m wrong, please let me know.~~

12 bytes

As the update above shows, Peter Ferrie was able to improve my design by a single byte. His code is as follows, with my annotations:

init:   mov     ax,0e5ch        ;Load AH with "write char" cmd and AL with '\'
        scasb                   ;read from where ES:DI points and compare to AL
                                ;this sets flags similar to a subtraction
pickch: jp      writec          ;If parity set, jump to writing char in AL
        mov     al,’/’          ;otherwise, choose other slash char
writec: int     10h             ;write char in AL
        jmp     init            ;loop endlessly

Bravo, sir. But even further than that, using DOS’s “write character” int 29h, herm1t got it down to:

11 bytes

init:   mov     al, '\'
        scasb
        jp      writec
        mov     al, '/'
writec:	int     29h
        jmp     init

Clever. At the time I wrote the original 13-byte version, I’d been working on self-booting code with no OS, so I wasn’t considering DOS at any point. My loss is herm1t’s gain!

But wait, that’s not all:

10 bytes

init:   scasb
        salc
        and al,'\'-'/'
        add al,'/'
        int 29h
        jmp init

Peter bounces back with another byte shaved! (if you run Intel CPUs — trying to run this on a non-intel CPU such as an NEC V20 or V30 will hang the machine).

This entry was posted on December 17, 2012 at 10:42 am and is filed under Demoscene, Programming, Vintage Computing. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

30 Responses to “Maze Generation In Thirteen Bytes”

Peter Ferrie said

December 17, 2012 at 12:44 pm
12.
init: mov ax,0e5ch
scasb
pickch: jp writec
mov al,’/’
writec: int 10h
jmp init

Reply
- Trixter said
  
  December 17, 2012 at 12:50 pm
  Of course!!! (smacks head repeatedly)
  
  You have the new record! Congratulations sir!
  
  Reply
  - phoenix said
    
    December 17, 2012 at 1:11 pm
    If there’s anything to be learned about size coding, never ever say “I believe this is as small as it can get”. :)
    
    Reply
phoenix said

December 17, 2012 at 1:28 pm
Does this work better on a real DOS PC than elsewhere? In a Win XP box, it gives mostly \ chars with flashes of random. In DOSBox, the random flashes are even less frequent.

Reply
- Trixter said
  
  December 17, 2012 at 1:40 pm
  Because the code segment is used as the random data source, it does not look great if you boot and then immediately run the program. Run any other program first, preferably a large one, to fill the first 128K of RAM with junk.
  
  Reply
Peter Ferrie said

December 17, 2012 at 5:33 pm
“read from DS:SI points” -> “read from where ES:DI points”

Reply
- Trixter said
  
  December 17, 2012 at 6:25 pm
  I added the “where” but I left DS:SI alone — SCASB doesn’t read from ES:DI.
  
  Reply
  - Peter Ferrie said
    
    December 17, 2012 at 10:59 pm
    Intel docs: “SCAS m8: Compare AL with byte at ES:(E)DI or RDI, then set status flags”.
    
    Reply
    - Trixter said
      
      December 18, 2012 at 9:08 am
      I’m Doing It Wrong! Shows you how much I use SCAS in my code. Fixed.
      
      Reply
      - holograph said
        
        December 21, 2012 at 9:00 am
        Can’t blame you, semantically speaking you’d definitely expect SCAS to use DS:SI… surprising!
        
        Reply
Bruno said

January 3, 2013 at 6:02 pm
I haven’t had this much fun reading a post in a long, long time :)

Reply
ange albertini said

January 7, 2013 at 5:30 am
herm1t chopped another byte from the “AH = 0xE + Int 10h” part by using the ‘undocumented’ int 29h.

cf http://lj.rossia.org/users/herm1t/66490.html

Reply
Peter Ferrie said

January 7, 2013 at 4:05 pm
“10 Print” in 10 bytes!!

init:scasb
salc
and al,’\’-‘/’
add al,’/’
int 29h
jmp init

Reply
- Trixter said
  
  January 7, 2013 at 6:36 pm
  AMAZIN– hey, wait a minute, that uses an undocumented opcode that doesn’t work on all processors.
  
  So I guess all three of us have our own record — mine for working on any machine, herm1t for working on any machine running DOS, and you for working on any Intel CPU. We all winnerz! \o/
  
  Reply
  - Trixter said
    
    January 7, 2013 at 6:48 pm
    The more I look at what you did, the more I am a fan :-) I updated the article with your code.
    
    Reply
- Andrey Baranovich said
  
  January 8, 2013 at 3:52 am
  Cool! :-)
  
  Reply
  - Peter Ferrie said
    
    January 8, 2013 at 11:38 am
    Thanks.
    It should work on all Intel, AMD, and Cyrix processors.
    Intel introduced the instruction, AMD documented the instruction, and Cyrix copied the instruction.
    
    Reply
    - Trixter said
      
      January 8, 2013 at 12:07 pm
      I guarantee you it does not work on NEC V20 and V30 CPUs. D6 on those CPUs is a duplicate of XLATB which is D7. More background and tests: http://www.vintage-computer.com/vcforum/showthread.php?28949-The-D6-instruction-on-the-NEC-V-series-(V20-V30)-chips
      
      Good to know it works on everything else. BTW I don’t consider this a “bug” in the NEC chips; it wasn’t documented when they were being designed.
      
      Reply
Post Position » 10 PRINT Marches on said

January 7, 2013 at 10:04 pm
[…] Trixter provides the full story (so far!) on his blog, Oldskooler Ramblings. […]

Reply
Gloriosas inutilidades: gerando um labirinto em 10 (sim, DEZ) bytes! | Retrocomputaria Plus said

March 4, 2013 at 6:26 am
[…] os recordes vão se sucedendo: 25 bytes, 15,13, 12. 11… dez! A essa altura o povo já estava apelando para truques que não são coisa de Jesus, tais como […]

Reply
huseyink said

January 30, 2015 at 12:00 am
There is a 13 bytes C64 version, that was released on 4th of December, 2012:

http://csdb.dk/release/?id=113300

In the comments section, there is also a 12 bytes version, which was created cooperatively after exchanging some ideas with other C64 fellows.

As these versions were released before this article was written, only 11 and 10 bytes versions have actually broken the record. :-)

As for the PC versions, they are pretty impressive, congratulations guys!

Reply
kenorb said

February 19, 2015 at 4:10 am
Bash implementation: http://codegolf.stackexchange.com/a/26011/11164

Reply
sedatk said

March 25, 2015 at 5:53 am
Here is my CPC version (11 bytes): http://www.pouet.net/prod.php?which=65273
Source code: https://github.com/ssg/mazecpc

Reply
Hell Mood said

March 29, 2015 at 9:14 am
I just created an 8 byte version =)
http://www.pouet.net/prod.php?which=65290

Reply
- Trixter said
  
  April 2, 2015 at 4:06 am
  Lol, nice use of adjoining characters :)
  
  Reply
  - Hell Mood said
    
    April 2, 2015 at 11:56 am
    Yep, as i posted internally for our group : “Didn’t really take much asm magic to get to 8 byte, was more like ‘finding’ something which all of them must have overlooked ;) ” I am kind of tempted to pull of some “real magic”, like i did here http://www.pouet.net/prod.php?which=63126 , but currently i’m not in the right mood … some of the intrinsic energy seems to be lost if a 2^n border is reached ;)
    
    Reply
    - Trixter said
      
      April 7, 2015 at 11:16 am
      To be fair, while it does follow the “spirit” of the challenge, it does not follow it to the “letter”. Meaning, / and \ are supposed to be the only characters used. But, I still applaud it (8 bytes!!) as it was clever to find chars 90 degrees apart visually while only 1 char apart in ASCII. Nice job!
      
      Reply
www.theknot.com said

October 11, 2015 at 12:57 pm
http://www.theknot.com

Maze Generation In Thirteen Bytes « Oldskooler Ramblings

Reply
Kaproncai Tamas said

May 24, 2017 at 9:44 am
Here it is… another 10 bytes version
by TomCat/Abaddon:

start:
LODSB
CBW
MOV AL,’/’
AAD 256-‘\’+’/’
INT 29H
LOOP start

Reply
Gloriosas inutilidades: gerando um labirinto em 10 (sim, DEZ) bytes! | Retrópolis said

September 1, 2023 at 11:58 am
[…] os recordes vão se sucedendo: 25 bytes, 15,13, 12. 11… dez! A essa altura o povo já estava apelando para truques que não são coisa de Jesus, tais como […]

Reply

	The Incredible Demo… on 8088 MPH: We Break All Your…
	Trixter on 8088 MPH: We Break All Your…
	wh0phd on 8088 MPH: We Break All Your…
	John Olson on Cyberpunx
	malvineous on VCF Midwest 2024 Presentation…

Oldskooler Ramblings

the unlikely child born of the home computer wars

Recent Posts

Recent Comments

Pages

Meta

Top Posts

Archives

Blog Stats

Maze Generation In Thirteen Bytes

42 bytes

25 bytes

15 bytes

13 bytes

12 bytes

11 bytes

10 bytes

Share this:

Related

30 Responses to “Maze Generation In Thirteen Bytes”

Peter Ferrie said

Trixter said

phoenix said

phoenix said

Trixter said

Peter Ferrie said

Trixter said

Peter Ferrie said

Trixter said

holograph said

Bruno said

ange albertini said

Peter Ferrie said

Trixter said

Trixter said

Andrey Baranovich said

Peter Ferrie said

Trixter said

Post Position » 10 PRINT Marches on said

Gloriosas inutilidades: gerando um labirinto em 10 (sim, DEZ) bytes! | Retrocomputaria Plus said

huseyink said

kenorb said

sedatk said

Hell Mood said

Trixter said

Hell Mood said

Trixter said

www.theknot.com said

Kaproncai Tamas said

Gloriosas inutilidades: gerando um labirinto em 10 (sim, DEZ) bytes! | Retrópolis said

Leave a reply to Post Position » 10 PRINT Marches on Cancel reply