Around Christmas I found myself wanting to do some good old Assembly programming, but didn’t really want to work on debugging Bluejay. Instead, I decided to write a little Forth system in 8086 assembly. This turned out to be a great deal more fun than I had anticipated, and it really made me appreciate how well designed the 8086 was compared to its later 32- and 64-bit extensions.
I won’t go too deep into technical details here, but there are some neat features of my Forth implementation I wanted to talk about.
The Benefits of Flat Memory
MS-DOS famously gave programs a flat, unprotected view of memory. A COM program would be loaded at offset 0x100 in its segment and run from there, without any fancy memory protections. This obviously has its limitations, but it had some unexpected benefits for my Forth. For those unfamiliar with how Forth works internally, the main data structure is the “dictionary”, basically a linked list of “words” (functions or variables) defined in memory. The dictionary grows up like a stack (the most recent entry is at the highest address), and you can’t free dictionary entries that aren’t at the very top (so there are no holes in the stack).
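To make that structure concrete, here is a minimal Python sketch of a dictionary living in one flat block of memory. The entry layout (a 2-byte link, a length byte, the name, then the body) and the names `Dictionary`, `create`, `find`, and `forget_latest` are assumptions chosen for illustration; they are not taken from the actual 8086 sources.

```python
import struct

ORIGIN = 0x100  # a COM program is loaded at offset 0x100


class Dictionary:
    """Toy dictionary: a linked list of entries packed into flat memory."""

    def __init__(self, size: int = 0x10000) -> None:
        self.memory = bytearray(size)  # one unprotected block of memory
        self.here = ORIGIN             # next free byte; grows upward
        self.latest = 0                # address of newest entry, 0 if empty

    def create(self, name: str, body: bytes = b"") -> int:
        """Append an entry laid out as [link:2][name length:1][name][body]."""
        encoded = name.encode("ascii")
        record = struct.pack("<HB", self.latest, len(encoded)) + encoded + body
        entry = self.here
        self.memory[entry:entry + len(record)] = record
        self.here = entry + len(record)
        self.latest = entry
        return entry

    def find(self, name: str) -> int:
        """Walk the links from newest to oldest; return the address or 0."""
        wanted = name.encode("ascii")
        entry = self.latest
        while entry:
            link, length = struct.unpack_from("<HB", self.memory, entry)
            if self.memory[entry + 3:entry + 3 + length] == wanted:
                return entry
            entry = link
        return 0

    def forget_latest(self) -> None:
        """Free the top entry only, so the dictionary never has holes."""
        if self.latest:
            self.here = self.latest
            (self.latest,) = struct.unpack_from("<H", self.memory, self.latest)
```

Because every entry links back to the one below it, lookup starts at the newest word, and freeing the top entry is just a matter of moving two pointers back down.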
When words are read from the input (the keyboard or a file), one of two things happens, depending on the STATE of the interpreter:
- In interpret mode, the word is looked up in the dictionary and immediately executed.
- In compile mode, the word’s “code field” is looked up in the dictionary (think of a function pointer in C), and that code field address is written to the memory location specified by >HERE (which is incremented after). Basically, instead of evaluating a word, it compiles a call to that word into the word currently being defined on the dictionary.
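The two modes can be sketched with a toy interpreter in Python. This is only a model under stated assumptions: `ToyForth` and `make_colon_word` are hypothetical names, a Python list stands in for the cells written at the dictionary pointer, and the handling of `:`, `;`, and number literals follows common Forth convention rather than anything confirmed about this particular implementation.

```python
from typing import Callable, Dict, List

Code = Callable[[], None]  # stands in for a word's code field


def make_colon_word(body: List[Code]) -> Code:
    """Build a word that runs each compiled call in order."""
    def run_body() -> None:
        for code in body:
            code()
    return run_body


class ToyForth:
    def __init__(self) -> None:
        self.stack: List[int] = []
        self.compiling = False      # plays the role of STATE
        self.body: List[Code] = []  # cells compiled for the current word
        self.pending_name = ""
        self.words: Dict[str, Code] = {
            "+": lambda: self.stack.append(self.stack.pop() + self.stack.pop()),
            "*": lambda: self.stack.append(self.stack.pop() * self.stack.pop()),
            "DUP": lambda: self.stack.append(self.stack[-1]),
        }

    def run(self, source: str) -> None:
        tokens = iter(source.split())
        for token in tokens:
            if token == ":":                 # start a definition
                self.pending_name = next(tokens)
                self.body = []
                self.compiling = True
            elif token == ";":               # finish it, back to interpreting
                self.words[self.pending_name] = make_colon_word(self.body)
                self.compiling = False
            elif token in self.words:
                code = self.words[token]
                if self.compiling:
                    self.body.append(code)   # compile a call to the word
                else:
                    code()                   # execute it immediately
            else:
                value = int(token)           # anything else must be a number
                if self.compiling:
                    self.body.append(lambda v=value: self.stack.append(v))
                else:
                    self.stack.append(value)
```

For example, `ToyForth().run(": SQUARE DUP * ; 5 SQUARE")` compiles two calls into `SQUARE` while in compile mode, then executes `5` and `SQUARE` in interpret mode.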
So every word you define ends up at the top of this contiguous block of memory called the dictionary.
I decided to start my dictionary right where the Assembly code for my system ended. An unexpected benefit of this was that if I just dumped all the memory from 0x100 up to the top of the dictionary to a file, it would be a valid COM executable with the entire current state of the Forth environment saved!
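The idea is simple enough to model in a few lines of Python. This is a sketch, not the real implementation: `dump_image`, `load_image`, and `save_com` are hypothetical names, and `load_image` only mimics the one thing the DOS loader does that matters here, which is placing the file's bytes at offset 0x100. It also assumes the system's own bookkeeping (such as the dictionary pointer) lives inside the dumped range, so that a reloaded image knows where its dictionary ends.

```python
ORIGIN = 0x100  # offset at which a COM image is loaded


def dump_image(memory: bytes, here: int) -> bytes:
    """Return everything from ORIGIN up to the top of the dictionary."""
    if not ORIGIN <= here <= len(memory):
        raise ValueError("dictionary pointer is outside memory")
    return bytes(memory[ORIGIN:here])


def load_image(image: bytes, size: int = 0x10000) -> bytearray:
    """Mimic the loader: copy the file contents to offset ORIGIN."""
    if len(image) > size - ORIGIN:
        raise ValueError("image does not fit in memory")
    memory = bytearray(size)
    memory[ORIGIN:ORIGIN + len(image)] = image
    return memory


def save_com(path: str, memory: bytes, here: int) -> int:
    """Write the image to a file and return the number of bytes written."""
    image = dump_image(memory, here)
    with open(path, "wb") as f:
        f.write(image)
    return len(image)
```

Since the machine code and the dictionary sit back to back, the dump needs no relocation or header: loading the file puts every byte back at the address it was saved from.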
This meant that without any added effort on my part, I was able to save my work in the Forth environment to a real executable and pick it up later. And with just some minor modifications I was able to set which word to run on start-up, allowing me to basically compile Forth code to a stand-alone MS-DOS COM program!
This is actually used to build the real FORTH.COM binary. First, BASE.COM is assembled from all the 8086 sources. Then it’s run with CORE.F as input, producing a new executable which includes the (incomplete) standard library, written in Forth.
I don’t know about you, but this is pretty dang cool to me. Especially considering the actual “compiler” here is only 50 lines of code.
8086 is Really Good™
Writing all this Assembly (actually not that much Assembly, just over 1000 lines clocking in at 2060 bytes total) has definitely been good for me. I finally learned how the x86 string instructions work (and boy are they fun!), and got a chance to do all sorts of register finagling.
One of the most fun parts of working on this, though, was the constrained environment that MS-DOS on the 8086 offers. Writing 32- or 64-bit Assembly on a modern Linux machine feels like a chore, to be honest. Maybe it’s just paralysis of choice, or maybe I just don’t know enough Assembly, but having a super-constrained instruction set like the 8086’s and having to carefully choose which registers to keep data in just made everything feel simpler. That sounds a bit counterintuitive, I know, but it’s the truth.
Writing MS-DOS 8086 code felt almost like writing code for a “fantasy computer” like the Pico-8. A lot of that is probably that I’m too young to have used DOS, but if you think about it, a lot of the same characteristics are there: a really simple instruction set, single tasking, and easy IO. Sometimes stepping back to a simpler system, writing code with enough care that you understand exactly why each byte is there, can be a lot of fun.
I’m definitely going to come back to writing 8086 code. If not for DOS, then maybe for another system like CP/M or Xenix. And who knows, maybe my next program will use this new Forth interpreter.