The words create and does> (and its cousin ;code) live at the heart of Forth, and are the source of much of its power. They are one of its engines of extensibility, allowing the programmer to invent new datatypes. Using (and, even more so, implementing) create and does> is part of mastering Forth.

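I discuss the following topics: using create and does>, the history and features of their implementation, and meta-compiling with create and does>.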

Using create and does>

Every word (object) in a Forth system has an attached behavior. When you “call” a word, it executes. If it’s a constant, it pushes its value onto the data stack. A variable pushes the address of its value. A code word executes its machine code. A colon word (defined by :) executes the Forth words making up its body. And so on.

A Forth system comes with a small handful of these types already defined. But what if you want to add your own?

First you have to define a word that knows how to define words of your new type. I like to call words that define other words defining words (or, more briefly, definers). Our task is to understand how to define a new definer.

There are two things we need to specify: how a word of the new type is built (its data), and what that word does when it executes.

Maybe that’s why, in fig-FORTH, these words were called <builds and does>. ;-)

In modern systems create makes a new Forth dictionary entry, and does> modifies the new word so that, when executed, it executes the code following the does>. So we come to the first subtlety: the body of a definer is not executed all at once. The part between create and does> is executed when the definer executes; the part after does> executes when the newly-defined word executes.
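The two execution times can be made visible with a pair of counters. This is only a sketch, written for a traditional ANS-style Forth (gforth, say) rather than muforth; the names builds, runs, traced and foo are made up for illustration.

```forth
variable builds   0 builds !     \ times the create part has run
variable runs     0 runs !       \ times the does> part has run

: traced  ( "name" -- )
   create  1 builds +!           \ executes when traced itself executes
   does>   drop  1 runs +! ;     \ executes when the new word executes

traced foo     \ builds is now 1; runs is still 0
foo foo        \ builds is still 1; runs is now 2
```

The drop after does> discards the address of foo’s (empty) body, which a traditional Forth leaves on the stack.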

Generally, after create we put words to build the body of the new definition (its data), and after does>, words that modify or retrieve all or part of its data. (In traditional Forths, when the code after does> executes, the “parameter field address” (address of the word’s data) is on the top of the data stack. In muforth an arbitrary constant gets pushed, which could be a PFA.)
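Perhaps the smallest illustration of this build-then-retrieve pattern is a home-made constant. The sketch below assumes a traditional Forth, where the body address is on the stack after does>; my-constant and answer are hypothetical names (the standard word is constant).

```forth
: my-constant  ( n "name" -- )
   create  ,               \ define time: store n in the new word's body
   does>   ( -- n )  @ ;   \ run time: the body address is on the stack; fetch n

42 my-constant answer
answer .                   \ prints 42
```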

A concrete example might help. Let’s write a definer – array – for self-indexing arrays of cells. Here is the code:

  : array  ( n)       create  cells allot
           ( i - a)   does>  swap cells + ;

cells converts a count of cells to a count of bytes; allot allocates the bytes to this word. When the code after does> executes, the address of the first allotted byte is on the stack; we move it out of the way, change the cell index to a byte index, add it to the start of the array, and return this pointer.

Now, let’s define an array of pointers to lines of text.

  25 array lines

and to get the address of one of the pointers (so that we can use @ or ! to read or write it),

  16 lines

will leave the address of the 17th line. (Remember: 0 lines gives us the first one!)
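Putting the pieces together, here is a short session that writes a slot and reads it back. It repeats the definition of array so that it stands alone, and assumes a traditional Forth; the value 1234 is just a stand-in for a real line address.

```forth
: array  ( n)       create  cells allot
         ( i - a)   does>  swap cells + ;

25 array lines

1234 16 lines !                 \ write slot 16
16 lines @ .                    \ read it back: prints 1234
16 lines 0 lines - 1 cells / .  \ distance from slot 0, in cells: prints 16
```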

Note for Niklaus Wirth fans: there is no bounds checking! Of course, this could easily be added.
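One possible way to add the check, again sketched for a traditional Forth: store the element count in the first cell of the body and compare against it at run time. The names checked-array and safe-lines are hypothetical, and this is not how any particular system does it.

```forth
: checked-array  ( n "name" -- )
   create  dup ,  cells allot     \ body: element count, then n cells
   does>   ( i -- a )
      2dup @ u< 0=                \ unsigned compare also rejects negative i
      abort" array index out of range"
      cell+  swap cells + ;       \ skip the count, then index as before

25 checked-array safe-lines
24 safe-lines drop                \ fine: the last valid slot
\ 25 safe-lines                   \ would abort: index out of range
```

The price is one extra cell per array and a compare on every access, which is presumably why the plain version leaves it out.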

What about ;code? When is it used?

;code is a performance hack. For words that need to execute blindingly fast, their runtime part can be specified in machine code. By using ;code in place of does>, and by writing assembler code after ;code instead of Forth code, it’s possible to write a definer that builds words with “pure code” runtimes. That’s the only difference between does> and ;code.

History & features of their implementation

First, let’s think about what create and does> actually do when they execute. create is fairly simple: it makes a new entry in the Forth dictionary, consuming a token from the input stream for the name, and sets the code field (of the new word) to point to code that embodies a “default behavior” – perhaps pushing the address of the parameter field.
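In an ANS-style Forth that default behavior is easy to observe: a word made with create alone pushes the address of its body, the same address that >body computes from its execution token. A sketch, with scratch as a made-up name; muforth may behave differently here.

```forth
create scratch  4 cells allot   \ a named 4-cell buffer, no does> part

scratch                  \ default behavior: push the address of the body
' scratch >body  =       \ >body maps the word's xt to that same address
.                        \ prints -1 (true)
```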

does> is the interesting creature. It has to modify the newly-created word, whose body (data structure) we have now built, so that when it executes it executes the Forth code in the body of its definer! Let’s use our example above again. When lines executes, it executes the code after does> in array. How does it do this?

Forth is traditionally implemented using a technique called indirect threaded code. Every word in the dictionary has a “code pointer” that points to machine code that should be executed when that word executes. For a code word, this machine code is its own body. Forth colon words are made up of a list of pointers to other words’ code fields. In order to execute a colon word, we run something called the inner interpreter, which takes each word pointer, in turn, and jumps to its code pointer. This means we need some kind of pointer to keep track of where we are in the list, and if we call another colon word, we have to save our pointer (on the “return stack”), set up the pointer for the called word, and go through its list. Hmm... This is sounding like an interpreter for a virtual machine! Well, it is.

And this explains the machine code every colon word points to, often called “docolon”, “do_colon”, or “nest”: it pushes the IP (interpretation pointer, our pointer into the list of pointers) onto the return stack, sets IP to point to its own list, and then executes the next word... which happens to be the first word in its own body. The list of words ends with the word exit – also called unnest or ;s – which pops the old IP off the return stack, and executes the “next” word... which will now be the word, in the original caller, after the word that invoked the colon word. It’s simply call and return.

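Back to lines. What can we put into its code field so that, when executed, it will execute code in array? The simplest solution – though not the nicest – is to put a pointer to “dodoes” (aka do_does) into the code field, followed by a pointer to the list of words following does> in array. The do_does machine code then pushes the address of the word’s data (the cells after that pointer) onto the data stack, pushes the IP onto the return stack just as do_colon does, sets IP from the pointer that follows the code field, and executes the next word.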

polyFORTH (at least initially) and fig-FORTH implemented this scheme. It was later realized that a clever hack was possible: by prefixing the list of words (following does> in array) with a bit of machine code – namely, a call to do_does – it was possible to do away with the IP pointer in lines. Now its machine code pointer does double duty: it points to machine code (the call in array), but it also points to Forth code (because immediately after the call to do_does is a list of words to execute). It’s quite clever, elegant, and confusing.

And impossible to implement in a pure threaded Forth, since the code that compiles does> needs to “know” how to compile a procedure call. So in the new threaded muforth (which is written in C, and hopefully portable) does> is implemented in the old, suboptimal, fig-FORTH way. And it works fine. ;-)

Meta-compiling with create and does>

Maybe later. ;-)