Original article appeared in Fourth Dimensions Volume V, Issue 3

Other articles in this series: Laxen meta-compiling one .. Laxen meta-compiling two

Meta Compiling III

Henry Laxen

Last time we talked about how to implement CODE words in the meta-compiler, and saw how such words must operate in order to make meta : definitions work. We also saw how to define a symbol table for the definitions that are created during meta-compilation by using the existing vocabulary structure. We also looked at how to create headers in the target address space. If any of these concepts are unfamiliar to you, I suggest you reread the previous two articles in this series, which discuss them in detail.

I would now like to talk about a few of the subtle issues that come up during meta-compilation that must be handled by some means or another. Some of the subtle issues are how to handle forward references, and how immediate words such as ." are handled. Other similar issues arise, but we must leave some questions unanswered so that the reader can experience the joy of discovery.

The issue of forward references during meta-compiling has, for some unknown reason, become almost a religious issue. The regular Forth interpreter treats forward references as an error condition, which has its pros and cons. Fortunately, it is almost always possible to write your Forth application in such a way that you can avoid forward references, hence one branch of the religion considers the problem solved, namely, don’t use forward references. Unfortunately, in the meta-compiling process, forward references are unavoidable, and we must develop a technique to handle them. Before I discuss a few solutions, I would like to present my view of the forward reference issue. The use of forward references is not sinful, immoral, illegal, or fattening. It should be discouraged, but not banned. The problem that arises with forward references is that you can get yourself into big trouble. It destroys the bottom-up nature of Forth, and can cause you to retest previously working words because they make use of a forward reference which has changed. It also decreases the usefulness of program listings, if you are never sure of which way to turn the pages when you encounter an unfamiliar word.

Forward references also complicate the compiler, since it now must handle another class of objects (other than previously defined words and numbers). Most threatening, however, is that if forward references are abused you can wind up with totally undecipherable spaghetti code. Just look at almost any Fortran program larger than 100 lines and written by a physicist, and you will see what I mean. The case for forward references is that sometimes you must have them. For example, if you are using recursion, and word A calls word B which calls word A, I am afraid a forward reference is somewhat unavoidable; if recursion is the natural solution to your problem it would be silly not to use it. Also, error conditions are often more easily handled if forward references are allowed. You will often want a fatal error, which could occur at a relatively low level in your program, to call, say, the main menu routine, which obviously occurs at a very high level. That is the case in Forth, where ABORT, which is used at a very low level, calls the Forth interpreter, which is defined at a very high level.

Enough religion, let’s take a look at some techniques for handling forward references during meta-compilation. The simplest method to implement (and the hardest to control) is to simply use a place-holding word, and then patch it later when the resolving forward reference word is defined. Normally this word is called GAP( and it behaves like a comment, skipping the following text until the next ) and simply compiling a zero into the target image. The intervening text is usually the name of the word which will later be patched into this location. The problem with this approach is that you have to index some number of bytes, depending on the location of the gap, into the word that was defined, and patch it with another value. This approach is very inflexible and error prone, since if you ever change the definition in which the gap occurs, you must also change the place where it gets patched in a corresponding manner. There is no intelligence required, just conscientious effort, something humans are not well equipped for.

Another approach is to explicitly declare a forward reference before it takes place, and then resolve it somehow later when its target address is known. This is the Pascal approach, and is a pretty good compromise. At least you no longer have to count bytes into a word and hot patch it later. You can simply name the forward referenced word and define a mechanism that resolves it. This approach also allows you to have multiple forward references by linking them into a chain and resolving the entire chain once the target address is known.

Finally, the last approach I will mention is that of handling forward references on the fly. I do not mean to imply that there are only three ways of doing this; there are many more, but three is enough for now. In order to handle forward references on the fly, we must modify the meta-compiler’s compiler. Instead of issuing an error message when an undefined word turns out not to be a number, we must define the word in question and remember the fact that it is a forward reference. Basically, all this entails is to change the compile loop to decide upon one of three cases instead of only two. Case one is that the word to be compiled already exists, in which case we simply compile it by executing it and letting it compile itself. Case two is that the word is a number in the current base, in which case we compile the code field for literal, followed by the value of the literal. Case three is that the word to be compiled is not already defined and is not a number, hence it must be a forward reference. In this case we must create an entry for it in the symbol table of forward references, compile a gap in the word currently being defined, and set up the run time of the forward reference to either link itself into a chain if it is not already resolved, or to compile itself if it is already resolved. Thus, forward references become basically transparent, except that they must be resolved somehow. This resolution can either be automatic as the word is actually defined, or explicit, requiring you to issue commands that will cause the resolution. Personally, I prefer the explicit method, since I am afraid of things happening behind my back, and it slightly discourages the use of forward references, which deep in my heart I know is right.

Enough about forward references, let’s talk for a moment about immediate words. Immediate words present a special problem since they must be executed at compile time. They may do arbitrarily crazy things, and must do them in the target environment. For example ['] must look up the next word in the input stream and compile its code field as literal. Another example is ." which must scan the input until another " is encountered, and then compile the runtime address for (.") which may not even be known yet, followed by the count-delimited string that was scanned. The usual mechanism used to implement immediate words is through a new defining word called T: which behaves just like Forth’s : except that the definition it creates is placed in the target vocabulary, or symbol table. As you recall from last time, the main compiling loop looks up words in the symbol table and executes them. Words that are defined by CODE and : are placed in this symbol table, and when executed compile themselves. For example ." would have to first compile the run time for .", namely (."), and then get the string and compile it into the target image. This is totally different behavior from, say, the meta version of DUP, which simply compiles a pointer to the code field of DUP when it is executed. Thus, for each immediate word that passes through the meta-compiling process, we must define a special case compiling word that “does the right thing” in the meta context.

Now I must apologize for not providing any code this time around. The problem is that all of the issues I discussed above are implemented in a very system-dependent manner; hence I would have to make a lot of assumptions about exactly how vocabularies work and how different system details operate. Rather than do that and provide code that would not run on any existing systems, I decided not to provide any code, but simply to discuss some of the remaining concepts involved in meta-compiling. The best way to really learn about meta-compilers is to write one. Hopefully, I have provided you with enough ammunition to attempt such an undertaking. Let me tell you that if you do, you will raise your level of Forth consciousness many levels, and I think it is an exercise well worth the effort.

Next time I will talk about multitasking, an issue many have heard about but few have seen. We will implement a very simple (and slow) high-level multi-tasker and discover its principles of operation. Until then, good luck and may the Forth be with you!


Copyright © 1983 by Henry Laxen. All rights reserved.

Other articles in this series: Laxen meta-compiling one .. Laxen meta-compiling two