(This is primarily a diary about Michael Pruemm's xiForth project. The “other” voice in the conversation is David Frech.)

2005 March 09 15:35.

It’s alive! Well, I was able to execute (and define!) the first words. Just “make forth”, then run it with input from test.xi4.

OK, enough for tonight. I hope I’ll find a comment or two in the morning...


2005 March 08 22:10.

The name of my Forth is xiForth. I like the lower-case Greek later xi ξ , and David has a Greek letter prefix, too. So now we have ξForth and μForth.

The bootstrap compiler

To keep the compiler as simple as possible I decided to use only a bare minimum of features. There are no “immediate” words that execute at runtime. Unfortunately that means I can’t have IF/THEN etc. And no comments. Instead, everything is treated as an ordinary Forth word whose address is simply “compiled” into the code.

I probably won’t do inline numbers, either. It should be possible to get by with only a few constants.

The bootstrap compiler is entered by : (colon) as usual. It creates a new word and then compiles everything until the end of the line.

Aha. This answers my question, “how do you end a colon definition if you don’t have an immediate ;?”

The only supported control structures are conditional return (?EXIT) and a jump (REPEAT).

?EXIT. The name comes from EXIT, the traditional name for a return from the middle of a word. I may change that to ?^ which looks nice, too.

I borrowed from Smalltalk the name ^ for EXIT, and have never looked back.

REPEAT. The main use of the jump is for looping back to the beginning of the current word, hence the name. But it is really a jump to the beginning of the word following the REPEAT. Here is an example:

  : SPACES ( n)   ?DUP 0= ?EXIT  BL EMIT  REPEAT SPACES ;

The extra return compiled by ; at the end of the word is not really required, but that’s not the point here.

In ITC Forths I am familiar with, words like REPEAT compile an unconditional jump, which resets the Forth interpreter’s IP. I’m not sure what repeat does (in the C code). Is it pulling out the initial IP of the word following and setting IP to that?

Yes, that’s exactly how it works. REPEAT is an unconditional, absolute jump.

Problems. This example demonstrates also the problems and limitations of ?EXIT. There is no easy way to clean up the stack, unless there are several versions of ?EXIT that drop none, one, or two items from the stack. And versions that keep the top of the stack for the following code are useful, too.

I defined 0branch and =0branch as primitives in muforth. This allows me to define if, =if, while, and =while (the only words that compile conditional jumps). The = versions leave TOP untouched; the others consume it. I’m not sure how useful this is in practice.

I still have to define a conditional jump primitive in C. Right now, it is not (yet) required, though.

In this case, though, it becomes more of an issue, since you don’t want to have ten exit points that leave different numbers of items on the stack...

I also cannot decide on whether it should exit on TRUE, or on FALSE. What do you think? Try to write a minimum version of David’s interpreter loop and a CONSUME using just a conditional exit and REPEAT.

  : INTERPRET   TOKEN ... ?EXIT ( if none)  CONSUME  REPEAT INTERPRET ;

where CONSUME tries to find the word and if found compiles the address, otherwise tries to convert it as a number and creates a literal, or generates an error.

Because muforth lacks else, I often write code that looks something like:

   ... <test a> if do_a ^ then  <test b> if do_b ^ then  ...

If you can factor out the do_a and do_b, you can look at this as branching on zero around the return, which would translate to ?^ returning on true.

And that’s how it is defined at the moment. I am toying with the idea of having ?^ consume a 0 but leave any nonzero value there for later use.

In some pseudo-code I wrote, I had use for an ?^^ that exits from the calling word. In a very limited bootstrap scenario it might be useful. But not for general use.

I added an unconditional “leap” – which I call shunt. It exits from the caller’s caller.

Help wanted. Try to bring up a Forth system with a simple compiler like that, at least far enough to replace the compiler by something better.


2005 March 07 10:46.

My own Forth

Last week, David Frech sent me the muforth README file. Since then, I couldn’t stop thinking about Forth. It’s been so bad, that I started to implement my own minimal Forth system, trying out some ideas. As soon as it works at least a little bit, I’ll let you see my sources.

Here are a few of my design decisions:

Small. Even smaller kernel than muforth. Before I started, I did not have a look at muforth, only its Readme. From what I remembered from visiting David, it was bigger than what I had in mind.

I want muforth to be smaller than it is. I failed to reach my goal of “the smallest possible Forth that is still able to continue to compile itself”. I would love to see what you’ve come up with.

Indirect threaded. David wants native code with good cache behaviour. I think that on a modern CPU an indirect threaded Forth can be very fast. Another advantage is that you don’t have to generate any native machine code once the Forth is up and running – as long as you use something like the old Fig builds ... does> approach.

I wanted native code; now, for simplicity and portability I want threaded code, which really means indirect-threaded code, because otherwise (ie, for direct-threaded) I have to know how to compile little “heading” bits into each word as it is being defined. Indirect-threaded replaces this code with an architecture-neutral pointer to code.

Of course this means that “code” words can only be defined by the C compiler – all extensions have to be via defining words and create/does>. And implementing create/does> in the “ideal” modern way is impossible without architecture-specific code, since the ideal way – having child words’ code fields point to code inside the parent (defining) word – requires that we be able to compile native code into the parent. So create/does> has to be done the “fig” way: with two pointers. One to “do_does” (defined in C) and one to the list of Forth words that make up the body of the does> part (in the parent). But this is not the end of the world. ;-)

But we cannot do without create/does>, because without it, Forth lacks lambda – and without lambda, Forth is ... nothing. No better than BASIC.

I think you meant without create/does>, Forth lacks closures?

AFAIC, lambda and closures are isomorphic. Or, to put it another way, lambdas are implemented using closures. Create/does> isn’t quite lambda, but it’s as close as Forth gets, and it is definitely the source of much of its power and expressiveness.

Very simple bootstrap compiler. No immediate words :-)

I don’t exactly have immediate words either, although I have words that act like immediate words. (I call these “compiler” words.) How do you end a word? How do you do “if” and “repeat” and all that? I tried to start with a minimal list of compiler words, but I was foiled by two things: wanting tail-recursion (so "^", which is my EXIT, has to be a compiler word), and native code, which meant that all the R stack words have to be compiler words. This makes my initial list longer than I wanted: eleven words.

I use the term immediate word for the concept of words that execute at compile time, under the control of the “compiler”. In this sense, you do have immediate words, even though the implementation with a separate word chain is way more elegant than the tagging used by FIG Forth.

(I am using this voice when directly addressing any comments instead of extending my original text. --Michael Pruemm)

Yes, I agree that in this sense my compiler words are immediate words. I cannot take credit for the elegant method I employ; it was one of Chuck Moore’s major innovations in cmForth.

I am starting with a portable C implementation that produces actually quite decent code on a PPC.

I’m interested in hearing about your luck with this. My issues with doing any kind of Forth in C are all about execution contexts. To be reasonably efficient, a threaded Forth has to execute Forth words inside of a big C routine – a sort of “execution engine”. If you have to call out to C, or if C can call into Forth, this gets complicated, and you start having deep stacks, full of nested execution contexts. Using setjmp/longjmp for catch/throw makes things worse: now I have to unwind thru Forth execution contexts as well.

I have no intentions of calling Forth from C. The other way around is a different story. I see C here as a portable assembler. But I don’t see why it is necessary “to execute Forth words inside of a big C routine”. My inner interpreter is a loop that calls the other C functions. On the PPC, this is a branch and link, so the return address ends up in a register. Most of the C functions are leaf functions, so they don’t need to save the return address, nor do they require a stack frame, and the return is simply a jump through the link register. That is not different from a traditional jump next at the end of a traditional primitive.

You get lucky with the PPC. Or the ARM. Or MIPS. On x86 things are not so pretty, but with all the modern fancy microarchitecture features the cost of call/return (with a stack push) probably isn’t that much more than branch-and-link.

The C version is intended only as a portable way to the Forth up and running. Efficiency is not the goal here. But if the C compiler generates “good enough” code, why write an assembler kernel?

muforth generates terrible x86 code, but runs tolerably fast. So I agree: in a “bootstrap” Forth, who cares?

Now, because your code is all in one C file, can you actually get gcc to statically allocate registers to you for use by your Forth code? This never worked for me on x86, which is the other reason why I always think of executing inside a big C routine: if I stay inside that routine, gcc will cooperate and give me total use of a register, for, say, the stack pointer. My “native” code generator has to go thru some gyrations to access the stack – gyrations that are neither pretty nor efficient.

I don’t have any problems (so far) to allocate global registers with gcc (on PPC). Haven’t tried it for multiple files, though. But why would that be more problematical? At work, I have two different compilers for PPC, one from Diab which can allocate global registers via a pragma, and one from Green Hills, that cannot.

If the compiler cannot do it, having one big C function is one way to do it, though.

Why do you have so much trouble to access the stack? Are you trying to have the stack “on” the C stack? Or the return stack? My current C version effectively has three stacks: the C stack, and an explicitly managed parameter and return stack for Forth.

muforth has two stacks: the machine’s (which is also the return stack), and an explicitly-managed data stack.

Because I couldn’t convince gcc to put “sp” (the data stack pointer) in a register, I had to keep it in memory, which meant generating code to load it into a register whenever I wanted to use or modify it. This made the very simple compiler somewhat less simple, and made the code much longer and sillier than it could have otherwise been.

The other issue is making sure that you only write code once. Dictionary search, basic “compiler” and interpreter code, should be shareable between Forth and C. How do you do this? If you need a loop (in the bootstrap compiler) do you write it by stringing together addresses of Forth words in an array, and passing that to an execution engine, or do you write it in C and somehow “export” it to Forth?

I don’t care too much about writing code only once. I see the bootstrap compiler only as a vehicle to get the system up (in the C version) and have something more decent to play with. I intend to completely replace it by something written in Forth, without needing a (separate) meta-compiler.

I still haven’t decided how much to write as a C function and what to have in Forth in an initialized array.

I struggled a lot with this last question. In fact, it’s the reason there isn’t, right now, an ITC version of muforth.

Are you planning to meta-compile on every “boot” (ie, execution) of xiForth? Or would it be a once-only (per version) thing? I had originally planned to use muforth as a bootstrap, using its own meta-compiler, but the code was actually quite fast, so I left it alone and instead started thinking about (and writing!) target assemblers and target compilers.

This is still one of the open questions.