This page and this code was written by Randy Thelen. I’m keeping it here for historical interest.


Here is where I will put the source code to the SX52 Forth project I’m working on.

Here we go:

 	TITLE	"XGS Forth"
 	;
 	; Implements a simple Forth for the Extreme Game Station (XGS).
 	; The basic design is to turn the SX52 into a 16bit stack-based
 	; processor.  Then, on top of that, implement Forth.  The
 	; processor that we turn the SX52 into has the following
 	; architecture:
 	;	Two stacks, Data & Return
 	;	Data:	24 Element Data Stack onto which LITeral
 	;		values can be pushed and from which the
 	;		the operands for various operands are
 	;		popped.
 	;	Return:	16 Element Return Stack onto which return
 	;		addresses are pushed when functions are
 	;		called.  A mechanism exists for pushing
 	;		the top Data Stack element onto the Return
 	;		Stack.  And other neat tricks.
 	;	A registered Top Of Stack.  This 16 bit register
 	;		holds the top element of the Data Stack for
 	;		quick access.
 	;	Instruction Buffer.  Because SRAM requires so many
 	;		clock cycles per reference (assuming one must
 	;		reload the complete 17 bit address each access),
 	;		the processor loads 32 bytes from memory and
 	;		stores them into the buffer.  Then, instructions
 	;		are fetched from the instruction buffer as long
 	;		as possible.
 	;	Program Counter.  A 16 bit program counter keeps track
 	;		of where the next instruction will be fetched.
 	;		NOTE: The Program Counter is a byte pointer.
 	;	Status Register.  When I built Mippy, I didn't have one
 	;		of these and it made some operations terribly
 	;		in-efficient.
 	;		http://madscientistroom.org/mippy/
 	;
 	; There are two stacks, named the Data and Return stacks.  These
 	; stack are 16 bits (2 bytes) and stored in the processor's RAM,
 	; not in the SRAM.  That means the stacks have limited size, but
 	; they are very fast.  One could imagine later using a "window"
 	; into SRAM where some amount of the stack would be in the processor's
 	; RAM, but this would be ugly.  So, I do not do that.  Instead, the
 	; stacks are limited and size.
 	;
 	; There is a Top Of data Stack register, also 16 bits, which contains
 	; the current top element of the stack.  There is no analog for the
 	; return stack (yet).  As a result of using a Top Of Stack register,
 	; that means that the stack is never truley empty.  When the stack
 	; memory is empty, the register still contains something.  And, when
 	; the thing in the Top Of Stack register is consumed, some processing
 	; will prevent loading random memory (the word 'above' the stack's
 	; memory but not stack memory) into the stack register.  The reason
 	; we don't want to do this is now this un associated register is
 	; 'copied' into the Top Of Stack register.  When a value is pushed
 	; onto the stack, then the value that had been copied into the Top
 	; Of Stack register will be written back.  This write back operation
 	; could contaminate the memory by writing back an old value.
 	; 
 	; There are 64 bytes for 32 user variables.  These will be -much- faster
 	; than variables stored in main memory.  The words for reading/writing
 	; these variables are u@ (user-fetch) and u! (user-store).  The usage
 	; for these commands is:
 	;
 	;	<value>  <variable-name>  u!  ( Set variable-name to value)
 	;	         <variable-name>  u@  ( Fetch value of variable name)
 	;
 	; The <value> can be the result of some complex expression, or other.
 	; User variables are numbered 0..31.  If a fetch or store to this
 	; address range occurs, then the machine will generate an error.
 	; This should help debugging.  NOTE: It would be ideal if there were
 	; a mapping between <address> and word name.  That is, just knowing
 	; the address of an exception is tough to fix.
 	;
 	; HERE leaves a pointer to the top of the heap.
 	; DP is the source of the pointer.  DP is a
 	; Forth CPU variable, so it doesn't take up
 	; 
 	;
 	; Memory map:
 	;
 	;	$0000 - $0FFF	: ROM for Forth interpreter
 	;			: NOTE: This "ROM" is implemented in SRAM,
 	;			: but from a user perspective, these are
 	;			: read only memory locations.
 	;	$1000 - $7FFF	: RAM for user code and heap allocation
 	;	$8000 - $FFFF	: I/O space.
 	;	$8000 - $8200	: Video fame buffer, 512 bytes.
 	;
 	DEVICE	SX52, OSCHS3, XTLBUFD, IFBD	; Set everything for the XGS ME
 	FREQ 	80_000_000 	; this is a directive to the ide only
 			  	; if you want to put the XGS ME into RUN mode
 			  	; you must make sure you go into the 
 				; device settings and make sure that
 				; HS3 is enabled, and crystal drive and
 				; feedback are disabled and then re-program
 				; the chip in PGM mode and then switch it to RUN
 
 	IRC_CAL IRC_FAST	; Prevent a warning
 	ID	"xgsforth"	; ID string
 
 	RESET	cold_start
 
 ;
 ; General Banks
 ;
 ; $C0-$FF == Data stack (32 elements)
 ; $A0-$BF == Return stack (16 levels of nesting)
 ; $60-$9F == User variables (32 variables)
 ; $40-$5F == Instruction buffer
 ; $30-$3F == Video variables
 ; $10-$2F == Video line buffer
 ; $00-$0F == Forth interpeter variables
 
 fbank_sz	equ	$10	; Forth CPU globals size
 vbuf_sz		equ	$20	; Video buffer holds data between SRAM and
 				; the TV raster driver
 vbank_sz	equ	$10	; Video global variables
 ibuf_sz		equ	$20	; Instruction buffer, like a cache.
 user_sz		equ	$40	; User variables
 rstack_sz	equ	$20	; Return stack size
 dstack_sz	equ	$40	; Data stack size
 
 		; Forth CPU registers are -really- in bank 0, not
 		; in bank 1.  However, the assembler doesn't have
 		; a mechanism for declaring storage in bank 0.
 		; So, I secretly pretend that it's in bank 1
 		; but always set the bank to 0 when accessing the
 		; Forth CPU registers.
 
 		; By the by, this really means that the temp_sz is
 		; $10 bytes larger than declared.  Just note it and
 		; get on with life.
 _fbank	MACRO
 		clr	FSR
 ENDM
 
 fbank		equ	$10	; Forth CPU global variables
 vbuf		equ	fbank_sz
 vbank		equ	vbuf+vbuf_sz
 ibuf		equ	vbank+vbank_sz
 user		equ	ibuf+ibuf_sz
 rstack		equ	user+user_sz	; Return Stack (build towards lo memory)
 dstack		equ	rstack+rstack_sz ; Data Stack (build towards lo memory)
 
 	IF dstack+dstack_sz != 256
 		ERROR P1 "Are the addresses of the global variables corrrect?"
 	ENDIF
 
 ;
 ; For testing purposes, the Instruction Buffer Mask can be reduced
 ; so that the instruction buffer is refilled more frequently.
 ; Ideally it would be ibuf_sz-1.
 ;
 ibuf_mask	equ	31
 
 ;============================================================
 ;
 ; This file needs 3 include files:
 ; 	general_define.src:  Defines constants (EQU) and SX registers (0..9)
 ;	general_macro.src: Useful general purpose macro definitions and system functions
 ;	general_function.src: System functions that are called instead of macro
 ;	general_registers.src: System variables for the graphics engine.
 ;
 ; I'm using Remz' code here.  Actually, I've tinkered with it.
 ;
 ;============================================================
 
 	include "general_macro.src"
 
 ;============================================================
 ;
 ; User defined variables below this line
 ; The Forth model is going to use 6 global bytes
 ; and 16 Bank bytes with Semi-Direct addressing.
 ; The remainder of the 262 bytes of CPU RAM will
 ; be addressed via Indirect addressing.
 ; 
 ; I named my registers so that I could tell by visual
 ; inspection if I'm in the correct bank or not:
 ; gVars are Global and in the address range $a .. $f.
 ; fVars are in the fcpu_bank.
 ; vVars are in the video_bank.
 ;============================================================
 
 	org	$0a		; Global Variables
 gTOS_lo		ds	1	; The Top Of Stack value
 gTOS_hi		ds	1	
 gT1		ds	1	; Unused values in global space
 gT2		ds	1
 gT3		ds	1
 gT4		ds	1
 
 	org	fbank
 fErr		ds	1	; A signal to WARM_START on how to proceed
 fDSP		ds	1	; Data Stack Pointer; FSR Index value
 fRSP		ds	1	; Return Stack Pointer; FSR Index value
 fPC_lo		ds	1	; Pointer to the instruction currently being interpreted
 fPC_hi		ds	1
 fWRD_lo		ds	1	; Pointer to word currently being interpreted
 fWRD_hi		ds	1
 fibuf_valid	ds	1	; Bit 0 is TRUE if the ibuffer is valid, cleared by branches.
 fStatusReg	ds	1	; A status register with
 fsr_carry_bit	equ	0	; Carry bit
 fsr_zero_bit	equ	1	; A zero bit
 ficount_lo	ds	1
 ficount_hi	ds	1
 
 ;
 ; ERR
 ; 
 ; The error byte is used to indicate what console message to print.
 ; Value 0 indicates no error, all other values indicate some kind
 ; of problem has occured.
 NOERR		EQU	0
 ERRBOOT		EQU	1	; This actually indicates no error, but causes
 				; SRAM to be loaded with the initial kernel code
 				; from iRAM and/or SX20 RAM.
 ERRDOVR		EQU	2	; The Data stack has overflowed
 ERRROVR		EQU	3	; The Return stack has overflowed
 ERRDUND		EQU	4	; The Data stack has underflowed
 ERRRUND		EQU	5	; The Return stack has underflowed
 
 
 
 
 
 ;============================================================
 ;
 ; Instruction Constants
 ;
 ;============================================================
 I_NOPE			EQU	0
 I_LIT8			EQU	1
 I_LIT			EQU	2
 I_ADD			EQU	3
 I_SUB			EQU	4
 I_CMP			equ	5
 I_ROL			equ	6
 I_ROR			equ	7
 
 I_DUP			EQU	8
 I_SWAP			EQU	9
 I_DROP			EQU	10
 I_ROT			equ	11
 I_NROT			equ	12
 
 I_TO_R			equ	13
 I_R_FROM		equ	14
 I_R_COPY		equ	15
 I_DO			equ	16
 I_NEXT			equ	17
 
 I_FETCH			equ	18
 I_CFETCH		equ	19
 I_STORE			equ	20
 I_CSTORE		equ	21
 I_STOREP		equ	22
 
 I_IF			equ	23
 I_IFN			equ	24
 I_BCC			equ	25
 I_BCS			equ	26
 I_BZ			equ	27
 I_BNZ			equ	28
 I_BR			equ	36
 
 I_INCZ			equ	29
 
 I_SXDIR			equ	30
 I_SXOUT			equ	31
 I_SXIN			equ	32
 
 I_AND			equ	33
 I_OR			equ	34
 I_XOR			equ	35
 
 I_ICOUNT		equ	37
 I_SXOUTDE		equ	38
 
 I_UFETCH		equ	39
 I_USTORE		equ	40
 I_USTOREP		equ	41
 I_UINC			equ	42
 I_UDEC			equ	43
 
 I_BRK			EQU	44
 
 ;============================================================
 ; 
 ; CODE
 ;
 ;============================================================
 
 ;
 ; START is basic top-level code path
 ;
 cold_start
 	call	@cold_init
 warm_start
 	call	@warm_init
 
 ;
 ; This label and no operation instruction are here to
 ; latch a break point so that you (the programmer) can
 ; stop execution after initialization.
 ;
 init_done
 	nop
 
 warm_loop
 	call	execute_one
 	jmp	warm_loop
 
 ; execute_one
 ;
 ; Execute one instruction. 
 ;
 ; NOTE: fetch_ibyte guarantees that the FSR will be set for fbank upon return.
 ; Therefore, none of the f_op routines need to set it during execution.
 ;
 execute_one
 	inc	ficount_lo
 	snz
 	inc	ficount_hi
 	call	fetch_ibyte		; w == instruction
 	jmp	pc+w			; Branch given the offset
 
 	jmp	f_nope
 	jmp	f_lit8
 	jmp	f_lit
 	jmp	f_add
 	jmp	f_sub
 	jmp	f_cmp
 	jmp	f_rol
 	jmp	f_ror
 
 	jmp	f_dup
 	jmp	f_swap
 	jmp	f_drop
 	jmp	f_rot
 	jmp	f_nrot
 
 	jmp	f_to_r
 	jmp	f_r_from
 	jmp	f_r_copy
 	jmp	f_do
 	jmp	f_next
 
 	jmp	f_fetch
 	jmp	f_cfetch
 	jmp	f_store
 	jmp	f_cstore
 	jmp	f_storeplus
 
 	jmp	f_if
 	jmp	f_ifn
 	jmp	f_bcc
 	jmp	f_bcs
 	jmp	f_bz
 	jmp	f_bnz
 
 	jmp	f_nope		; i_incz ??
 
 	jmp	f_sxdir
 	jmp	f_sxout
 	jmp	f_sxin
 
 	jmp	f_and
 	jmp	f_or
 	jmp	f_xor
 
 	jmp	br
 
 	jmp	f_icount
 	jmp	f_sxoutde
 
 	jmp	f_ufetch
 	jmp	f_ustore
 	jmp	f_ustorep
 	jmp	f_uinc
 	jmp	f_udec
 	
 
 	jmp	f_brk
 
 ;===================================================================
 ;
 ; fetch_inst
 ;
 ; This routine will fetch an instruction from the CPU's instruction memory.
 ; The Forth CPU program counter is incremented upon entering this routine.
 ; 
 ; User programs are compiled directly into the binary image running
 ; on the CPU.  At run time they are fetched from instruction memory
 ; with iread instructions.  An alternative approach which stores user
 ; programs in external SRAM is documented in unused-forth.src.
 ;
 ; I've placed fetch_ibyte in page 0 so that some of the callers
 ; can get 1 cycle access to it without having to first set the
 ; PAGE bits.
 ;
 ;===================================================================
 fetch_ibyte
 	inc	fPC_lo		; Increment to next instruction byte
 	snz			; Keep track of the hi byte
 	inc	fPC_hi
 
 	mov	w,#user_code>>8	; Build the address to the code
 	add	w,fPC_hi
 	mov	m, w		; Hi 4 bits of address in m
 	mov	w,fPC_lo	; Lo 8 bits of user_code address must be zero.
 	iread			; Fetch an instruction byte.
 
 	retp
 
 ;
 ; Here's a handy break point.  You can add this instruction to
 ; your programs and then set a break point here.
 ; 
 f_brk
 	retp
 
 f_nope
 f_cmp
 	retp
 
 f_icount
 	call	@dpush_tos
 	mov	w,ficount_lo
 	mov	gTOS_lo,w
 	mov	w,ficount_hi
 	mov	gTOS_hi,w
 	retp
 
 f_sxoutde
 	mov	w,#$1f
 	mov	m,w
 	mov	w,#$00
 	mov	!rd,w
 	mov	!re,w
 	mov	w,gTOS_lo
 	mov	rd,w
 	mov	w,gTOS_hi
 	mov	re,w
 	jmp	@dpop_tos
 	
 
 ; negate
 ;
 ; Two's complement of the top of stack
 ; Perform a NOT and then increment the value;
 ; skip the hi if the Status.Zbit is not set,
 ; that is if we didn't roll over the lo byte.
 f_negate
 	not	gTOS_lo
 	not	gTOS_hi
 	inc	gTOS_lo
 	snz
 	inc	gTOS_hi
 	retp
 
 ; dup
 ;
 ; DUP is implemented by 'pushing' the current TOS.
 f_dup
 	jmp	@dpush_tos
 
 ; swap
 ;
 ; Swap the current TOS and the value actually at the DSP.
 ; This algorithm uses the byte below the current top element
 ; on the stack as temporary storage.
 f_swap
 	mov	W,fDSP
 	mov	FSR,W
 	mov	W,INDF		; Pick up the lo on the stack
 	dec	FSR
 	mov	INDF,W		; Store in temp spot
 	inc	FSR
 	mov	W,gTOS_lo
 	mov	INDF,W		; Store TOS_lo onto stack
 	inc	FSR
 	mov	W,INDF		; Pick up hi off stack
 	mov	gTOS_lo,W	; Unused spot to save hi
 	mov	W,gTOS_hi
 	mov	INDF,W		; Store in hi on the stack
 	mov	W,gTOS_lo	; This is the hi from the stack
 	mov	gTOS_hi,W	; Hi is completely swapped
 	dec	FSR
 	dec	FSR
 	mov	W,INDF		; Pick up the lo from the stack [temp]
 	mov	gTOS_lo,W	; Lo is completely swapped
 	_fbank			; Ensure FSR doesn't have stray lo bits
 	retp			; SWAP complete
 
 ;
 ; f_lit8: The next byte is the lo order 8 bits of the new Top of stack.
 ; The issue is, do we want this to be a signed 8 bit value?  if so, this
 ; code will have to change.
 ;
 f_lit8
 	call	@dpush_tos		; push the current TOS
 	call	fetch_ibyte		; w == literal 8 data
 	mov	gTOS_lo,w		; Set the lo order byte
 	clr	gTOS_hi			; Nuke the hi order byte
 	retp
 
 f_lit
 	call	@dpush_tos		; push the current TOS
 	call	fetch_ibyte		; w == literl 16 bit data, lo
 	mov	gTOS_lo,w
 	call	fetch_ibyte
 	mov	gTOS_hi,w
 	retp
 
 f_sub
 	call	f_negate
 	jmp	f_add
 
 user_addr
 
 f_ufetch
 	rl	gTOS_lo
 	mov	w,gTOS_lo
 	and	w,#(user_sz-1)*2
 	mov	FSR,w
 	mov	w,INDF
 	mov	gTOS_lo,w
 	inc	FSR
 	mov	w,INDF
 	mov	gTOS_hi,w
 	_fbank
 	retp
 
 f_ustore
 	rl	gTOS_lo
 	mov	w,gTOS_lo
 	and	w,#(user_sz-1)*2
 	mov	gT1,w	
 	call	@dpop_tos
 	mov	w,gT1
 	mov	FSR,w
 	mov	w,gTOS_lo
 	mov	INDF,w
 	mov	w,gTOS_hi
 	mov	INDF,w
 	_fbank
 	retp
 
 f_ustorep
 	rl	gTOS_lo
 	mov	w,gTOS_lo
 	and	w,#(user_sz-1)*2
 	mov	gT1,w	
 	call	@dpop_tos
 	mov	w,gT1
 	mov	FSR,w
 	mov	w,gTOS_lo
 	add	INDF,w
 	mov	w,gTOS_hi
 	add	INDF,w
 	_fbank
 	retp
 
 f_uinc
 	rl	gTOS_lo
 	mov	w,gTOS_lo
 	and	w,#(user_sz-1)*2
 	mov	FSR,w
 	mov	w,#1
 	add	INDF,w			; I use 'add' instead of 'inc' because
 	inc	FSR			; I need the carry flag to indicate roll-over,
 	snc				; and 'inc' only sets Z, not C.
 	inc	INDF			; Now I can use inc because I don't need any flags.
 	jmp	@dpop_tos
 
 f_udec	
 	rl	gTOS_lo
 	mov	w,gTOS_lo
 	and	w,#(user_sz-1)*2
 	mov	FSR,w
 	clr	gTOS_hi			; TOS is now a temporary variable
 	mov	w,INDF			; Test the lo of the user variable
 	snz				; If zero, then note that we need
 	not	gTOS_hi			; to dec the hi byte of the user var.
 	dec	INDF
 	inc	FSR			; I need the carry flag to indicate roll-over,
 	mov	w,gTOS_hi
 	add	INDF,w			; Factor in a decrement of the hi (if w == $FF)
 	jmp	@dpop_tos
 
 f_comp
 	call	f_negate
 	
 ;
 ; f_rot
 ; f_nrot
 ;
 ; These two funky stack manipulators will rotate the top 3
 ; stack items clock wise or counter clock wise.
 ; rot  -- leave the original 3rd item on top of the stack
 ; -rot -- rotate, twice
 ; rot:
 ; Entry:           ;      n3  n2 [n1]
 ; Step 1: >r       ;          n3 [n2]
 ; Step 2: swap     ;          n2 [n3]
 ; Step 3: r>       ;      n2  n3 [n1]
 ; Step 4: swap     ;      n2  n1 [n3]
 ; Exit:            ;      n2  n1 [n3]
 f_rot
 	call	f_to_r
 	call	f_swap
 	call	f_r_from
 	jmp	f_swap
 ;
 ; -rot:
 ; Entry:           ;      n3  n2 [n1]
 ;        swap      ;      n3  n1 [n2]
 ;        >r        ;          n3 [n1]
 ;        swap      ;          n1 [n3]
 ;        r>        ;      n1  n3 [n2]
 ; Exit:            ;      n1  n3 [n2]
 f_nrot
 	call	f_swap
 	call	f_to_r
 	call	f_swap
 	jmp	f_r_from
 
 ; f_to_r
 ; f_r_from
 ;
 ; These routines move the top of stack to and from the return
 ; stack.
 f_to_r
 	call	@rpush_tos
 	jmp	@dpop_tos
 
 f_r_from
 	call	@dpush_tos
 	jmp	@rpop_tos
 
 f_r_copy
 	call	@dpush_tos
 	jmp	@rget_tos
 
 f_do
 f_next
 	retp
 
 ; fetch ops
 ;
 ; During simulation I'll use 64 bytes of processor RAM as memory
 ; for fetch store operations.  These are the 60 bytes starting
 ; at temp and ending in the last byte of vbank.
 f_fetch
 	call	fetch_lo
 	inc	FSR
 	mov	w,INDF
 	mov	gTOS_hi,w
 	_fbank
 	retp
 
 f_cfetch
 	call	fetch_lo
 	clr	gTOS_hi
 	_fbank
 	retp
 
 fetch_lo
 ;	mov	w,#temp
 	add	w,gTOS_lo
 	mov	FSR,w
 	mov	w,INDF
 	mov	gTOS_lo,w
 	retp
 
 ;
 ; f_store: gT1/gT2 contains the destination address.
 ;
 f_store
 ;	call	store_lo
 	inc	FSR
 	mov	w,gTOS_hi
 	mov	INDF,w
 	_fbank
 	retp
 
 f_cstore
 ;	call	store_lo
 	_fbank
 	retp
 
 f_storeplus
 	call	@rpush_tos
 	call	@f_fetch
 	call	@f_add
 	call	@dpush_tos
 	call	@rpop_tos
 	jmp	@f_store
 
 store_lo
 ;	mov	w,#temp
 	add	w,gTOS_lo
 	mov	gT1,w
 	call	@dpop_tos
 	mov	w,gT1
 	mov	FSR,w
 	mov	w,gTOS_lo
 	mov	INDF,w
 	retp
 
 f_drop
 	jmp	@dpop_tos
 
 f_and
 	mov	W,fDSP		; Set RAM pointer to Data Stack Pointer
 	mov	FSR,W
 	mov	W,INDF		; Perform the Lo byte and
 	and	gTOS_lo,W	; Save Lo byte
 	inc	FSR		; Set RAM ponter to Hi
 	mov	W,INDF		; Perform the Hi byte and
 	and	gTOS_hi,W	; Save Hi byte
 	_fbank
 	inc	fDSP		; Drop the stack based operand
 	inc	fDSP
 	retp
 
 f_or
 	mov	W,fDSP		; Set RAM pointer to Data Stack Pointer
 	mov	FSR,W
 	mov	W,INDF		; Perform the Lo byte or
 	or	gTOS_lo,W	; Save Lo byte
 	inc	FSR		; Set RAM ponter to Hi
 	mov	W,INDF		; Perform the Hi byte or
 	or	gTOS_hi,W	; Save Hi byte
 	_fbank
 	inc	fDSP		; Drop the stack based operand
 	inc	fDSP
 	retp
 
 f_xor
 	mov	W,fDSP		; Set RAM pointer to Data Stack Pointer
 	mov	FSR,W
 	mov	W,INDF		; Perform the Lo byte xor
 	xor	gTOS_lo,W	; Save Lo byte
 	inc	FSR		; Set RAM ponter to Hi
 	mov	W,INDF		; Perform the Hi byte xor
 	xor	gTOS_hi,W	; Save Hi byte
 	_fbank
 	inc	fDSP		; Drop the stack based operand
 	inc	fDSP
 	retp
 
 ; complement
 ;
 ; One's complement of the top of stack
 f_complement
 	not	gTOS_lo
 	not	gTOS_hi
 	retp
 
 
 ;==================================================================
 ;
 ; SX GPIO pin access
 ;   sxdir will set the direction of port A
 ;   sxout will set the pins of port A
 ;   sxin  will read the pins of port A
 ;
 f_sxdir
 	mov	w,#$1f			; Set the Mode register so that
 	mov	m,w			; data will move from W into
 	mov	w,gTOS_lo		; the target register's direction
 	mov	!ra,w			; bits.
 	jmp	@dpop_tos
 
 f_sxout
 	mov	w,gTOS_lo
 	mov	ra,w
 	jmp	@dpop_tos
 
 f_sxin
 	call	@dpush_tos
 	mov	w,ra
 	mov	gTOS_lo,w
 	clr	gTOS_hi
 	retp
 
 f_rol
 	clc
 	rl	gTOS_lo
 	rl	gTOS_hi
 	jmp	f_cc_to_fsr
 
 f_ror
 	clc
 	rr	gTOS_hi
 	rr	gTOS_lo
 f_cc_to_fsr
 	clrb	fStatusReg.fsr_carry_bit
 	snc
 	setb	fStatusReg.fsr_carry_bit
 	retp
 
 f_if
 	mov	w,gTOS_lo
 	or	w,gTOS_hi
 	sz
 	jmp	nbr
 	jmp	br
 
 f_ifn
 	mov	w,gTOS_lo
 	or	w,gTOS_hi
 	snz
 	jmp	nbr
 	jmp	br
 
 f_bcc
 	snb	fStatusReg.fsr_carry_bit
 	jmp	nbr
 	jmp	br
 
 f_bcs
 	sb	fStatusReg.fsr_carry_bit
 	jmp	nbr
 	jmp	br
 
 f_bz
 	snb	fStatusReg.fsr_zero_bit
 	jmp	nbr
 	jmp	br
 
 f_bnz
 	sb	fStatusReg.fsr_zero_bit
 	jmp	nbr
 	jmp	br
 
 
 ;==================================================================
 ;
 ; br & nbr
 ; Branch and No Branch
 ; These routines are the work horses of bcc, bcs,
 ; bz, bnz, if, and ifn.
 ; These routines will either add the next byte to
 ; the program counter, or not.
 
 br
 	call	fetch_ibyte
 
 ; We're adding a signed 8 bit branch offset to a 16 bit
 ; unsigned program counter.  To do this, we need to sign
 ; extend the low order 8 bits.  The algorithm employed
 ; is to use the rotate left instruction to test the
 ; sign bit and then tuck the bit back.  And, we first
 ; zero a register which will be the high order bit and
 ; then invert the bits (using NOT) if the sign bit
 ; (expressed in the carry bit of the status register)
 ; is set.
 
 	clr	gT3		; Will be the sign extension
 	mov	gT2,w		; Save W to a register for the rotate op.
 
 ;
 ; At this point we need to save the current PC_lo
 ;
 	mov	w,fPC_lo
 	mov	gT1,w
 	mov	w,gT2
 ;
 ; We now resume our regularly scheduled sign extension algorithm
 ;
 
 	rl	gT2		; Yank out the sign bit
 	snc			; Skip if positive offset
 	not	gT3		; Sign extension is negative, $FF
 
 ; Now we perform a run-of-the-mill 16 bit arithmetic
 ; addition between the two {lo, hi} pairs {w, gT1}
 ; and {fPC_lo, fPC_hi}.
 
 	add	fPC_lo,w
 	snc
 	inc	fPC_hi
 	mov	w,gT3
 	add	fPC_hi,w
 	jmp	check_br_ibuf
 
 ;
 ; nbr is the piece of code that handles Not Branching
 ; for various control logic.  It skips over the offset
 ; byte following the instruction.
 ;
 nbr
 	mov	w,fPC_lo
 	mov	gT1,w
 	inc	fPC_lo		; Dump the following byte
 	snz
 	inc	fPC_hi
 	jmp	check_br_ibuf	; Fall through to check_br_ibuf?
 
 ;
 ; check_br_ibuf
 ;
 ; This routine will compare an old PC_lo against the current
 ; PC_lo and clear the validity of the instruction buffer
 ; accordingly.  The basic premise is that the bits outside of
 ; ibuf_mask must be the same in order for the instruction buffer
 ; to remain valid (through this routine does -not- set the
 ; valid bit).  If those bits are not the same, then the valid
 ; bit is cleared.
 ;
 ; INPUT: gT1 is the old PC_lo before the branch.
 ;
 check_br_ibuf
 	mov	w,#~ibuf_mask
 	and	gT1,w
 	and	w,fPC_lo
 	sub	gT1,w
 	sz
 	clrb	fibuf_valid.0
 	retp
 
 ;==================================================================
 ;==================================================================
 ;
 ; Page Two
 ;
 ;==================================================================
 ;==================================================================
 
 	ORG	$200
 
 ; f_add
 ;
 ; The current Top Of Stack value is added to the Next Of Stack,
 ; or the value currently in the stack array.
 ; 	
 f_add
 	mov	W,fDSP		; Set RAM pointer to Data Stack Pointer
 	mov	FSR,W
 	mov	w,indf		; Fetch the Lo byte
 	add	gTOS_lo,w	; Perform the Lo byte add
 	snc			; If carry,
 	inc	gTOS_hi		;    then increment the Hi
 	inc	FSR		; Set RAM ponter to Hi
 	mov	W,INDF
 	add	gTOS_hi,w	; Perform the Hi byte add
 	_fbank
 	inc	fDSP		; Drop the stack based operand
 	inc	fDSP
 	retp
 
 ;===================================================================
 ;
 ; Stack operation routines
 ;
 ; dpush_tos
 ;
 ; The current Top Of Stack value is stored in Global Registers
 ; and needs to be securely saved onto the Data Stack.  An example
 ; of why this is necessary is the LIT command.  That command will
 ; replace the TOS with a new value.  To do that, it must push the
 ; current TOS and then it can load the TOS registers with a new
 ; value.
 dpush_tos
 	_fbank
 	dec	fDSP		; pre-decr stack pointer
 	mov	W,fDSP		; W == DSP
 	mov	FSR,W		; FSR == W
 	mov	W,gTOS_hi	; W == gTOS_hi, etc.
 	mov	INDF,W		; Store Hi
 	dec	FSR		; Set up pointer for Lo
 	mov	W,gTOS_lo	;
 	mov	INDF,W		; Store Lo
 	_fbank
 	dec	fDSP		; Fix up Data Stack Pointer
 	retp
 
 ;===================================================================
 dpop_tos
 	_fbank
 	mov	w,fDSP
 	mov	FSR,w
 	mov	w,INDF
 	mov	gTOS_lo,w
 	inc	FSR
 	mov	w,INDF
 	mov	gTOS_hi,w
 	_fbank
 	inc	fDSP
 	inc	fDSP
 	retp
 
 ;===================================================================
 ; rpush_tos
 ;
 ; The current Top Of Stack value is stored in Global Registers
 ; and needs to be securely saved onto the Return Stack.
 rpush_tos
 	_fbank
 	dec	fRSP		; pre-decr stack pointer
 	mov	W,fRSP		; W == DSP
 	mov	FSR,W		; FSR == W
 	mov	W,gTOS_hi	; W == gTOS_hi, etc.
 	mov	INDF,W		; Store Hi
 	dec	FSR		; Set up pointer for Lo
 	mov	W,gTOS_lo	;
 	mov	INDF,W		; Store Lo
 	_fbank
 	dec	fRSP		; Fix up Return Stack Pointer
 	retp
 
 ;===================================================================
 rpop_tos
 	_fbank
 	mov	w,fRSP
 	inc	fRSP
 	inc	fRSP
 	jmp	rget_tos2
 
 ;===================================================================
 rget_tos
 	_fbank
 	mov	w,fRSP
 rget_tos2
 	mov	FSR,w
 	mov	w,INDF
 	mov	gTOS_lo,w
 	inc	FSR
 	mov	w,INDF
 	mov	gTOS_hi,w
 	_fbank
 	retp
 
 ;===================================================================
 ; CALL_push
 ;
 ; Push the current Program Counter onto the return stack.
 call_push
 	_fbank
 	dec	fRSP
 	mov	W,fRSP
 	mov	FSR,W
 	mov	W,fPC_hi
 	mov	INDF,W
 	dec	FSR
 	mov	W,fPC_lo
 	mov	INDF,W
 	_fbank
 	dec	fRSP
 	retp
 
 
 ;===================================================================
 ; init_stacks
 ;
 ; This routine will initialize the data and return stacks
 ; The data stack occupies $C0 - $FF.  It's a pre-decrement/
 ; post-increment stack.
 ; The return stack occupies $A0 - $BF; same on the pre-decr/
 ; post-incr.
 ; NOTE: The TOS held in global registers is -not- accounted for
 ; by the current stack pointer.  Think of it as the items on
 ; the stack.  And, there's a TOS that's implied in stack ops.
 init_stacks
 	_fbank
 	STIMM	fDSP,dstack+dstack_sz
 	STIMM	fRSP,rstack+rstack_sz
 
 	; Clear the return stack & data stack
 	mov	gTOS_lo,#rstack_sz+dstack_sz
 	mov	FSR,#rstack
 	jmp	zero_ram
 
 ;===================================================================
 ; init_ibuf
 ;
 ; Zero out the instruction buffer.  This is mostly useful for
 ; single stepping.  with real run time nobody is looking in this
 ; buffer and so won't notice if trash is present before loading
 ; with real instructions from SRAM.
 init_ibuf
 	mov	gTOS_lo,#ibuf_sz
 	mov	FSR,#ibuf
 	jmp	zero_ram
 
 ;===================================================================
 ; init_vbuf
 ;
 ; Zero out the video buffer.  See usefulness note in init_buf.
 init_vbuf
 	mov	gTOS_lo,#vbuf_sz
 	mov	FSR,#vbuf
 	call	zero_ram
 
 	mov	gTOS_lo,#vbank
 	mov	FSR,#vbank_sz
 	jmp	zero_ram
 
 ;===================================================================
 ; init_fbank
 ;
 ; Zero out all of the CPU's core registers, except the error no.
 init_fbank
 	_fbank
 	clr	fDSP
 	clr	fRSP
 	clr	fPC_lo
 	clr	fPC_hi
 	clr	fWRD_lo
 	clr	fWRD_hi
 	clr	fibuf_valid
 	clr	fStatusReg
 	clr	$19
 	clr	$1a
 	clr	$1b
 	clr	$1c
 	clr	$1d
 	clr	$1e
 	clr	$1f
 	retp
 
 
 ;===================================================================
 ; zero_ram
 ; 
 ; Here's a handy routine that will use the TOS_lo as a counter
 ; and zero from the already set FSR register RAM to zero.
 ;
 ; INPUT:
 ;	TOS_lo	: Count of bytes to zero
 ;	FSR	: Start address of RAM bytes to zero
 ;
 ; RETURNS:
 ;	TOS_lo	: zero
 ;	FSR	: FSR.start + TOS_lo.start
 zero_ram
 	retp			; Fast path is EXIT NOW!
 
 	clr	INDF
 	inc	FSR
 	decsz	gTOS_lo
 	jmp	zero_ram
 	_fbank		; Ensure FSR doesn't have stray lo bits
 	retp
 
 
 ;===================================================================
 ;
 ; cold_init is the processing to initialize the processor
 ;
 cold_init
 	_fbank			; Note that this is a cold start
 	MOV	fERR,#ERRBOOT
 	retp
 
 ;===================================================================
 warm_init
 	call	@init_stacks
 	call	@init_ibuf
 	call	@init_vbuf
 	call	@init_fbank
 	_fbank
 	mov	w,#$ff			; It's a pre-increment architecture.
 	mov	fPC_lo,w		; So, we set the PC to -1 and the first
 	mov	fPC_hi,w		; instruction will come from 0.
 	clr	gTOS_lo
 	clr	gTOS_hi
 	retp
 
 ;============================================================
 ;
 ; Include our happy, friendly, helpful general functions.
 ;
 ;============================================================
 
 SRAM_START	EQU	$400
 	include	"sram.src"
 
 ;============================================================
 ; Program file
 	org	$800
 user_code
 	include "user.obj"