Originally published in Forth Dimensions XVIII/2, 30

hForth - A Small, Portable ANS Forth

Wonyong Koh, Ph.D.
Taejon, Korea
wykoh@genitech.co.kr

Background history

Two and a half years ago I started a personal project that had been on my mind for quite a long time: spreading Forth in Korea. Postfix is natural to Korean people, since a verb comes after its object in the Korean language. Also, Forth does not restrict a programmer to alphanumeric characters. A Korean Forth programmer can easily express his ideas in comfortable Korean words rather than being forced to think in English. As one might expect, there was an earlier effort toward a Korean Forth. Dr. Chong-Hong Pyun and Mr. Jin-Mook Park built a Korean version of fig-Forth for the Apple II computer in the mid-eighties. Long-time FD readers may remember Dr. Pyun's letter in Forth Dimensions X/6, 8. Unfortunately, the Korean computer community swiftly moved to the IBM PC while Dr. Pyun was writing articles about their work in popular programming and science magazines. Their system became somewhat obsolete before it was widely known. Despite this and other efforts, Forth has remained virtually unknown to most Koreans.

Two and a half years ago I decided to restart the effort and looked for a vehicle for the purpose. I found that there was no small ANS Forth system for the IBM PC, so I decided to build one. In the course of ANSifying eForth I replaced every line of the eForth source and felt that the result deserved its own name. I knew that there were Forth systems named bForth, cForth, eForth, gForth, iForth, Jforth and KForth. I picked h since it seemed not yet to be used by anyone and also because Han means Korean in the Korean language.

ROM model came first

eForth, which was written by Mr. Bill Muench and Dr. C. H. Ting in 1990, seemed to be a good place to start. I studied the eForth source and Dr. Ting's article in Forth Dimensions XIII/1, 15 and set the following goals:

small machine-dependent kernel and portable high-level code
strict compliance with ANS Forth
extensive error handling through the CATCH/THROW mechanism
separate code and name spaces
use of wordlists
explicit consideration for separate RAM/ROM address spaces
simple vectored input/output
direct threaded code
easy upgrade path to optimize for a specific CPU

Most of them are adapted from eForth. I emphasize extensive error handling since some well-known Forth systems cannot manage even as simple a situation as divide-by-zero. In hForth almost all ambiguous conditions specified in the ANS Forth document issue a THROW, which is captured by CATCH either in a user-defined word or in the hForth system itself.
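As a minimal sketch of what this buys the programmer, the following uses only the Standard words CATCH and ['] to turn a division fault into a flag. The name safe/ is hypothetical and not part of hForth, and the sketch assumes a system that reports division by zero through THROW (the Standard reserves throw code -10 for it).

```forth
\ safe/ is a hypothetical name used only for this illustration.
: safe/  ( n1 n2 -- quotient true | false )
    ['] / CATCH            \ 0 on success, a nonzero throw code on failure
    IF  2DROP FALSE        \ stack depth was restored; discard both operand slots
    ELSE  TRUE
    THEN ;

12 3 safe/ . .    \ success: prints the true flag, then the quotient 4
12 0 safe/ .      \ failure: prints the false flag 0
```

After a THROW the data stack has the same depth it had before CATCH ran, but the values in those cells are undefined, which is why the failure branch simply drops them.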

The hForth ROM model is designed especially as a minimal development system for embedded applications that use non-volatile RAM or a ROM emulator in place of ROM. The content of the ROM address space can be changed during the development phase and is later copied to real ROM for the production system. When it starts, the hForth ROM model checks whether or not the ROM address space is alterable. New definitions go into the ROM address space if it is alterable; otherwise they go into the RAM address space.

  Alterable ROM address space       Unalterable ROM address space
===============================    ===============================
                                    name space of new definitions
                                   -------------------------------

       RAM address space                  RAM address space

-------------------------------    -------------------------------
                                       data space / code space 
          data space                     of new definitions
===============================    ===============================
 name space of old definitions      name space of old definitions
-------------------------------    -------------------------------
 name space of new definitions
-------------------------------

       ROM address space                  ROM address space

-------------------------------    -------------------------------
   data space / code space 
     of new definitions                      data space
-------------------------------    -------------------------------
 code space of old definitions      code space of old definitions
===============================    ===============================

Data space can be allocated either in the ROM address space, for tables of constants, or in the RAM address space, for arrays of variables. ROM and RAM, the words recommended in the Appendix of the Standard document, are used to switch data space between the RAM and ROM address spaces. Name space may be excluded from the final system if an application does not require the Forth text interpreter. The 8086 hForth ROM model occupies a little more than 6 KB of code space for all Core word set words and requires at least 1 KB of RAM address space for stacks and system variables.
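A short sketch of how the two selectors might be used follows. It assumes the hForth ROM model, where ROM and RAM choose the address space for subsequent data-space allocation as described above; every other name is hypothetical. To try it on a system without these words, one could first define them as no-ops with ": ROM ; : RAM ;".

```forth
\ Assumes hForth ROM model; all names except ROM and RAM are hypothetical.
ROM
CREATE powers-of-ten  1 , 10 , 100 , 1000 , 10000 ,   \ constant table

RAM
VARIABLE sample-count                     \ changes at run time
CREATE samples  16 CELLS ALLOT            \ array of 16 cells

: power-of-ten  ( n -- 10^n )             \ n must be in 0..4
    CELLS powers-of-ten + @ ;
```

The table never changes after it is built, so it can live in the ROM address space, while the counter and the sample array must be writable and therefore belong in RAM.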

The assembly source is arranged so that more implementation-dependent words come earlier. System-dependent words come first, CPU-dependent words come next, and all the other high-level words follow. Colon definitions of all high-level words are given as comments in the assembly source. To port the hForth ROM model from the current MS-DOS machine to an 8086 single-board computer, one needs to redefine only the system-dependent words, without changing any CPU-dependent words. Within each of the system-dependent, CPU-dependent, and portable parts, Standard words come after the essential non-Standard words. All Standard Core word set words are included to make hForth an ANS Forth system. The high-level Standard words in the last part of the assembly source are not used in the implementation of hForth and can be omitted to make a minimal system. The current 8086 hForth ROM model for MS-DOS has 59 kernel words: 13 system-dependent words, 21 CPU-dependent non-Standard words and 25 CPU-dependent Standard words. The system-dependent words include input/output words and other words for file input through the keyboard redirection of MS-DOS. For five of the kernel words, including (search-wordlist) and ALIGNED, CPU-dependent definitions are used instead of high-level definitions for faster execution.

System initialization and input/output operations are performed through the following execution vectors: 'boot, 'init-i/o, 'ekey?, 'ekey, 'emit?, 'emit, and 'prompt. Appropriate actions can be taken by redirecting these execution vectors. 'init-i/o is executed in THROW and when the system starts, while 'boot is executed only once, when the system starts. Restoring the i/o vectors through 'init-i/o whenever an exception condition occurs gives a better chance of not losing control. For example, a serial communication link need not be broken by an accidental change of communication parameters. In a finished application, 'boot may be redirected to an appropriate application word instead of the default word. The traditional 'ok<end-of-line>' prompt (which is not actually a prompt) may be replaced by redirecting 'prompt.
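The idea of redirecting a vector can be sketched in Standard Forth with a VALUE that holds an execution token. The vector 'my-prompt below is a hypothetical stand-in for 'prompt; how hForth's own vectors are stored and changed is not described here, so this only illustrates the technique.

```forth
: ok-prompt     ( -- )  ."  ok" CR ;
: ready-prompt  ( -- )  CR ." ready> " ;

' ok-prompt VALUE 'my-prompt              \ hypothetical stand-in for 'prompt
: .prompt  ( -- )  'my-prompt EXECUTE ;   \ the system calls through the vector

.prompt                        \ runs ok-prompt
' ready-prompt TO 'my-prompt   \ redirect the vector
.prompt                        \ now runs ready-prompt
```

Because every caller goes through .prompt, changing one cell changes the behavior everywhere, and a routine in the style of 'init-i/o can restore the defaults by storing the original tokens back.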

Control structure matching is rigorously checked for the different control-flow stack items. The control-flow stack is implemented on the data stack. Each control-flow stack item is represented by two data stack items, as shown below:

Control-flow stack item     Representation (parameter and type)
-----------------------    -------------------------------------
     dest                    control-flow destination      0
     orig                    control-flow origin           1
     of-sys                  OF origin                     2
     case-sys                x (any value)                 3
     do-sys                  ?DO origin           DO destination
     colon-sys               xt of current definition     -1

hForth can easily detect a nonsense clause such as "BEGIN IF AGAIN THEN". CS-ROLL and CS-PICK can be applied only to a list of dests and origs. This can be verified by checking whether the ORed type is 1. I cannot think of a control-structure mismatch that the current hForth cannot catch.
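The check can be modeled in a few lines of Standard Forth. This is a hypothetical model, not the hForth source: each item is a ( parameter type ) pair as in the table, all word names are invented, and the list test ORs the types together with 1 before comparing, which is one assumed way to accept both dests (type 0) and origs (type 1). Throw code -22 is the Standard's "control structure mismatch".

```forth
\ Hypothetical model: a control-flow item is ( parameter type ).
0 CONSTANT dest-type
1 CONSTANT orig-type

: begin-item  ( -- param type )  HERE dest-type ;    \ what BEGIN leaves
: if-item     ( -- param type )  HERE orig-type ;    \ what IF leaves

: again-item  ( param type -- )                      \ AGAIN needs a dest
    dest-type <> IF -22 THROW THEN  DROP ;
: then-item   ( param type -- )                      \ THEN needs an orig
    orig-type <> IF -22 THROW THEN  DROP ;

\ True if all n types are 0 or 1, i.e. the items are only dests and origs.
: dests/origs?  ( type1 ... typen n -- flag )
    0 SWAP 0 ?DO OR LOOP  1 OR 1 = ;

\ "BEGIN IF AGAIN": AGAIN meets an orig, so again-item throws -22.
: demo  ( -- code )
    begin-item if-item  ['] again-item CATCH
    >R 2DROP 2DROP R> ;          \ discard both items, keep the throw code

demo .                           \ prints -22
0 1 1 3 dests/origs? .           \ prints -1 (true)
0 2 2 dests/origs? .             \ prints 0: an of-sys is in the list
```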

The number of words grows substantially as a Forth system is extended. Dictionary search can be time-consuming unless hashing or other means are employed. Currently hForth uses no special search mechanism; however, it maintains reasonable compilation speed by keeping the search depth shallow, in addition to using an optimized (search-wordlist). Initially two wordlists are in the search order stack: FORTH-WORDLIST and NONSTANDARD-WORDLIST. FORTH-WORDLIST contains all the Standard words and NONSTANDARD-WORDLIST contains all the other words. When hForth is extended, optional Standard words go into FORTH-WORDLIST, and the lower-level non-Standard words that implement them are kept in separate wordlists which are usually not in the search order stack. Only the small number of non-Standard words meant to be used by a user are added to NONSTANDARD-WORDLIST.
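The practice of hiding helper words can be sketched with the Standard Search Order word set. The wordlist and word names below are hypothetical; only WORDLIST, GET-ORDER, SET-ORDER, GET-CURRENT and SET-CURRENT are Standard.

```forth
\ Hypothetical names throughout; requires the Search Order word set.
WORDLIST CONSTANT helper-wordlist

GET-ORDER helper-wordlist SWAP 1+ SET-ORDER   \ search the helpers first
GET-CURRENT  helper-wordlist SET-CURRENT      \ old compilation wordlist stays on the stack
: squared  ( n -- n*n )  DUP * ;              \ low-level helper
SET-CURRENT                                   \ restore the compilation wordlist

: sum-of-squares  ( a b -- a*a+b*b )  squared SWAP squared + ;

GET-ORDER NIP 1- SET-ORDER                    \ take the helpers out of the search order
3 4 sum-of-squares .                          \ still works: prints 25
```

Once the helper wordlist has been removed from the search order, squared no longer costs anything during dictionary searches, yet sum-of-squares keeps working because the reference was resolved at compile time.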

RAM and EXE models follow

The hForth package consists of three models: the ROM, RAM and EXE models. The hForth RAM model is for a RAM-only system where name, code and data spaces are all combined. The hForth EXE model is for a system in which code space is completely separated from data space and an execution token (xt) may not be a valid address in data space. The 8086 hForth EXE model uses two full 64 KB memory segments: one for code space and the other for name and data spaces. The EXE model might be extended to an embedded system where name space resides in the host computer and code and data space are in the target computer. A few kernel words are added to the ROM model to derive the RAM and EXE models, and only several high-level words such as HERE and CREATE are redefined.

The ROM and RAM models are probably too slow for many practical applications, as is the original eForth. However, the 8086 hForth EXE model is more competitive. The high-level colon definitions of all frequently used words are replaced with 8086 assembly code definitions in the hForth EXE model. A comparison with other 8086 Forth systems can be found in Mr. Borasky's article "Forth in the HP100LX", Forth Dimensions XVII/4, 6.

hForth models are highly extensible. Optional word set words as well as an assembler can be added on top of the basic hForth system. The complete Tools, Search Order and Search Order Ext word sets and other optional Standard words are defined in OPTIONAL.F, which is included in the 8086 hForth package. An 8086 Forth assembler is provided in ASM8086.F. Many of the Core Ext word set words are provided in OPTIONAL.F, and all the other Core Ext words, except the obsolescent ones and [COMPILE] (for which POSTPONE should be used), are provided in COREEXT.F. The complete Double and Double Ext word sets are provided in DOUBLE.F. The high-level definitions in these files should work in hForth for other CPUs. These files are loaded into 8086 hForth for MS-DOS machines through the keyboard redirection function of MS-DOS. The complete Block, Block Ext, File and File Ext word sets are provided in MSDOS.F using MS-DOS file handle functions.

Other utilities are also included in the 8086 hForth package. LOG.F captures screen output to an MS-DOS text file, which can be edited to make Forth text source. DOSEXEC.F calls MS-DOS executables from within the hForth system. A user can call a familiar text editor, edit Forth text source, exit the editor, load the source and debug, all without leaving the hForth environment. This process can be repeated without saturating the address spaces if a MARKER word is defined at the beginning of the Forth text source and is executed before the source is reloaded.
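A minimal sketch of such a source file is shown here. The marker name -work and the two definitions are hypothetical; MARKER itself is the Standard Core Ext word.

```forth
\ Start of a hypothetical source file.
MARKER -work            \ executing -work later forgets everything from here on

: greet  ( -- )  ." Hello from hForth" CR ;
: twice  ( xt -- )  DUP EXECUTE EXECUTE ;
```

Before loading an edited copy of the file, the user types -work once. That removes greet, twice and -work itself, so each edit and reload cycle starts from the same dictionary state instead of piling new copies on top of old ones.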

Multitasker

I had a chance to look at Mr. Muench's eForth 2.4.2. Its multitasker is the most elegant one that I have seen. It does task switching through only two high-level words. I immediately adapted it to hForth. Mr. Muench's multitasker is now included in P21Forth for the MuP21 processor.

In a Forth multitasker each task has its own context: a data stack, a return stack and its own variables (traditionally called user variables). The contexts must be stored and restored properly when tasks are suspended and resumed. In Mr. Muench's multitasker, PAUSE saves the current task's context and wake restores the next task's context. PAUSE saves the return stack pointer on the data stack and the data stack pointer in the user variable stackTop, then jumps to the next task's status, whose address is held in the current task's user variable follower. It is defined as:

    : PAUSE   rp@ sp@ stackTop !  follower @ >R ; COMPILE-ONLY

Advanced Forth users already know that '>R EXIT' causes a high-level jump in the traditional Forth virtual machine. Each task's user variable status holds wake and is immediately followed by the user variable follower. Initially hForth has only one task, SystemTask. Its user variables status and follower hold:

SystemTask's   status                follower
              +------+ +-----------------------------------------+
              | wake | | absolute address of SystemTask's status |
              +------+ +-----------------------------------------+

If FooTask is added, status and follower of the two tasks now hold:

SystemTask's   status                follower
              +------+ +-----------------------------------------+
              | wake | | absolute address of FooTask's status    |
              +------+ +-----------------------------------------+

   FooTask's   status                follower
              +------+ +-----------------------------------------+
              | wake | | absolute address of SystemTask's status |
              +------+ +-----------------------------------------+

Effectively, the current task's PAUSE jumps to the next task's wake. At this point user variables and stacks are not yet switched. wake stores the return stack item (the address following status, i.e. the address of follower) into the global variable userP, which is used to calculate the absolute addresses of user variables. All user variables cluster in front of follower. Now the user variables are switched. Then wake restores the data stack pointer stored in the user variable stackTop (now the data stack is switched) and restores the return stack pointer saved on top of the data stack (now the return stack is switched). wake is defined as:

    : wake   R> userP !  stackTop @ sp!  rp! ; COMPILE-ONLY

What is clever here is that one item on the return stack, left by PAUSE and consumed by wake, is used to transfer control as well as the information for context switching. This multitasker is highly portable. Not a line of multitasker code was touched when the hForth 8086 RAM model was moved to the Z80 processor. This was also verified by Neal Crook when he ported hForth to the ARM processor. I believe that it should be possible to port this multitasker to a subroutine-threaded or native-code Forth by redefining PAUSE and wake in machine code.
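The ring formed by the status and follower cells can be modeled in portable Forth. The sketch below is a hypothetical data-structure model only: it links task records the way the diagrams show but performs no context switch, its status cell is left empty where the real one holds wake, and all names are invented.

```forth
\ Hypothetical model of the task ring; it links records but switches no context.
: task:  ( "name" -- )  CREATE 0 , 0 , ;        \ status cell, then follower cell
: >follower  ( task -- addr )  CELL+ ;
: next-task  ( task -- task' )  >follower @ ;

task: system-task
system-task DUP >follower !                     \ a lone task follows itself

: add-task  ( new prev -- )                     \ insert new right after prev
    2DUP next-task SWAP >follower !             \ new inherits prev's successor
    >follower ! ;                               \ prev now points at new

task: foo-task
foo-task system-task add-task

system-task next-task foo-task = .              \ prints -1 (true)
foo-task next-task system-task = .              \ prints -1 (true)
```

Adding a task is a two-store list insertion, and following the follower cells from any task eventually returns to it, which is all a round-robin scheduler needs.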

I used this multitasker to update the graphics screen and make the cursor blink in HIOMULTI.F. Console output is redirected to the graphics screen to display Korean and English characters on VGA and Hercules Graphics Adapters. EMIT puts characters into a buffer, and a background task displays them on the graphics screen while hForth is waiting for keyboard input. Scrolling text on the graphics screen is as fast as on the text screen. I also used the multitasker for serial communication in SIO.F. The main routine fetches characters from an input buffer and stores characters in an output buffer, while a background task does the actual hardware control.

Jump table interpreter

I applied all the best ideas and tricks I know to hForth. Most of them came from other people, while I added a few of my own. I believe that some of them are worth mentioning.

The hForth text interpreter uses a vector table to determine what to do with a parsed string after searching for it in the Forth dictionary. The dictionary search leaves on the data stack the string and 0 (for an unknown word); xt and -1 (for a non-immediate word); or xt and 1 (for an immediate word). The hForth text interpreter chooses the next action with the following code:

    1+ 2* STATE @ 1+ + CELLS 'doWord + @ EXECUTE

'doWord table consists of six vectors.

                               compilation state   interpretation state
                               (STATE returns -1)   (STATE returns 0)
                               ------------------  --------------------
non-immediate word (TOS = -1)     optiCOMPILE,         EXECUTE
unknown word       (TOS =  0)      doubleAlso,        doubleAlso
immediate word     (TOS =  1)       EXECUTE            EXECUTE

TOS = top-of-stack

The behavior of the hForth text interpreter can be changed interactively by replacing these vectors. For example, one can make the hForth interpreter accept only single-cell numbers by replacing the vectors "doubleAlso," and "doubleAlso" with "singleOnly," and "singleOnly" respectively. (In the names "doubleAlso,", "singleOnly,", "optiCOMPILE," and "COMPILE," the trailing comma is part of the name.) "optiCOMPILE," does the same thing as the Standard word "COMPILE," except that it removes one level of EXIT if possible. It does not compile the null definition CHARS into the current definition. Also, it compiles 2* instead of CELLS if CELLS is defined as ": CELLS 2* ;".
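The table lookup can be tried in isolation with the following sketch. The table demo-table and the act- words are hypothetical stand-ins that only print which action was selected, and the state is passed as an argument instead of being read from STATE so that the sketch can be run interactively. The index arithmetic is the same as in the interpreter code shown above.

```forth
\ Hypothetical stand-ins that only report which action was selected.
: act-compile   ( -- )  ." compile the word" ;
: act-execute   ( -- )  ." execute the word" ;
: act-number,   ( -- )  ." convert to a number and compile a literal" ;
: act-number    ( -- )  ." convert to a number" ;

CREATE demo-table                  \ same row and column order as 'doWord
    ' act-compile , ' act-execute ,    \ non-immediate: compiling, interpreting
    ' act-number, , ' act-number ,     \ unknown word
    ' act-execute , ' act-execute ,    \ immediate word

\ flag: -1, 0 or 1 from the dictionary search; state: -1 or 0
: dispatch  ( flag state -- )
    SWAP 1+ 2*  SWAP 1+ +  CELLS demo-table + @ EXECUTE ;

-1 -1 dispatch CR    \ non-immediate word while compiling
 0  0 dispatch CR    \ unknown word while interpreting
 1 -1 dispatch CR    \ immediate word while compiling
```

Adding 1 to the flag maps it onto rows 0..2, and adding 1 to the state maps it onto columns 0..1, so the six cases become six cells and no IF is needed.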

Special compilation action for default compilation semantics

Compiling words created by CONSTANT, VARIABLE, and CREATE as literal values can increase execution speed, especially for native-code Forth compilers. The hForth EXE model implements a solution that provides a special compilation action in place of the default compilation semantics. Words created by CONSTANT, VARIABLE, and CREATE carry a special mark and the xt of a special compilation action. The hForth compiler executes that xt if it sees the mark. (POSTPONE must also find this special compilation action and compile it.) A new data structure with a special compilation action can be built with CREATE and only two non-Standard words: the implementation-dependent doCompiles> and the implementation-independent compiles>. doCompiles> verifies that the last definition is ready for a special compilation action, takes an xt from the data stack and assigns it as the special compilation action of the last definition. compiles> is defined as:

    : compiles>  ( xt -- )
        POSTPONE LITERAL POSTPONE doCompiles> ; IMMEDIATE

For example, 2CONSTANT can be defined as:

    :NONAME   EXECUTE POSTPONE 2LITERAL ;
    : 2CONSTANT
        CREATE SWAP , , compiles> DOES> DUP @ SWAP CELL+ @ ;

It is the user's responsibility to match special compilation action with the default compilation semantics. I believe that this solution is general enough to be applied to other Forth systems.

Turtle Graphics

I implemented LOGO's Turtle Graphics in hForth. The turtle moves on a VGA or Hercules graphics screen and follows the postfix Forth command '100 FORWARD' instead of the prefix LOGO command 'FORWARD 100'. No floating-point math is used at all. Integers are used to represent angles in degrees rather than in radians, and a look-up table is used to evaluate trigonometric functions. Only a few words are defined in machine code, for line drawing and trigonometric function evaluation. The turtle moves swiftly on a 286 machine. The Forth source and MS-DOS executables, TURTLE.F, ETURTLE.EXE (using English commands) and HTURTLE.EXE (using Korean commands), are included.
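Integer trigonometry with a look-up table can be sketched as follows. This is not the code in TURTLE.F: the table resolution (every 10 degrees), the scale factor of 10000 and all word names are assumptions chosen to keep the example short and to fit in 16-bit cells.

```forth
\ Hypothetical sketch: sine scaled by 10000, tabulated every 10 degrees.
CREATE sine-table
    0 , 1736 , 3420 , 5000 , 6428 , 7660 , 8660 , 9397 , 9848 , 10000 ,

: sin10  ( degrees -- n )          \ degrees: a multiple of 10 in 0..360
    DUP 180 > IF  180 - RECURSE NEGATE EXIT  THEN   \ sin(x) = -sin(x-180)
    DUP 90 >  IF  180 SWAP -  THEN                  \ sin(x) = sin(180-x)
    10 / CELLS sine-table + @ ;

: cos10  ( degrees -- n )  90 + 360 MOD sin10 ;     \ cos(x) = sin(x+90)

\ Displacement for a move of the given length along the heading.
: step  ( distance heading -- dx dy )
    2DUP cos10 10000 */ >R         \ dx = distance * cos / 10000
    sin10 10000 */  R> SWAP ;      \ dy = distance * sin / 10000

100 30 step . .                    \ prints dy then dx: 50 86
```

The Standard word */ keeps a double-cell intermediate product, so multiplying a distance by a scaled sine does not overflow on a 16-bit system. For negative products the rounding direction of */ depends on the system's division.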

Summary

hForth is a small ANS Forth system based on eForth. It is designed especially for small embedded systems. The basic ROM and RAM models are designed for portability; however, they can easily be optimized for a specific CPU to build a competitive system, as the 8086 EXE model shows. hForth packages for the 8086 and Z80 can be found at http://www.taygeta.com/forthcomp.html or ftp://ftp.taygeta.com/pub/Forth/Reviewed/. hForth has also been ported to the H8 processor by Mr. Bernie Mentink and to the ARM processor by Neal Crook. I hope that hForth will be useful to many people.