A rationale for Structures in the Forth Scientific Library


This explains the workings and rationale of a scheme for high order data structures. These are the kind of data structures that are roughly equivalent to C/C++ structs or Pascal/Ada records, FORTRAN has no such construct (Fortran 90 does have such a facility, with its type facility).

This version (V1.9) of data structures is based upon ideas described in Hayes (1992), Pountain (1987), Ertl (1994), and Hendrix (1994) and upon much experimentation with several earlier versions of this code.

These data structures allow the creation of several kinds of data that are naturally associated with each other, but do not necessarily work well together as a single array of data.

This implementation of data structures was designed with the following goals in mind:

      1. Portability -- the implementation must be ANS Forth.

      2. Generality -- the data structures need to be used in
                       many kinds of applications.

      3. Readability -- the syntax to be created has to be easy
                        to read and relatively intuitive.

      4. Easy to use -- an awkward design will not likely be used.

      5. Works with multiple instances of a given data structure.

      6. Allows nested data structures.

      7. Allows arrays of data structures (both statically and
	 dynamically delclared) without adding additional syntax
         to existing array words.
The intent of the design is to provide a general, easy to use scheme for handling data only. The temptation to add object oriented properties to the structures was avoided (a possible future data structure could be designed for objects). The structure contains ATTRIBUTE subfields, that is data or pointers to data.

When using a data structure the first step is to declare the type, e.g,

        structure pixel
            integer: ->red
            integer: ->green
            integer: ->blue
            float:   ->intensity
        endstructure
defines a structure named pixel that has 3 integer attribute fields, (named ->red, ->green and ->blue) plus a floating point field (named ->intensity). An alternative form of the type declaration uses the word ATTRIBUTE with the existing type size words,

        structure pixel
            integer attribute  ->red
            integer attribute  ->green
            integer attribute  ->blue
            float   attribute  ->intensity
        endstructure
The words, ->red, ->green, ->blue and ->intensity will be created by these declarations. They will automatically know their offsets into the data structure. Internally the space for the attributes occurs in the order that they are declared. Note that depending upon the data structure being declared, alignment words (ALIGNED, FALIGNED, etc.) may be necessary between the specification of the attributes.

An instance of a pixel is declared the same way as declaring a variable,

pixel pix1

declares a pixel instance called pix1.

The key to this simplicity is the fact that structure is a second order defining word that tells the compiler how a new type is to be defined.

Structures can be nested, for example:

	structure point
		integer: .x
                integer: .y
                sizeof pixel struct: .px
	endstructure

point p

(see below for explanation of sizeof ).

Then the following can be used,

    p .x             \ the address of the .x attribute of p

    p .px ->red      \ the address of the ->red attribute of
                     \ p's .px

USING A STRUCTURE

When a structure name is invoked alone, it stacks a double structure handle. This handle consists of two components each occupying one cell: the base address of the structure instance on the top of the stack, and the type-id for this structure below it on the stack.

The address of an attribute field of a given structure instance is obtained by naming the object and then the attribute, e.g.

pix1 ->red

gets the address of the ->red attribute of pix1. The attribute is late bound to the structure so that the following kinds of operations will work,

        : clear_red ( hndl -- )
                  0 ROT ROT ->red !
        ;

        pix1 clear_red

The structures and attribute fields use the type id's that are part of the structure handle to assure that a member attribute goes with a proper structure. So, for example,

pix1 .x

will give a runtime failure (this happens at runtime not compile time beause the attribute fields use late binding).

Because the structure handles contain the type information, a structure member can be dereferenced well after the structure is referred to. For example,

	p       \ the point structure above

	( .....a bunch of stuff, with no net stack effect,
	        that involves other structures ... )

	.x      \ the user WANTS p .x at this point

This will type of code will work OK.

There is a STRUCT-HANDLE type defined that is useful for storing away structure contexts,

        structure STRUCT-HANDLE
	     1 CELLS attribute .type
             1 CELLS attribute .addr
        endstructure
With this one can store away a structure reference for later use, e.g.,

        STRUCT-HANDLE SELF

        pix1   SELF .addr !   SELF .type !
        
In order to reduce the possiblity of errors when doing this, the words h@ and h! are provided to move values in and out of this structure,

        pix1    SELF h!

        ...

        SELF h@
h@ ( hdl1 -- hdl2 )

Fetches the (double) structure handle stored in a STRUCT-HANDLE instance, hdl1.

h! ( hdl1 hdl2 -- )

Stores the (double) structure handle hdl1 in the STRUCT-HANDLE instance hdl2.


CREATING ARRAYS OF STRUCTURES.

Arrays of structures can use the existing ARRAY / DARRAY mechanism. To do this the new word sizeof is needed,

sizeof ( -- ncells )

This word expects the name of a structure type (not an instance) and returns the number of cells the data part of the structure occupies. For example,

sizeof pixel

and NOT,

sizeof pix1

NOTE: The word sizeof has a side effect of setting the VALUE TYPE-ID to the appropriate value and setting the VALUE STRUCT-ARRAY? to TRUE. These are needed in order to properly construct arrays of structures and nested structures. The result of the sizeof word is used for as a parameter of ARRAY (or DARRAY ) declarations,

5 sizeof pixel ARRAY p{

or,

sizeof pixel DARRAY q{

& q{ 5 }malloc

These create 5 element arrays where each element has the same type as the structure pixel, the names of the arrays are p{ and q{. The address of a particular element is obtained in the usual way, e.g.,

	p{ 2 }            \ stacks the address of element 2


        p{ 2 } ->red      \ stacks the address of the ->red
                          \ attribute of element 2

Defining structures that contain arrays

To define a structure that contain a STATIC array (i.e. fixed in size a compile time), use the word ARRAY: similar to how ARRAY would be used outside of a structure,

        structure databuffer
                  integer: .index
                  20 integer array: .z{
        endstructure
With the above defined, every databuffer instance will have an integer index and a 20 element integer array. The word ARRAY: can be applied to structures, just as the word ARRAY can, to create structures that contain arrays of other structures. To create structures with dynamically allocated arrays, one must use the word POINTER: to create a structure pointer and then use some support words to associate the pointer with the desired array. (The word POINTER: is actually general purpose, it can be used for any type of pointer).

To declare a structure that contains pointers,

         \ a data structure for LU factored matrices
          structure LUMATRIX
                    FLOAT  pointer: .MATRIX{{
                    INTEGER pointer: .PIVOT{
                    INTEGER attribute .N
          endstructure
Then one needs to create a proper matrix and pivot (either statically or dynamically) and then have the LUMATRIX instance point to them,

        INTEGER DARRAY itmp{
        FLOAT   DMATRIX mtmp{{

        LUMATRIX a             \ defines an LU Matrix structure instance

        ...

        & itmp{ 5 }malloc                 \ allocate data space
        & mtmp{{ 5 5 }}malloc

        itmp{ a -> .pivot{ ->!     \ a's pivot now points to itmp{
        mtmp{{ a -> .matrix{{ ->!   \     and to mtmp{{
Once this is done itmp{ and mtmp{{ can be used for other purposes as long as the originally allocated space is not affected.

To free the allocated space, one needs to reverse the process,

        a .pivot{ & itmp{ &!         \ itmp{ now points to the pivot
        & itmp{ }free                \ releases the space

The words used for this are,

-> ( s-id s-addr -- ptr-addr )

Gets the base address of a pointer instance. An example of its usage: buf -> .z{

->! ( addr ptr-addr -- )

Make the pointer (ptr-addr) refer to the array base address (addr - CELL).

IMPORTANT NOTE: If a{ is an array, and B is an instance of a structure that has the pointer attribute .a{, then the sequence,

a{ B -> .a{ ->!

has a ZERO net stack effect if a{ is an array of simple scalar types. BUT has the effect of leaving the TYPE-ID of a{ on the stack if a{ is an array of structures.

The same applies to the reverse sequence that aliases an external array to a pointer in the structure,

B .a{ & a{ &!


Defining CONSTANT structures

To define a constant version of a data structure one needs to have already defined the word that fetches data from the data structure. One has to also define the word that does the equivalent of ',' for that data strucuture (i.e. it allocates space for and initializes a given data structure at HERE). Then the word constant-structure is used in the constant defining word before the pre-existing defining word is invoked,

        : pix-constant
              ['] pix@
              ['] pix,
              constant-structure pixel
        ;
If pix, had the stack diagram ( red green blue -- ) ( F: inten -- ) then a constant pixel could be declared as,

20 30 40 0.75e0 pix-constant PIX

Then subsequent invocations of PIX would cause pix@ to be applied to the data within PIX. (Presumably, pix@ would cause the components of PIX to be placed on the stack. That is the intent of the design, but in actuality one could invoke any word that will expect the base address of the structure on the stack). Remember that the @-like word should consume a double structure handle.

CAUTION: invoking a large constant structure can have a heavy impact on the stack(s). One should take care that there is enough stack space available (particularly the float stack) before using a constant structure.

constant-structure ( '@ ', -- )

This word is used within constant structure defining words. It takes the address of @-like and ,-like words for the data structure and sets up internal pointers to them. It also sets a flag that is used internally within the structure defining word (pixel in the above examples) so that a constant defining data structure is built.


Forcing EARLY BINDING

By default, all the offsets to the individual data elements are resolved at run-time (i.e. LATE binding). Sometimes the binding could be done at compile time (EARLY binding) because all the information to do the binding is available then. If early binding can be done, then the code will run faster. The programmer can inform the compiler that early binding is possible by using the words [[ to start early binding, and ]] to end it.

For example,

	 : test    [[ pix1 ->red ]]  @ . ;

The words [[ and ]] are intended to be used at compile time, they become harmless no-ops at runtime.

The current implementation does not allow for nesting of early bindings.


Limitations to this version of structure

The attribute subfields are globally visible and a collision will occur if two different data structures are designed to have attribute fields with the same name.

Relying on the side effect of sizeof to set the value TYPE-ID, in order to build nested structures is a syntactic weakness of this version. I have yet to figure out a way around it that doesn't result in a loss of all the functionality that structures now have.


Glossary

[[ ( -- )

This word initiates early binding for data structures.

]] ( -- )

This words causes early binding for data structures to end.

-> ( s-id s-addr -- ptr-addr )

Gets the base address of a pointer instance. An example of its usage: buf -> .z{

->! ( addr ptr-addr -- )

Make the pointer (ptr-addr) refer to the array base address (addr minus one CELL).

addrof ( -- addr )

Returns the base address of the data fields of the instance of a data structure that immediately follows this word.

array: ( id offset n cell_size -- offset' )

This word is used for reserving an array that has n elements of size cell_size within the data structure. The name of this field is the name that immediately follows the word. The word array: is used in same manner as the word array so that the array can have either scalar or structure elements.

attribute ( offset size -- offset' )

This word is used for reserving an attribute field of the specified size in a data structure. The name of this field is the name that immediately follows the word.

cell: ( offset -- offset' )

This word is used for reserving an one cell attribute field in a data structure. The name of this field is the name that immediately follows the word.

cells: ( offset n -- offset' )

This word is used for reserving an n cell attribute field in a data structure. The name of this field is the name that immediately follows the word.

char: ( offset -- offset' )

This word is used for reserving an one character attribute field in a data structure. The name of this field is the name that immediately follows the word.

chars: ( offset n -- offset' )

This word is used for reserving an n character attribute field in a data structure. The name of this field is the name that immediately follows the word.

constant-structure ( '@ ', -- )

This word is used within constant structure defining words. It takes the address of @-like and ,-like words for the data structure and sets up internal pointers to them. It also sets a flag that is used internally within the structure defining word (PIXEL in the above examples) so that a CONSTANT defining data structure is built.

double: ( offset -- offset' )

This word is used for reserving an one double cell attribute field in a data structure. The name of this field is the name that immediately follows the word.

endstructure ( id offset -- )

This word is used to complete a structure type definition.

float: ( offset -- offset' )

This word is used for reserving an one float cell attribute field in a data structure. The name of this field is the name that immediately follows the word.

h@ ( hdl1 -- hdl2 )

Fetches the (double) structure handle stored in a STRUCT-HANDLE instance, hdl1.

h! ( hdl1 hdl2 -- )

Stores the (double) structure handle hdl1 in the STRUCT-HANDLE instance hdl2.

integer: ( offset -- offset' )

This word is used for reserving an one integer cell attribute field in a data structure. The name of this field is the name that immediately follows the word. This word is synonomous with cell:.

pointer: ( id offset cell_size -- id offset' )

This word is used for reserving a pointer within the data structure to a data field with cell_size elements. The elements can be either scalar or data structures. The name of this field is the name that immediately follows the word. This word is primarily intended for pointing to dynamically allocated or aliased arrays.

sizeof ( -- ncells )

This immediate word expects the name of a structure type (not an instance) and returns the number of cells the data part of the structure occupies.

NOTE: The word sizeof has a side effect of setting the VALUE TYPE-ID to the appropriate value and setting the VALUE STRUCT-ARRAY? to TRUE. These are needed in order to properly construct arrays of structures and nested structures. The result of the sizeof word is used for the first parameter of ARRAY (or third parameter of DARRAY) declarations.

struct: ( offset size -- offset' )

This word is used for reserving an attribute field for a nested structure that occupies size cells within the data structure. The name of this field is the name that immediately follows the word. This word is synonomous with attribute.

structure ( --<"name">-- id offset )

This word begins the definition of a structure data type (not an instance). This is where the name for the type is defined.

typeof ( -- id )

Returns the type id of the data type that immediately follows this word.


References

Ertl, A., 1994; Usenet comp.lang.forth comments, 28 Oct.

Hayes, J.R., 1992; Objects for Small Systems, Embedded Systems Programming, V. 5, No. 3(March) pp. 32 - 45

Hendrix, M, 1994; Personal communications

Pountain, D., 1987; Object-Oriented Forth, Implementation of Data Structures, Academic Press, New York, 119 pages, ISBN 0-12-563570-2


Structure implementation.
ASCII version of this document.

<-- Back to Forth Scientific Library page.


Dr. Everett (Skip) F. Carter Jr.
Taygeta Scientific Inc.
1340 Munras Ave., Suite 314
Monterey, CA. 93940-6140
voice: 831.641.0645
FAX: 831.641.0647
INTERNET:skip@taygeta.com
WWW:http://www.taygeta.com/
Taygeta's home page