title: STRUCTURE words

introduction:
The use of structure definitions is quite commonplace among Forth
implementors. The actual implementation techniques vary as widely as
the capabilities of the resulting structure-"types". Over the years, a
common minimum set of characteristics has emerged from the
implementations, and it shows up in three words:
> STRUCTURE    ( "structname" -- xx end_offset )
> ENDSTRUCTURE ( xx end_offset -- )
> SIZEOF       ( "structname" -- sizeof_it )

short-description:
The words STRUCTURE and ENDSTRUCTURE are always used in pairs. In the
text between them, the top-of-stack holds a value that is both the
sizeof-struct and the current-offset. ENDSTRUCTURE saves the last value
in the parameter part of the "structname" CREATEd by STRUCTURE, and
SIZEOF fetches that value from the "structname" (so SIZEOF is
state-smart).
When the "structname" is executed later, it will in turn CREATE a new
word, followed by an ALLOT with the size-value saved by ENDSTRUCTURE.
The created word behaves just like a VARIABLE. This is called a
struct-instance.
Access to the various parts of the struct-instance, and modification of
the endoffset-value, differ widely among implementations, but in
general they create offsetwords in a global namespace.

intro-example:
  STRUCTURE newtype
     2 CELLS NEWFIELD ->first_2_cells
     2 CHARS NEWFIELD ->next_2_chars
  ENDSTRUCTURE

  SIZEOF newtype .          ( prints probably 10 in a 32-bit forth)
  0 ->first_2_cells .       ( prints probably 0 )
  0 ->next_2_chars .        ( prints probably 8 in a 32-bit forth)
  newtype myvar             ( myvar is otherwise just a VARIABLE)
  HERE myvar - .            ( the sizeof myvar's body is... 10)
  myvar ->next_2_chars C@   ( to get the first of the 2 chars )

quick-n-dirty-implementation:
  : STRUCTURE ( "name" -- xx offset )
     CREATE HERE     ( leave the address of the following sizeof-comma )
     0 DUP ,         ( initial size is zero and left on the stack )
   DOES>             ( has the address of the sizeof-comma )
     CREATE          ( make a variable )
     @ ALLOT         ( and make the variable that long )
  ;
  : ENDSTRUCTURE ( xx offset -- )
     SWAP !          ( store the last endoffset into the sizeof-comma )
  ;
  : SIZEOF ( "name" -- size )
     ' >BODY @       ( get the sizeof ... some implementations need also >DOES )
     STATE @ IF [COMPILE] LITERAL THEN
  ; IMMEDIATE
  : NEWFIELD ( offset field_size "name" -- offset' )
     CREATE OVER ,   ( store the current end_offset )
     +               ( increase the end_offset by the field_size )
   DOES> @ +         ( add the memorized offset of the field)
  ;

The generic name for an offset-word differs completely among Forth
implementations (if it exists at all). I chose NEWFIELD because it has
never been used anywhere before.
(see also the example implementation in ./structure.fs )

description:
The words STRUCTURE and ENDSTRUCTURE are always used in pairs -
ENDSTRUCTURE is supposed to clean up everything that the STRUCTURE word
has changed in the environment. A portable script must not make any
assumptions about the additional depth of the parameter stack.
The final offset is saved as the size of the struct, but some
implementations also do some alignment, either when storing the value
or on instantiation, so the values sometimes differ (instead of being
the constant 10 in the example). The actual address of the size value
inside the DOES-parameter is not fixed either. Some implementations put
a type-id in there too.
A generic NEWFIELD-like word often does not exist, because the STRUCTURE
fields are only declared with words that also memorize a type-id to be
checked on access to the fields.
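Declarators of this kind can be sketched on top of the quick-n-dirty
words. This is only an illustration under assumptions: the CHAR: and
CELL: shown here are typeless (no type-id is memorized), and letting
CELL: align the offset first is an arbitrary choice, since other
implementations pack or pad differently.

```forth
\ Sketch only; the quick-n-dirty words are repeated so the block stands alone.
: STRUCTURE ( "name" -- addr 0 )
   CREATE HERE 0 DUP ,          \ reserve the size cell, leave its address
 DOES> CREATE @ ALLOT ;         \ instantiate: allot the saved size
: ENDSTRUCTURE ( addr offset -- ) SWAP ! ;
: NEWFIELD ( offset size "name" -- offset' )
   CREATE OVER , +              \ remember the offset, then advance it
 DOES> @ + ;                    \ add the remembered offset to an address

\ Assumed typeless declarators: no type-id is recorded.
: CHAR: ( offset "name" -- offset' ) 1 CHARS NEWFIELD ;           \ packed
: CELL: ( offset "name" -- offset' ) ALIGNED 1 CELLS NEWFIELD ;   \ aligns first
```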
An example usage would be
> STRUCTURE typename
>   CHAR: ->first_char
>   CELL: ->probably_aligned_before
>   CHAR: ->aligned_good_enough
> ENDSTRUCTURE
The sizeof-value (on a 32-bit system) could be 6, 7, 9, 10 or 12. The
same applies to the HERE-difference on instantiation of the typename,
so you had better not assume whether the current structure
implementation is packed or not.
On the other hand, you are free to increase the offset-value at will,
which is much the same as an ALLOT after a call to CREATE, i.e.
"CHAR: ->my_chars 10 CHARS +" is always the same as
"11 CHARARRAY: ->my_chars". This should be widely used to give fields
descriptive names by creating new offsetword-declarators, e.g.
> : CELLARRAY: >R CELL: R> 1- CELLS + ;
> : WINDOWFIELD: 3 CELLARRAY: ;

Among the typelike FIELD-declarators you will find
  BYTE SHORT LONG BYTEARRAY SHORTARRAY LONGARRAY
  CHAR: WORD: CELL:
  CHARARRAY CHARARRAY: CHAR-ARRAY CHAR-ARRAY:
Among the generic NEWFIELD-declarators you will find
  FIELD ATTRIBUTE ATTRIBUTE: OFFSETWORD OFFSETWORD:

The generic declarator (in a typeless implementation) could be used to
build a kind of inheritance and structure-fields, using:
> STRUCTURE a
>   2 CELLS FIELD ->a
> ENDSTRUCTURE
> STRUCTURE b
>   SIZEOF a FIELD ->a_in_b
>   CELL FIELD ->b
> ENDSTRUCTURE

recommendation:
In both typeless and type-prone implementations, you are supposed to
provide field-declarators for the basic types. Newer implementations
choose the ANSI typenames plus a colon, i.e. you should at least provide
CHAR: and CELL:
The SwiftForth ./structs.txt also lists INTEGER: and FLOAT: (where they
have a word INTEGER that returns the size of such a basic type). The
array types would be CELLS: and CHARS: instead of the old-fashioned
CHARARRAY.
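The array declarators named above could be layered on a generic
declarator, as in the following sketch. It assumes the typeless NEWFIELD
of the quick-n-dirty implementation (repeated here so the block stands
alone) and a count-on-stack convention ( offset n "name" -- offset' ).
The field names and /demo are hypothetical, and real implementations may
also record a type-id or align the offset.

```forth
\ Sketch only: typeless array declarators on top of NEWFIELD.
: NEWFIELD ( offset size "name" -- offset' )
   CREATE OVER , +              \ remember the offset, then advance it
 DOES> @ + ;                    \ add the remembered offset to an address

: CHARS: ( offset n "name" -- offset' ) CHARS NEWFIELD ;   \ n chars
: CELLS: ( offset n "name" -- offset' ) CELLS NEWFIELD ;   \ n cells

\ Usage, with a plain 0 standing in for the start offset of STRUCTURE:
0
   3 CELLS: ->window            \ hypothetical field names
  10 CHARS: ->label
CONSTANT /demo                  \ the final offset is the total size
```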
The name of the generic field-declarator varies widely, and in some
implementations it does not exist at all, to prevent typeless fields
(in that case you could still use "SIZEOF a CHARS: ->a"). SwiftForth
uses ATTRIBUTE, but struct-fields are declared with STRUCT: in
SwiftForth, and for the typeid, SIZEOF has a little extra side effect.
The typeid is absolutely important if the implementation wants to
integrate such structures with an object-oriented class-system,
sometimes with multiple inheritance and always with method invocation,
possibly even with non-global member-names for classes (not for these
structs!).
There are a lot of variants, including
  STRUCTURE: ;STRUCTURE ;ENDSTRUCTURE END_STRUCTURE ADDROF
and so on.

Proposed for implementation ../../from/Guido.Draheim :
  STRUCTURE ENDSTRUCTURE SIZEOF ( as above )
  CHAR: CHARS: CELL: CELLS: STRUCT:
and implementations can choose to define a generic field-declarator.
The terms ATTRIBUTE NEWFIELD FIELD: should be considered reserved for
that purpose.
Note that users should rarely use an un-typed field and should take the
options provided by CELLS: and STRUCT: - otherwise the code may fail in
the variants of implementations that have [DEFINED] TYPE-ID.
Remember that a TYPE-ID implementation could do...
  : SIZEOF ' >BODY 2@ TYPE-ID ! STATE @ IF [COMPILE] LITERAL THEN ;
  : STRUCT: DUP >R FIELD: TYPE-ID @ R> >TYPE-ID ! ;
and note that type-id-alike implementations are widely used.
(see also the example implementation in ./structure.fs )

mpe-forth: ../../from/Guido.Draheim
MPE/ProForth seems to use a system that has only offsets available,
i.e. using the struct name later will simply return the size of the
struct. A special SIZEOF operation is not needed; instead you can
simply adjust the defining-offset.
> CELL FIELD-TYPE INT       ( INT will call CREATE now )
>                           ( and will add CELL to the offset )
> STRUCT POINT
>   INT .X
>   INT .Y
> END-STRUCT
>
> STRUCT RECT
>   POINT .TOP-LEFT         ( that means, STRUCT has just declared )
>   POINT .BOTTOM-RIGHT     ( another FIELD-TYPE ... )
> END-STRUCT
>
> RECT BUFFER: NEW-RECT     ( outside STRUCT it leaves the offset )
>
> CREATE ANOTHER-RECT       ( so you could also write )
>   RECT ALLOT              ( this )
Notice that even in this implementation the top-of-stack inside
STRUCT...END-STRUCT contains the current offset (a.k.a. current sizeof).

williams-variant: ../../from/Guido.Draheim
There is another implementation ../../from/David.N.Williams that relies
only on offsetword definitions, see ./qdstruct.fs and ./dlists.fs for
an example. Quite an interesting case.

gforth-variant:
../../system/gforth uses a "%" at the end of a field-declarator (instead
of ":") to make it explicit that this field-declarator needs some
alignment. Moreover, interestingly, it has a generic+alignment
interpretation but no type-id.

note ../../from/Guido.Draheim
It should be noted that many implementations have basic alignment
words; ALIGNED ( x -- x' ) in particular is very useful even in the
basic implementations of STRUCTURE, e.g.
> STRUCTURE v
>   CHAR: a
>   ALIGNED 2 CELLS: b
> ENDSTRUCTURE
and some self-aligning words can easily be derived
  : %CELLS: SWAP ALIGNED SWAP CELLS: ;

openboot-variant:
the common idiom is
> STRUCT
>   CELL FIELD ->A
>   DCELL FIELD ->B
> ENDSTRUCT /MYSTRUCT
which is exactly equivalent to
> 0
>   CELL FIELD ->A
>   DCELL FIELD ->B
> CONSTANT /MYSTRUCT
Therefore, the term FIELD should be reserved - it has the same simple
definition as NEWFIELD above. However, some implementations (esp.
gforth) have a different behaviour for FIELD, including some alignment
info.

---
$Id: index-v.txt,v 1.6 2001/08/14 17:58:59 guidod Exp $