Advanced topics

Adding C Library Functions to EDEN

Although the EDEN interpreter has a number of pre-defined functions, such as substr or writeln, and allows the user to define his own functions within the EDEN environment, the user may find restrictions in special purpose applications, such as a graphics application. The user may find some useful functions which are already written in C. Unfortunately, with the current implementation, there are no ways for the EDEN user to call C (or other languages) functions without recompiling the interpreter.

Some efforts had been done to minimise the difficulties of interfacing with C functions so that the user can call a C function directly from the EDEN environment. A built-in EDEN/C interface can bind a C function to an Eden name. However, due to some technical problems and the differences in the method of argument passing, not all C functions can be called by the EDEN interpreter through the EDEN/C interface.

Limitations

The EDEN/C interface assumes that all C functions return values of integer type. So, the C functions must return integers or characters since C automatically casts characters to integers. Functions which return pointers, including character pointers, are accepted only if the pointers are compatible with integers (i.e. they occupy the same amount of memory). Functions which return floating points or structures are not allowed by the current implementation.
The interface cannot handle macros (e.g. getchar()) unless they are rewritten as true C functions.
Arguments of integer, character, string and pointer type can be passed to the C functions. If the argument is of string type, the character pointer which points to the first character of the string is passed. Arguments cannot contain values of list type. All arguments are cast to integers before they are passed to the C functions. If data types are integer-incompatible, i.e. require a different amount of storage, then they may cause problems. Since the compatibility of data types varies from compiler to compiler, the user should check it out themselves.
Only the first 10 arguments can be passed (floating point number counts two).

Despite these limitations, the EDEN/C interface can handle a large number of existing C functions. Sometimes, the user can write a simple C function to ``bridge'' with a C function excluded by the limitations. For instance, the user can write a C function, e.g. BridgeFunc, to interface with the actual C function, e.g. ActualFunc, which requires a structure as its argument:

struct point {
        int x;
        int y;
        };

ActualFunc(struct point p)
{ ... }

BridgeFunc(intx, int y) {
        struct point p;
        p.x = x;
        p.y = y;
        return ActualFunc(p);
        }

Then the bridge function is bound instead of the actual function.

To bind a C function

The user must edit the header file, $PUBLIC/tkeden/Misc/customlib.h, to add the C function declarations which must be enclosed in the two lines:

#if INCLUDE == 'H'
#endif

For example,

#if INCLUDE == 'H'
        extern system();
        extern fprintf();
        extern char * fgets();
        extern double sin();
        extern double cos();
#endif

In the same file, customlib.h, add the binding information entry of the form:
```
{ "EDEN_name", (int(*)())C_function_name },
```
where EDEN_name is a valid Eden identifier and C_function_name is the C function name declared in (1). The two names can be different.
``(int(*)())'' is used to cast the C function to an integer function. Don't forget the commas at the end.
The binding information must enclosed in
```
#if INCLUDE == 'T'
#endif
```
For example:
```
#if INCLUDE == 'T'
        { "system", system },
        { "fprintf", fprintf },
        { "fgets", (int(*)())fgets },
        /* ... etc ... */
#endif
```
If the function returns a double precision floating point, the binding information shall be enclosed in
```
#if INCLUDE == 'R'
#endif
```
For example:
```
#if INCLUDE == 'R'
        { "sin", sin },
        { "cos", cos },
        /* ... etc ... */
#endif
```
Recompile EDEN following instructions in the file $PUBLIC/tkeden/INSTALL. Don't forget to add the links in Imakefile (or Makefile as appropriate) if the new functions added require links to object libraries other than what have already included.

A set of macros is defined, at the beginning of customlib.h, to reduce the complexity of binding C functions to Eden names. These macros are SameFunc, Function, SameReal, RealFunc, and SpecialF:

SameFunc(Name): Bind a C function named as Name to the Eden name identical to Name. The value returned by the C function must be of integer type.
Function(Ename,Cname): Bind a C function named as Cname to the Eden name Ename. The value returned by the C function must be of integer type.
SpecialF(Ename,Type,Cname): Bind a C function named as Cname to the Eden name Ename. Type specifies the type of value returned by the C function.
RealFunc(Ename,Cname): Bind a C function named as Cname to the Eden name Ename. The C function must be a double precision floating point function.
SameReal(Name): Bind a C function named as Name to the Eden name identical to Name. The C function must be a double precision floating point function.

These macros reduces the two-part information into a single macro. To specify a binding, just append the macros at the end of customlib.h. For example:

/* customlib.h */
...
SameFunc(system)
SameFunc(fprintf)
SpecialF(fopen,FILE *,fopen)
Function(getchar,my_getchar)
RealFunc(sin,sin)
SameReal(cos)
...

Warning

No white space is allowed in the macros (except in Type), and no other punctuations (including commas and semi-colons, but excluding comments) can follow these macros.

Macros for Binding C Functions to Eden

Note that these macros only work with ANSI C.

#if INCLUDE == 'H'
#define SameFunc(Name)             extern Name();
#define Function(Ename,Cname)      extern Cname();
#define SpecialF(Ename,Type,Cname) extern Type Cname();
#define RealFunc(Ename,Cname)      extern double Cname();
#define SameReal(Name)             extern double Name();
#endif

#if INCLUDE == 'T'
#define SameFunc(Name)            {#Name,Name},
#define Function(Ename,Cname)     {#Ename,Cname},
#define SpecialF(Ename,Type,Cname){#Ename,(int(*)())Cname},
#define RealFunc(Ename,Cname)
#define SameReal(Name)
#endif

#if INCLUDE == 'R'
#define SameFunc(Name)
#define Function(Ename,Cname)
#define SpecialF(Ename,Type,Cname)
#define RealFunc(Ename,Cname)     {#Ename,Cname},
#define SameReal(Name)            {#Name,Name},
#endif

Programming Notes

Memory Allocation

Memory allocation is automatically handled by the EDEN interpreter. The string and list data types are assigned by copy (i.e. each string has its own copy instead of sharing the same string through pointers as in C). For example,

S1 = "1234567890";
S2 = S1;

S2 is now the string "1234567890". If we now do

S2[1] = "X";

then S2 becomes "X234567890", but S1 is still "1234567890".

Also the EDEN interpreter allocates the exact amount of memory for the string. Reference beyond the string's or list's memory block (probably through C function calls) may have unpredictable results.

If we pass a string as an argument to an Eden function, the string is copied, i.e. passed by value. But if we pass a string to a C function, only the address of the memory is passed, i.e. passed by reference. So, the user should be careful in calling C functions.

For instance, if we want to use the C function gets to read in a string from stdin, we must first allocate some memory to a string variable before we call gets:

S = substr("", 1, 255);
gets(S);

Here we call the function substr to assign 256 spaces (strings are terminated by an invisible NUL character) to S. Then we call gets to read a string into the memory allocated to S. By doing so, we not only allocate memory for gets to used, but also set the data type of S to the string type.

Stack, Heap and Frame Stack

The pseduo machine of the EDEN interpreter is a simple stack machine.

Stack

A stack holds the values used and computed during evaluation of an Eden program.

When the machine calls a function, the parameters are pushed on the top of the stack. Actually, the parameters are merged to form a single list datum. If the function requires any local variable, the machine allocates space for their values. When the function returns, the allocated local variables and the arguments stored on the stack are poped; then the value is returned by pushing it onto the stack. See the figure below which shows the stack

before a function is called,
during a function call, the arguments having been pushed onto the stack, and
after the function has returned, the arguments and local variables having been popped, and the answer pushed onto the stack.

[DIAGRAM]

Since the stack is a fixed size memory block, a deep function call may cause the ``stack overflow'' error.

Heap

To reduce the frequency of copying the contents of strings and lists, we add another data structure, called the heap, to hold the contents of temporary strings or lists during the string/list operations. The overhead for allocating/deallocating memory on the heap is less than that of malloc()/free().

The heap is a large memory block. Memory is allocated within this block. A pointer keeps track the top of heap similar to the stack pointer of the stack. The pseduo machine instruction, freeheap, releases all the memory allocated by previous computation. The heap is not affected by the return of the function call. Freeheap instructions are automatically inserted by the interpreter at the points that it thinks safe to free memory. However, if a computation involves long strings, lists or too deep function called, the machine may not have the chance to free the memory and thus causes the ``heap overflow'' error.

The user may have to modify their program to minimise the load of stack and heap to avoid memory overflow errors. For instance, an iterative function has less demand on stack and heap than its recursive equivalent.

Frame Stack

In returning from a function call, the machine has to stored the previous machine status, e.g. stack pointer, (pseduo machine) program counter, number of local variables (since local variables were put on the stack), etc. A frame stack holds these information.

When a function is called, the current machine status, a frame, is pushed on the frame stack. For example, if f1() calls f2(), the status of stack, heap and frame stack is shown below.

[DIAGRAM]

The frame stack keeps track of the stack and heap pointers. A freeheap instruction (generated by EDEN) causes the top-of-heap to return to the bottom pointed to by the top-most frame (e.g. f2). On the other hand, the stack pointer is lowered only if the function returns.

A frame is popped when a function returns and the machine status resumes to the previous state. Since the frame stack is implemented as a small stack, about 64 entries, the number of level of nested function call is 64 approx. This maximum limit is enough for general use, but if a function nested too deep, the machine will generate a ``nested too deep'' error.

Avoiding Memory Overflow

Since the EDEN interpreter passes list "by value", i.e. copies the entire list on heap, it is very eay to overflow the heap if you pass long lists as parameters of a recursive function. To get round this problem, you may consider the use of iterative loops instead of recursive functions, or you may pass the "pointer" of that list to the function, and let the function to copy that list to its local variable. For example, instead of:

func foo /* (list) */
{
        ...
        foo($1);
        ...
}

do:

func foo /* (list *) */
{
        auto list;
        list = *$1;     /* copy the list to local variable */
        ...
        list = ...;     /* do the calculation here */
        foo(&list); /* pass the pointer of that list */
        ...
}

Also not that the interpreter will try to free the heap space at the end of each statement. Therefore, it would be helpful space-wise to break a long expression into smaller expressions and use temperary variables to hold intermediate computation results. For instance:

a = expression;
foo(a);

uses less heap space than

foo(expression);

The machine uses malloc() to allocate space for strings and lists rather than claiming space from the heap. So the use of strings and lists would not normally causes memory overflow.

Autocalc

The autocalc variable controls the evaluation scheme of the EDEN interpreter. When it is 1 (default) or non-zero, the interpreter will try to evaluate all definitions (if they are evaluable, i.e. all their source variables are defined); otherwise (autocalc is zero) the interpreter puts the suspended definitions in a queue.

Before the interpreter evaluates a definition, it tests whether all its source variables are defined, i.e. their UpToDate flags are true. However, the interpreter is not intelligent enough to identify variables referenced by a definition indirectly. For instance,

func F { para N; return N + Z; }
Y is F(1);

Function F references the (global) variable Z, but the interpreter does not know about that. When Y is defined, the interpreter thinks that Y depends on F only, and proceeds to evaluate Y and thus evaluate F. However, F finds that Z is undefined. In this example, F returns the sum of Z and the first parameter which produces @ when Z is @ However, some operators and most of the statements produce an error when the expected data types are not met.

Of course, if we defined Z before the introduction of Y, it may not cause an error. However, if we introduce Z after that of Y, we are in the risk. Hence, the result of computation seems to be sensitive to the order of definitions. Despite of this ordering problem, redefining Z would not cause Y to be updated. This may not be the intended behaviour sometimes. There are two ways to get around it:

Set autocalc to 0 before the introduction of all definitions. After all definitions are defined, set autocalc to 1 again. By doing so, the evaluation of definitions is delayed. For instance,
```
autocalc = 0;
/* ... other statements ... */
func F { para N; return N + Z; }
Y is F(1);
Z is 10;
/* ... other statements ... */
autocalc = 1;
```
However, this method does not solve the re-evaluation problem.
Explicitly give the dependency of Z to F. For example,
```
func F { para N; return N + Z; }
proc touch_F : Z { touch(&F); }
Y is F(1);
Z is 10;
```
Because now change of Z will induce a change to F, Y is indirectly dependent upon Z, and Y can't be evaluated until Z is defined. Also Y will be re-evaluated when Z is re-defined. This point is not true for the former solution. In fact, the interpreter should generate such dependency for function specifications.
Note that such attempts as:
```
func F: Z { para N; return N + Z; }
```
or
```
func F { para N; return N + Z; }
Z ~> [F]
```
to impose dependency would not work because in both cases a change of Z will call the function F (as if it is an action). This is an error because a parameter is needed to calculate F.