Advanced topics
Adding C Library Functions to EDEN
Although the EDEN interpreter has a number of pre-defined functions, such as substr or writeln, and allows the user to define his own functions within the EDEN environment, the user may find restrictions in special purpose applications, such as a graphics application. The user may find some useful functions which are already written in C. Unfortunately, with the current implementation, there are no ways for the EDEN user to call C (or other languages) functions without recompiling the interpreter.
Some efforts had been done to minimise the difficulties of interfacing with C functions so that the user can call a C function directly from the EDEN environment. A built-in EDEN/C interface can bind a C function to an Eden name. However, due to some technical problems and the differences in the method of argument passing, not all C functions can be called by the EDEN interpreter through the EDEN/C interface.
Limitations
- The EDEN/C interface assumes that all C functions return values of integer type. So, the C functions must return integers or characters since C automatically casts characters to integers. Functions which return pointers, including character pointers, are accepted only if the pointers are compatible with integers (i.e. they occupy the same amount of memory). Functions which return floating points or structures are not allowed by the current implementation.
- The interface cannot handle macros (e.g.
getchar()
) unless they are rewritten as true C functions. - Arguments of integer, character, string and pointer type can be passed to the C functions. If the argument is of string type, the character pointer which points to the first character of the string is passed. Arguments cannot contain values of list type. All arguments are cast to integers before they are passed to the C functions. If data types are integer-incompatible, i.e. require a different amount of storage, then they may cause problems. Since the compatibility of data types varies from compiler to compiler, the user should check it out themselves.
- Only the first 10 arguments can be passed (floating point number counts two).
Despite these limitations, the EDEN/C interface can handle a large number
of existing C functions. Sometimes, the user can write a simple C function
to ``bridge'' with a C function excluded by the limitations. For instance,
the user can write a C function, e.g. BridgeFunc
, to interface with the
actual C function, e.g. ActualFunc
, which requires a structure as its
argument:
struct point { int x; int y; }; ActualFunc(struct point p) { ... } BridgeFunc(intx, int y) { struct point p; p.x = x; p.y = y; return ActualFunc(p); }
Then the bridge function is bound instead of the actual function.
To bind a C function
- The user must edit the header file,
$PUBLIC/tkeden/Misc/customlib.h
, to add the C function declarations which must be enclosed in the two lines:#if INCLUDE == 'H' #endif
For example,#if INCLUDE == 'H' extern system(); extern fprintf(); extern char * fgets(); extern double sin(); extern double cos(); #endif
- In the same file,
customlib.h
, add the binding information entry of the form:{ "EDEN_name", (int(*)())C_function_name },
where EDEN_name is a valid Eden identifier and C_function_name is the C function name declared in (1). The two names can be different.
``(int(*)())
'' is used to cast the C function to an integer function. Don't forget the commas at the end.
The binding information must enclosed in#if INCLUDE == 'T' #endif
For example:#if INCLUDE == 'T' { "system", system }, { "fprintf", fprintf }, { "fgets", (int(*)())fgets }, /* ... etc ... */ #endif
If the function returns a double precision floating point, the binding information shall be enclosed in#if INCLUDE == 'R' #endif
For example:#if INCLUDE == 'R' { "sin", sin }, { "cos", cos }, /* ... etc ... */ #endif
- Recompile EDEN following instructions in the file
$PUBLIC/tkeden/INSTALL
. Don't forget to add the links inImakefile
(orMakefile
as appropriate) if the new functions added require links to object libraries other than what have already included.
A set of macros is defined, at the beginning of customlib.h
, to reduce
the complexity of binding C functions to Eden names. These macros are
SameFunc
, Function
, SameReal
,
RealFunc
, and SpecialF
:
SameFunc(
Name)
- Bind a C function named as Name to the Eden name identical to Name. The value returned by the C function must be of integer type.
Function(
Ename,Cname)
- Bind a C function named as Cname to the Eden name Ename. The value returned by the C function must be of integer type.
SpecialF(
Ename,Type,Cname)
- Bind a C function named as Cname to the Eden name Ename. Type specifies the type of value returned by the C function.
RealFunc(
Ename,Cname)
- Bind a C function named as Cname to the Eden name Ename. The C function must be a double precision floating point function.
SameReal(
Name)
- Bind a C function named as Name to the Eden name identical to Name. The C function must be a double precision floating point function.
These macros reduces the two-part information into a single macro. To
specify a binding, just append the macros at the end of customlib.h
. For
example:
/* customlib.h */ ... SameFunc(system) SameFunc(fprintf) SpecialF(fopen,FILE *,fopen) Function(getchar,my_getchar) RealFunc(sin,sin) SameReal(cos) ...
Warning
No white space is allowed in the macros (except in Type
), and
no other punctuations (including commas and semi-colons, but excluding
comments) can follow these macros.
Macros for Binding C Functions to Eden
Note that these macros only work with ANSI C.
#if INCLUDE == 'H' #define SameFunc(Name) extern Name(); #define Function(Ename,Cname) extern Cname(); #define SpecialF(Ename,Type,Cname) extern Type Cname(); #define RealFunc(Ename,Cname) extern double Cname(); #define SameReal(Name) extern double Name(); #endif #if INCLUDE == 'T' #define SameFunc(Name) {#Name,Name}, #define Function(Ename,Cname) {#Ename,Cname}, #define SpecialF(Ename,Type,Cname){#Ename,(int(*)())Cname}, #define RealFunc(Ename,Cname) #define SameReal(Name) #endif #if INCLUDE == 'R' #define SameFunc(Name) #define Function(Ename,Cname) #define SpecialF(Ename,Type,Cname) #define RealFunc(Ename,Cname) {#Ename,Cname}, #define SameReal(Name) {#Name,Name}, #endif
Programming Notes
Memory Allocation
Memory allocation is automatically handled by the EDEN interpreter. The string and list data types are assigned by copy (i.e. each string has its own copy instead of sharing the same string through pointers as in C). For example,
S1 = "1234567890"; S2 = S1;
S2
is now the string "1234567890"
. If we now do
S2[1] = "X";
then S2
becomes "X234567890"
, but
S1
is still "1234567890"
.
Also the EDEN interpreter allocates the exact amount of memory for the string. Reference beyond the string's or list's memory block (probably through C function calls) may have unpredictable results.
If we pass a string as an argument to an Eden function, the string is copied, i.e. passed by value. But if we pass a string to a C function, only the address of the memory is passed, i.e. passed by reference. So, the user should be careful in calling C functions.
For instance, if we want to use the C function gets
to read in a string
from stdin, we must first allocate some memory to a string variable before
we call gets
:
S = substr("", 1, 255); gets(S);
Here we call the function substr
to assign 256 spaces (strings are terminated
by an invisible NUL character) to S
.
Then we call gets to read a string
into the memory allocated to S
. By doing so, we not only allocate memory
for gets
to used, but also set the data type of S
to the string type.
Stack, Heap and Frame Stack
The pseduo machine of the EDEN interpreter is a simple stack machine.
Stack
A stack holds the values used and computed during evaluation of an Eden program.
When the machine calls a function, the parameters are pushed on the top of the stack. Actually, the parameters are merged to form a single list datum. If the function requires any local variable, the machine allocates space for their values. When the function returns, the allocated local variables and the arguments stored on the stack are poped; then the value is returned by pushing it onto the stack. See the figure below which shows the stack
- before a function is called,
- during a function call, the arguments having been pushed onto the stack, and
- after the function has returned, the arguments and local variables having been popped, and the answer pushed onto the stack.
Since the stack is a fixed size memory block, a deep function call may cause the ``stack overflow'' error.
Heap
To reduce the frequency of copying the contents of strings and lists,
we add another data structure, called the heap, to hold the contents of
temporary strings or lists during the string/list operations. The overhead
for allocating/deallocating memory on the heap is less than that of
malloc()
/free()
.
The heap is a large memory block. Memory is allocated within this block. A pointer keeps track the top of heap similar to the stack pointer of the stack. The pseduo machine instruction, freeheap, releases all the memory allocated by previous computation. The heap is not affected by the return of the function call. Freeheap instructions are automatically inserted by the interpreter at the points that it thinks safe to free memory. However, if a computation involves long strings, lists or too deep function called, the machine may not have the chance to free the memory and thus causes the ``heap overflow'' error.
The user may have to modify their program to minimise the load of stack and heap to avoid memory overflow errors. For instance, an iterative function has less demand on stack and heap than its recursive equivalent.
Frame Stack
In returning from a function call, the machine has to stored the previous machine status, e.g. stack pointer, (pseduo machine) program counter, number of local variables (since local variables were put on the stack), etc. A frame stack holds these information.
When a function is called, the current machine status, a frame, is pushed
on the frame stack. For example,
if f1()
calls f2()
, the status of stack,
heap and frame stack is shown below.
The frame stack keeps track of the stack and heap pointers.
A freeheap instruction (generated by EDEN) causes the
top-of-heap to return to the bottom pointed to by the top-most
frame (e.g. f2
). On the other hand, the stack pointer is lowered
only if the function returns.
A frame is popped when a function returns and the machine status resumes to the previous state. Since the frame stack is implemented as a small stack, about 64 entries, the number of level of nested function call is 64 approx. This maximum limit is enough for general use, but if a function nested too deep, the machine will generate a ``nested too deep'' error.
Avoiding Memory Overflow
Since the EDEN interpreter passes list "by value", i.e. copies the entire list on heap, it is very eay to overflow the heap if you pass long lists as parameters of a recursive function. To get round this problem, you may consider the use of iterative loops instead of recursive functions, or you may pass the "pointer" of that list to the function, and let the function to copy that list to its local variable. For example, instead of:
func foo /* (list) */ { ... foo($1); ... }
do:
func foo /* (list *) */ { auto list; list = *$1; /* copy the list to local variable */ ... list = ...; /* do the calculation here */ foo(&list); /* pass the pointer of that list */ ... }
Also not that the interpreter will try to free the heap space at the end of each statement. Therefore, it would be helpful space-wise to break a long expression into smaller expressions and use temperary variables to hold intermediate computation results. For instance:
a = expression; foo(a);
uses less heap space than
foo(expression);
The machine uses malloc()
to allocate space for strings and lists rather
than claiming space from the heap. So the use of strings and lists would
not normally causes memory overflow.
Autocalc
The autocalc
variable controls the evaluation scheme of the EDEN interpreter.
When it is 1 (default) or non-zero, the interpreter will try to evaluate
all definitions (if they are evaluable, i.e. all their source variables
are defined); otherwise (autocalc is zero) the interpreter puts the suspended
definitions in a queue.
Before the interpreter evaluates a definition, it tests whether all its source variables are defined, i.e. their UpToDate flags are true. However, the interpreter is not intelligent enough to identify variables referenced by a definition indirectly. For instance,
func F { para N; return N + Z; } Y is F(1);
Function F
references the (global) variable Z
,
but the interpreter does not know about that. When Y
is defined, the interpreter thinks that Y
depends on
F
only, and proceeds to evaluate Y
and thus
evaluate F
. However, F
finds that Z
is undefined. In this example, F
returns the sum of
Z
and the first parameter which produces @
when Z
is @
However, some operators and most of
the statements produce an error when the expected data types are not met.
Of course, if we defined Z
before the introduction of
Y
, it may not cause an error. However, if we introduce
Z
after that of Y
, we are in the risk.
Hence, the result of computation seems to be sensitive to the order of
definitions. Despite of this ordering problem, redefining Z
would not cause Y
to be updated. This may not be the
intended behaviour sometimes. There are two ways to get around it:
- Set
autocalc
to0
before the introduction of all definitions. After all definitions are defined, setautocalc
to1
again. By doing so, the evaluation of definitions is delayed. For instance,autocalc = 0; /* ... other statements ... */ func F { para N; return N + Z; } Y is F(1); Z is 10; /* ... other statements ... */ autocalc = 1;
However, this method does not solve the re-evaluation problem. - Explicitly give the dependency of
Z
toF
. For example,func F { para N; return N + Z; } proc touch_F : Z { touch(&F); } Y is F(1); Z is 10;
Because now change ofZ
will induce a change toF
,Y
is indirectly dependent uponZ
, andY
can't be evaluated untilZ
is defined. Also Y will be re-evaluated whenZ
is re-defined. This point is not true for the former solution. In fact, the interpreter should generate such dependency for function specifications.
Note that such attempts as:func F: Z { para N; return N + Z; }
orfunc F { para N; return N + Z; } Z ~> [F]
to impose dependency would not work because in both cases a change ofZ
will call the functionF
(as if it is an action). This is an error because a parameter is needed to calculateF
.