
DISI Dipartimento di Informatica e Scienze dell'Informazione




GAMMA Project: Genoa Active Message MAchine





The Application Programming Interface (API) of GAMMA


The GAMMA communication library provides functions for process grouping, point-to-point communication, and collective communications at the application level. Both C and FORTRAN calls are provided. Here we describe only the C interface.

This is a list of the GAMMA library functions and variables.

Initiate/terminate parallel section of a job:
gamma_init() gamma_exit()

Set up communication ports:
gamma_set_active_port() gamma_set_passive_port()
gamma_post_recv() gamma_attach_buffer()

Send routines, blocking:
gamma_send() gamma_send_flowctl()
gamma_send_2p() gamma_send_2p_flowctl()

Send routines, non-blocking:
gamma_isend() gamma_isend_flowctl()
gamma_isend_2p() gamma_isend_2p_flowctl()
gamma_wsend() gamma_tsend()

Synchronize on message arrivals:
gamma_signal() gamma_sigerr()
gamma_wait() gamma_test()

Miscellaneous:
gamma_atomic() gamma_sync()
gamma_my_par_pid() gamma_my_node()
gamma_how_many_nodes() gamma_mlock()
gamma_munlock() gamma_munlockall()
gamma_time() gamma_time_diff()
gamma_active_port gamma_errno

GAMMA functions are built on top of a small set of custom system calls, activated using the trap address 0x81, which traps down to kernel in the GAMMA device driver through a short and fast code path.

Each library function, with the exception of gamma_time() and gamma_time_diff(), returns a negative integer value in case of error, and a non-negative integer value in case of successful completion.

The programming interface is currently defined as follows:





int gamma_init (
	int num_nodes,
	int argc,
	char **argv
);
A parallel computation is started.

When a sequential user process P invokes it, a process group called virtual GAMMA is activated. The group is composed of process P, running on the local workstation, plus num_nodes-1 additional processes identical to P, launched via the ``rsh'' command on num_nodes-1 distinct remote workstations (chosen from those listed in file /etc/gamma.conf).

Hence after having invoked gamma_init() the invoking user process P is replicated on num_nodes workstations in the cluster, thus forming a running SPMD parallel application.

The process replicas themselves eventually invoke gamma_init(), but this time the effect is that of registering themselves with the created group, without creating new ones.

A positive number called ``parallel pid'' uniquely identifies the newly created process group in the cluster.

Note that nothing prevents two independent user processes P and Q from invoking gamma_init() separately from one another. This will result in the creation of two distinct GAMMA process groups in the same cluster, each with a distinct ``parallel pid''. The two groups may share some or even all the available workstations in the cluster, but cannot share processes.

Currently invoking gamma_init() with num_nodes less than or equal to zero, or greater than the total number of workstations connected to the cluster, has the same effect as if num_nodes were equal to the total number of workstations connected to the cluster.
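
The rule for out-of-range values of num_nodes can be written down as a small function. This is only a sketch of the rule as stated here; effective_num_nodes() and cluster_size are hypothetical names and not part of the GAMMA library.

```c
#include <assert.h>

/* Hypothetical helper, not part of GAMMA: returns the group size that
 * gamma_init() is described as using for a requested num_nodes, given the
 * number of workstations connected to the cluster. */
int effective_num_nodes(int requested, int cluster_size)
{
    assert(cluster_size > 0);
    if (requested <= 0 || requested > cluster_size)
        return cluster_size;   /* out-of-range requests select the whole cluster */
    return requested;
}
```

For instance, on an 8-node cluster a request for 0, -3, or 12 nodes would all yield a group of 8 processes, while a request for 4 yields 4.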

int gamma_exit (void);
The invoking process terminates the parallel computation, exiting from its process group. The process that created the group (and got instance number 0) will destroy the group as soon as every other process instance has left the group.




int gamma_set_active_port (
        unsigned char port,
        unsigned char dest_node,
        unsigned char dest_par_pid,
        unsigned char dest_port,
        void (*receiver_handler)(void),
	void (*error_handler)(void),
	unsigned char semaphore,
	unsigned char buffer_kind,
        void *destination_buffer,
	unsigned long buffer_len
);
int gamma_set_passive_port (
        unsigned char port,
        unsigned char dest_node,
        unsigned char dest_par_pid,
        unsigned char dest_port,
	unsigned char semaphore,
	unsigned char buffer_kind,
        void *destination_buffer,
	unsigned long buffer_len
);
Activation of one out of the 1025 bidirectional communication ports of the calling process, numbered from 0 to 1024. Ports 1023 and 1024 are currently reserved to GAMMA collective routines (broadcast and barrier synchronization respectively). The calling process must have previously invoked gamma_init().

The communication port may be programmed for output, input, or both.

An output port must be bound to an input port of the remote receiver process to which outgoing messages are to be delivered. Such remote port is fully specified by the triple dest_node (instance number of the receiver process), dest_par_pid (``parallel pid'' of the process group which the receiver process belongs to), and dest_port (a specific input port of the receiver process). Note that inter-group communication is allowed. A process is not allowed to connect a port to itself for output.

Parameter dest_node may be set to the constant BROADCAST. In this case, each message transmitted through the port will be broadcast to each process in the group specified by dest_par_pid (excluding the sender itself). Each receiver process will get the message through its local port specified by dest_port.

An input port must be bound to a destination buffer, a notification semaphore, and (active ports only) a receiver handler and an error handler.

The destination buffer is a contiguous virtual memory region in application space; its size in bytes is specified by buffer_len. Any non-empty message arriving to the port will be stored in such buffer. Specifying a destination buffer is mandatory only if non-empty messages are to be received.

Many common data structures (for instance, arrays) span contiguous regions in virtual memory space, therefore in most cases there is no need of providing separate buffers for incoming messages.

Currently the destination buffer must have been locked and pre-fetched into physical RAM before use, by invoking the gamma_mlock() function.

If the current message fits the destination buffer exactly, the next message hitting the same port will be stored at the beginning of the same buffer, thus overwriting the current one (unless the port has been bound to a different destination buffer meanwhile).

If a message arrives which is larger than the destination buffer, then the message is truncated to fit the buffer and the error handler (see below) is executed raising an error condition (see gamma_errno).

If the current message is shorter than the destination buffer, and the port has not been bound to a different destination buffer before a new message hits the port, then the next message will be stored in the same destination buffer: either contiguously after the previous message (if buffer_kind is set to GO_AHEAD), or at the beginning of the buffer itself (if buffer_kind is set to GO_BACK). The former mode helps build gather-like communication patterns; in such a case, however, if the new message is larger than the remaining room in the destination buffer, it is truncated to fit the buffer and the error handler (see below) is executed, raising an error condition (see gamma_errno).
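
The placement rules above can be illustrated with a small user-space model. This is a sketch under assumptions, not GAMMA code: struct model_port, model_deliver(), and the MODEL_GO_* values are hypothetical, and the choice to restart at the beginning once the buffer is completely full is one reading of the exact-fit rule rather than documented behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative values only; the real GO_AHEAD / GO_BACK constants come
 * from the GAMMA headers. */
enum model_buffer_kind { MODEL_GO_BACK = 0, MODEL_GO_AHEAD = 1 };

/* Hypothetical model of an input port's destination buffer. */
struct model_port {
    unsigned char *buf;            /* destination buffer */
    size_t len;                    /* buffer_len */
    size_t next;                   /* offset where the next message lands */
    enum model_buffer_kind kind;
};

/* Stores one message as the text describes and returns the number of bytes
 * actually stored. *truncated is set to 1 when the message did not fit,
 * which is the case where GAMMA would run the error handler. */
size_t model_deliver(struct model_port *p, const void *msg, size_t msg_len,
                     int *truncated)
{
    assert(p != NULL && truncated != NULL && p->next <= p->len);
    size_t room = p->len - p->next;
    size_t stored = msg_len <= room ? msg_len : room;

    *truncated = msg_len > room;
    if (stored > 0)
        memcpy(p->buf + p->next, msg, stored);

    if (p->kind == MODEL_GO_AHEAD && p->next + stored < p->len)
        p->next += stored;         /* next message is appended after this one */
    else
        p->next = 0;               /* GO_BACK, or buffer full (assumption): restart */
    return stored;
}
```

With an 8-byte buffer in the GO_AHEAD model, messages of 3 and 2 bytes end up side by side, and a following 5-byte message is cut to the 3 bytes that remain. In the GO_BACK model every message starts at offset 0.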

The receiver handler is an application-defined function, which will be executed each time a new message hits a port, provided the port has been set up by invoking the gamma_set_active_port() routine. Empty messages hitting the port will trigger the receiver handler as well.

The receiver handler will run after the message body (if any) has been copied to the destination buffer (if any). New messages hitting the port will not be stored into the destination buffer before the receiver handler has run to completion.

The receiver handler is launched by the GAMMA driver, immediately after the last chunk of message has been copied to the destination buffer. Currently receiver handlers run with interrupts disabled and at kernel privilege level. This rules out the possibility of invoking standard system calls from receiver handlers. Also, this imposes the constraint that every data structure referred to by the handler must have been previously locked and pre-fetched into physical RAM (see function gamma_mlock()). A receiver handler may invoke any GAMMA call.

The error handler is similar to a receiver handler but is run each time an anomalous condition is detected at the port.

In order to allow a receiver process to synchronize to input events (message arrivals, handler activities) in a safe way, GAMMA provides 1025 per-process notification semaphores numbered from 0 to 1024. Semaphores 1023 and 1024 are reserved to GAMMA collective routines (broadcast and barrier synchronization respectively). Each port being used for input must be associated with one such semaphore. Each time a message hits the port, its semaphore gets incremented by one. Additionally, receiver and error handlers may also increment other semaphores, if programmed to do so, by invoking gamma_signal(). A receiver process can wait upon message arrivals or handler activities by invoking gamma_wait() or gamma_test(). Semaphores are initialized to zero by gamma_init().

Recall that any GAMMA port can be programmed to be output and input simultaneously, provided the correct parameters are passed to the gamma_set_active_port() or gamma_set_passive_port() routine. The actual use of a port as an input or output one depends on its use by the application.





int gamma_post_recv (
	unsigned char input_port,
	void *destination_buffer,
	unsigned long buffer_len
);
int gamma_attach_buffer (
	unsigned char input_port,
	void *destination_buffer
);
The port specified by input_port is bound to the specified destination buffer. The next message hitting the port will be stored into such buffer. The buffer is required to span a contiguous region of virtual memory. Many data structures of common use (arrays, for instance) fulfill such requirement. Moreover the buffer must have been previously locked and pre-fetched into physical RAM (see function gamma_mlock()).

These are low-overhead alternatives to the gamma_set_active_port() and gamma_set_passive_port() functions. They don't require invoking any system call, as the buffer address and size are actually kept in the user data segment. Their intended use is within receiver handlers, in order to prepare a fresh application-space buffer for incoming messages after having consumed the previous one. With gamma_attach_buffer(), the size of the destination buffer remains the same as set up by the last invocation of gamma_post_recv(), gamma_set_active_port(), or gamma_set_passive_port().
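
One way to use gamma_post_recv() inside a receiver handler is to alternate between two buffers, so that one can be consumed while the other is being filled. The sketch below assumes this double-buffering scheme; slots, filling, ready, and swap_buffers_handler() are hypothetical names, and gamma_post_recv() is replaced by a stand-in that merely records its arguments so the example compiles without the library.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_LEN 1024UL

/* Stand-ins so the sketch compiles without the GAMMA library. The signature
 * of gamma_post_recv() and the gamma_active_port variable are copied from
 * this document; the body is a placeholder that records the last request. */
unsigned char gamma_active_port;
void *last_posted_buffer;
unsigned long last_posted_len;

int gamma_post_recv(unsigned char input_port, void *destination_buffer,
                    unsigned long buffer_len)
{
    (void)input_port;
    last_posted_buffer = destination_buffer;
    last_posted_len = buffer_len;
    return 0;
}

/* Two slots: one is bound to the port while the other is being consumed. */
char slots[2][SLOT_LEN];
int filling = 0;                   /* index of the slot bound to the port */
char *ready = NULL;                /* slot holding the latest complete message */

/* Receiver handler: publish the slot just filled, then bind the other one. */
void swap_buffers_handler(void)
{
    ready = slots[filling];
    filling = 1 - filling;
    int rc = gamma_post_recv(gamma_active_port, slots[filling], SLOT_LEN);
    assert(rc >= 0);               /* negative return values signal an error */
    (void)rc;
}
```

In a real program both slots, as well as every variable the handler touches, would first have to be locked with gamma_mlock(), as required for any data referred to by a handler.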





int gamma_send (
	unsigned char output_port,
	void *data,
	unsigned long len
);
int gamma_send_flowctl (
	unsigned char output_port,
	void *data,
	unsigned long len
);
A message is sent through the port specified by output_port with blocking semantics. The output port must have previously been bound to a remote destination by the gamma_set_active_port() or gamma_set_passive_port() function. The message is expected to reside in user space at address data and to be len bytes long.
int gamma_send_2p (
	unsigned char output_port,
	void *data1,
	unsigned long len1,
	void *data2,
	unsigned long len2,
);
int gamma_send_2p_flowctl (
	unsigned char output_port,
	void *data1,
	unsigned long len1,
	void *data2,
	unsigned long len2,
);
A message is sent through the port specified by output_port with blocking semantics. The output port is supposed to have previoulsy been bound to a remote destination by the gamma_set_active_port() or gamma_set_active_port() functions. The message is composed by two ``pieces'', possibly stored into two distinct memory regions in the user space, and of possibly different size (2-way gather). The first ``piece'' (specified by data1 and len1 is not allowed to be larger than 20 bytes. These routines are intended as support for MPI/GAMMA.
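
The 2-way gather and its 20-byte limit on the first piece can be modelled in plain C. The sketch below assumes that the receiver sees the first piece immediately followed by the second, which the text does not state explicitly; gather_two_pieces() and FIRST_PIECE_MAX are hypothetical names, not GAMMA identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIRST_PIECE_MAX 20   /* limit on len1 stated in the text */

/* Hypothetical helper, not a GAMMA call: builds in `out` the byte sequence a
 * two-piece message is assumed to carry (piece 1 followed by piece 2).
 * Returns the total length, or -1 if the first piece is too large or the
 * result does not fit in `out`. */
long gather_two_pieces(void *out, size_t out_len,
                       const void *data1, size_t len1,
                       const void *data2, size_t len2)
{
    assert(out != NULL);
    if (len1 > FIRST_PIECE_MAX || len2 > out_len || len1 > out_len - len2)
        return -1;
    if (len1 > 0)
        memcpy(out, data1, len1);
    if (len2 > 0)
        memcpy((char *)out + len1, data2, len2);
    return (long)(len1 + len2);
}
```

A small first piece next to a larger second piece suits a short header travelling with a payload held elsewhere in memory, without copying both into one staging buffer on the sender side.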




int gamma_isend (
	unsigned char output_port,
	void *data,
	unsigned long len
);
int gamma_isend_flowctl (
	unsigned char output_port,
	void *data,
	unsigned long len
);
int gamma_isend_2p (
	unsigned char output_port,
	void *data1,
	unsigned long len1,
	void *data2,
	unsigned long len2,
);
int gamma_isend_2p_flowctl (
	unsigned char output_port,
	void *data1,
	unsigned long len1,
	void *data2,
	unsigned long len2,
);
Respectively similar to the gamma_send(), gamma_send_flowctl , gamma_send_2p, gamma_send_2p_flowctl , but with non-blocking semantics. A ``handle'' is returned, which unambiguously identifies the initiated send operation and can be used to wait/test for its completion (see gamma_wsend(), gamma_tsend())
int gamma_wsend (
	unsigned long handle
);
int gamma_tsend (
	unsigned long handle
);
Wait/test for completion of non-blocking send operations initiated by the gamma_isend(), gamma_isend_flowctl(), gamma_isend_2p(), and gamma_isend_2p_flowctl() routines. Function gamma_wsend() blocks until the send operation specified by handle completes. Function gamma_tsend() returns 1 if the send operation specified by handle has completed, otherwise 0.
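
A typical use of the non-blocking calls is to overlap computation with an outstanding send. The sketch below shows that pattern with gamma_isend() and gamma_tsend(); so that it compiles without the library, both calls are replaced by stand-ins with the signatures given in this document, whose bodies simply pretend the send completes on the second test. The names send_and_overlap(), count_work(), polls_left, and work_done are hypothetical.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles without the GAMMA library. Signatures are
 * copied from this document; the bodies are placeholders that pretend every
 * send completes the second time it is tested. */
int polls_left;

int gamma_isend(unsigned char output_port, void *data, unsigned long len)
{
    (void)output_port; (void)data; (void)len;
    polls_left = 1;
    return 7;                      /* arbitrary non-negative handle */
}

int gamma_tsend(unsigned long handle)
{
    (void)handle;
    return polls_left-- > 0 ? 0 : 1;
}

/* Sample unit of work used while the send is in flight. */
int work_done;
void count_work(void) { work_done++; }

/* Starts a send, runs do_work() between completion tests, and returns the
 * number of work slices executed, or -1 if the send could not be started. */
int send_and_overlap(unsigned char port, void *data, unsigned long len,
                     void (*do_work)(void))
{
    int slices = 0;
    int handle = gamma_isend(port, data, len);
    if (handle < 0)
        return -1;                 /* negative values signal an error */
    while (gamma_tsend((unsigned long)handle) == 0) {
        do_work();                 /* useful work while the send progresses */
        slices++;
    }
    return slices;
}
```

When there is nothing useful to do in the meantime, a single gamma_wsend(handle) call would take the place of the polling loop.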




int gamma_signal (
	unsigned char sem
);
In order to allow a receiver process to cooperate and synchronize with receiver handlers in a safe way, GAMMA provides 1025 per-process semaphores numbered from 0 to 1024. Semaphores 1023 and 1024 are reserved to GAMMA collective routines (broadcast and barrier synchronization respectively).

Semaphores are initialized to zero by gamma_init().

gamma_signal(sem) causes semaphore sem to be atomically incremented by one.

Typically such function is issued by a receiver handler in order to notify the arrival of a message to the main thread of the receiver process.

int gamma_sigerr (
	unsigned char sem
);
In order to allow a receiver process to cooperate and synchronize with error handlers in a safe way, GAMMA provides 1025 per-process error semaphores numbered from 0 to 1024. Semaphores 1023 and 1024 are reserved to GAMMA collective routines (broadcast and barrier synchronization respectively).

Error semaphores are initialized to zero by gamma_init().

gamma_sigerr(sem) causes error semaphore sem to be atomically incremented by one.

Typically such function is issued by an error handler in order to notify the arrival of a corrupted message or the failure of a send operation to the main thread of a process.

int gamma_wait (
	unsigned char sem,
	unsigned long n
);
The invoking process busy-waits until semaphore sem reaches value n. Semaphore sem is atomically decremented by n upon return.

Typically such function is invoked by a process waiting for message arrivals. Semaphore sem is typically incremented by some receiver handler issuing gamma_signal(). During the busy-waiting the NIC is polled for incoming frames so as to speed up message arrivals by avoiding IRQ overheads. However this is only an optimization, which does not change the semantics.

On return, gamma_wait() yields zero if no receive errors were encountered; otherwise it yields a negative number whose absolute value is the number of times gamma_sigerr() has been issued on error semaphore sem since the last run of gamma_wait().
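
The interplay between gamma_signal(), gamma_sigerr(), and the return value of gamma_wait() can be illustrated with a single-threaded model. This is a sketch, not GAMMA code: struct model_sem and the model_* functions are hypothetical, and it assumes that "reaches value n" means the semaphore value is at least n.

```c
#include <assert.h>

/* Hypothetical single-threaded model of one notification semaphore and its
 * companion error semaphore; the real ones live inside GAMMA. */
struct model_sem {
    unsigned long value;    /* incremented by message arrivals and gamma_signal() */
    unsigned long errors;   /* incremented by gamma_sigerr() */
};

void model_signal(struct model_sem *s) { s->value++; }
void model_sigerr(struct model_sem *s) { s->errors++; }

/* Models the return path of gamma_wait(sem, n). A real call would busy-wait
 * until value >= n; this model requires the condition to hold already. */
int model_wait(struct model_sem *s, unsigned long n)
{
    assert(s->value >= n);
    s->value -= n;                    /* consume n notifications */
    int result = -(int)s->errors;     /* 0 when no error was signalled */
    s->errors = 0;                    /* the count restarts after each wait */
    return result;
}
```

In this model, three signals followed by a wait for 2 leave the semaphore at 1 and return 0. If two errors are then signalled before the next wait, that wait returns -2.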

int gamma_test (
	unsigned char sem
);
Returns the current value of semaphore sem. The value of sem is left unchanged.




int gamma_atomic (
	void (*funct)(void)
);
Function funct is executed atomically, that is, it will be interrupted neither by the scheduler nor by any receiver or error handler. This allows any function of the user program that shares data structures with receiver/error handlers to be executed safely. Currently each function to be executed atomically is subject to the same restrictions as receiver and error handlers.




int gamma_sync (void);
Barrier synchronization among all processes within a process group. After calling gamma_sync(), the caller process resumes execution successfully (that is, without error code) only when all other processes in the same group of the caller have reached the gamma_sync() function.

By exploiting a two-token synchronization mechanism, the GAMMA implementation of this collective communication primitive achieves best performance over shared Fast Ethernet channels.





int gamma_my_par_pid (void);
Returns the ``parallel pid'' of the GAMMA process group of the caller, as assigned by the previous call to function gamma_init().





int gamma_my_node (void);
Returns the instance number of the caller process, relative to the GAMMA process group of the caller itself. If the group counts num_nodes processes, the returned value will be in the range from 0 to num_nodes-1. The process which created the process group has always instance number zero.

The programming paradigm supported by GAMMA is Single Program Multiple Data (SPMD). In this paradigm, each process may differentiate its behaviour by testing its own instance number.
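
A common way for SPMD processes to differentiate their behaviour is to derive a distinct share of the work from the instance number. The sketch below shows one possible block partition; my_share() is a hypothetical helper and not part of the GAMMA library.

```c
#include <assert.h>

/* Hypothetical helper, not part of GAMMA: computes the half-open range
 * [*first, *last) of the n work items assigned to instance `node` out of
 * `num_nodes` processes, spreading any remainder over the lowest instances. */
void my_share(int node, int num_nodes, long n, long *first, long *last)
{
    assert(num_nodes > 0 && node >= 0 && node < num_nodes && n >= 0);
    long base = n / num_nodes;
    long extra = n % num_nodes;
    *first = node * base + (node < extra ? node : extra);
    *last = *first + base + (node < extra ? 1 : 0);
}
```

In a GAMMA program the first two arguments would come from gamma_my_node() and gamma_how_many_nodes(). With 10 items over 4 processes, the instances get the ranges [0,3), [3,6), [6,8), and [8,10).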





int gamma_how_many_nodes (void);
Returns the number of process instances belonging to the GAMMA process group of the caller process.




int gamma_mlock (
	void *buffer,
	unsigned long len
);
This function pre-fetches and locks into physical RAM a contiguous region in the virtual memory of the calling process starting from address buffer and counting len bytes.

Usually such a contiguous memory region is either a store for incoming messages or a global variable accessed by a receiver or error handler. It must be pre-fetched and locked into physical RAM so that the GAMMA driver does not incur a page fault while storing an incoming message or running a handler.

gamma_mlock() adds the pre-fetch functionality to the standard UNIX mlock() function.

int gamma_munlock (
	void *buffer,
	unsigned long len
);
int gamma_munlockall (void);
These functions unlock previously locked memory regions. They are very similar to the standard UNIX munlock() and munlockall() calls.




void gamma_time(time_586 t);
The content of Pentium's register TSC is copied to variable t. Type time_586 is defined as struct { unsigned long hi; unsigned long lo; }

Register TSC is incremented by one at each CPU clock tick, so this function is useful for time measurements involved in performance evaluations.

double gamma_time_diff(time_586 b, time_586 a);
The time interval between instants b and a (possibly recorded by means of the gamma_time function) is computed in microseconds and returned as result.

Currently the conversion from CPU clock ticks to microseconds requires a constant named CLOCK to be set to the CPU clock frequency in MHz before compiling the GAMMA library. More information in the README file enclosed with the GAMMA source code.
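
The conversion from TSC readings to microseconds amounts to dividing a tick difference by the clock frequency in MHz. The sketch below illustrates that arithmetic only and is not the library's code: struct tsc_sample mirrors the layout given for time_586 under a hypothetical name, the clock frequency is passed as a parameter instead of the compile-time CLOCK constant, and b is assumed to be the later instant.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the layout given above for time_586; the name is hypothetical so
 * that it cannot clash with the real type. */
struct tsc_sample { unsigned long hi; unsigned long lo; };

/* Joins the two 32-bit halves of a TSC reading into one 64-bit tick count. */
uint64_t tsc_ticks(struct tsc_sample t)
{
    return ((uint64_t)(t.hi & 0xFFFFFFFFUL) << 32) | (t.lo & 0xFFFFFFFFUL);
}

/* Microseconds elapsed from `a` to `b` on a CPU clocked at clock_mhz MHz.
 * Assumes `b` was sampled after `a`. */
double tsc_diff_usec(struct tsc_sample b, struct tsc_sample a, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return (double)(tsc_ticks(b) - tsc_ticks(a)) / clock_mhz;
}
```

For example, 1000 ticks on a 200 MHz CPU correspond to 5 microseconds, including when the low half of the counter wraps around between the two readings.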





unsigned char gamma_active_port;
During the execution of a receiver handler or an error handler, such variable holds the number of the port which has triggered the execution of the handler itself.
unsigned char gamma_errno;
During the execution of an error handler, such variable holds a number corresponding to the error condition which has triggered the execution of the handler. The values that gamma_errno may hold are:
GAMMA_ERR_FRAMELOST (2)
in case a chunk of a message was lost;
GAMMA_ERR_MSGTOOLONG (3)
in case the user-level memory region for storing an incoming message is either missing (that is, no buffer is registered with the receiving port) or no longer valid (that is, the buffer was filled by a previous message and not replaced with a fresh one).
GAMMA_ERR_SENDFAILURE (4)
in case the transmission of a message has failed, for instance due to network congestion.
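
Since handlers cannot invoke standard system calls, an error handler would typically just record gamma_errno and leave reporting to the main thread. The sketch below shows a lookup the main thread could use afterwards; describe_gamma_error() and the MODEL_ERR_* names are hypothetical, with numeric values taken from the list above.

```c
/* Numeric values as listed in this document; a real program would take the
 * constants from the GAMMA headers instead of redefining them. */
enum {
    MODEL_ERR_FRAMELOST   = 2,
    MODEL_ERR_MSGTOOLONG  = 3,
    MODEL_ERR_SENDFAILURE = 4
};

/* Hypothetical helper: maps a recorded gamma_errno value to a short
 * description suitable for logging from the main thread. */
const char *describe_gamma_error(unsigned char err)
{
    switch (err) {
    case MODEL_ERR_FRAMELOST:
        return "a chunk of a message was lost";
    case MODEL_ERR_MSGTOOLONG:
        return "destination buffer missing or no longer valid";
    case MODEL_ERR_SENDFAILURE:
        return "transmission of a message failed";
    default:
        return "unknown error";
    }
}
```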


Please send suggestions and comments to:

Giuseppe Ciaccio, ciaccio@disi.unige.it


Last Updated: 8 March 2004