The recent growth in popularity of commodity-based cluster computing has created a demand for low-latency, high-bandwidth interconnect technologies. Early cluster systems used expensive but fast interconnects such as Myrinet or SCI. Although these technologies provide low-latency, high-bandwidth communication, the cost of a single interface card can almost match that of an individual computer in the cluster. Despite the popularity of these specialist technologies, there is a growing demand for Ethernet, which offers a low-risk and upgradeable path for linking clusters together. In this paper we compare and contrast the low-level performance of a range of Giganet network cards under Windows NT using MPI and PVM. In the first part of the paper we discuss our motivation and rationale for undertaking this work. We then describe the systems we are using and our methods for assessing these technologies. In the second half of the paper we present our results and discuss our findings. In the final section we summarize our experiences and briefly outline the further work we intend to undertake.
One of the new research trends within the well-established area of cluster computing is the growing interest in using multiple workstation clusters as a single virtual parallel machine, in much the same way that individual workstations are today connected to build a single parallel cluster. In this paper we analyze several aspects of integrating different workstation clusters, such as Myrinet- and SCI-based systems, and propose a MultiCluster model as one way to achieve such an integrated architecture.
In this document we briefly review memory management and DMA considerations for common SCI hardware and for the Virtual Interface Architecture. On this basis we present our ideas for improved memory management in hardware that combines the positive characteristics of both base technologies, yielding a completely new design rather than one simply added to the other. The described memory-management concept enables true zero-copy transfers for Send-Receive operations while retaining the full flexibility and efficiency of a node's local memory-management system. From the resulting hardware we expect very good system throughput for message-passing applications, even when they use a wide range of message sizes.
This paper presents an efficient parallel information retrieval (IR) system that provides fast information service to Internet users in a low-cost, high-performance PC-NOW environment. The IR system is implemented on a PC cluster based on the Scalable Coherent Interface (SCI), a powerful interconnect that supports both shared-memory and message-passing models. In the IR system, the inverted-index file (IIF) is partitioned into pieces using a greedy declustering algorithm and distributed to the cluster nodes, where each piece is stored on a node's hard disk. For each incoming user query with multiple terms, the terms are sent to the nodes that hold the relevant pieces of the IIF, where they are evaluated in parallel. The IR system is developed using a distributed shared-memory programming technique based on SCI. In our experiments, the IR system outperforms an MPI-based IR system using Fast Ethernet as the interconnect. A speedup of up to 3.1 was obtained on an 8-node cluster when processing each query against a 500,000-document IIF.
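The declustering step above can be illustrated with a minimal sketch. The abstract does not give the paper's exact greedy algorithm, so the following assumes a common longest-processing-time heuristic: place each term's posting list (largest first) on the currently least-loaded node, so that the per-node evaluation work for a multi-term query is roughly balanced. The function name and the `term_sizes` input are illustrative, not from the paper.

```python
def greedy_decluster(term_sizes, num_nodes):
    """Assign each index term's posting list to a cluster node.

    term_sizes: dict mapping term -> posting-list length (proxy for work).
    Returns a dict mapping term -> node index in [0, num_nodes).
    Greedy heuristic: take terms in decreasing size order and give each
    one to whichever node currently has the smallest total load.
    """
    loads = [0] * num_nodes
    assignment = {}
    for term, size in sorted(term_sizes.items(), key=lambda kv: -kv[1]):
        node = loads.index(min(loads))  # least-loaded node so far
        assignment[term] = node
        loads[node] += size
    return assignment


# Example: four terms spread across two nodes.
placement = greedy_decluster({"apple": 10, "banana": 8, "cherry": 5, "date": 3}, 2)
```

With this placement, the two largest posting lists land on different nodes, so a query containing both terms can be evaluated in parallel on disjoint nodes.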
One of the challenges in large-scale distributed computing is to utilize the thousands of idle personal computers. In this paper we present a system that enables users to effortlessly and safely export their machines into a global market of processing capacity. Efficient resource allocation is performed based on statistical machine profiles, and leases are used to promote dynamic task placement. The system's basic programming primitives can be extended into class hierarchies that support different distributed-computing paradigms. Thanks to the object-oriented structuring of the code, developing a distributed computation can be as simple as implementing a few methods.
While standard processors now achieve supercomputer performance, a performance gap remains between the interconnects of MPPs and those of COTS clusters. Standard solutions such as Fast Ethernet cannot keep up with the communication demands of today's powerful CPUs. Hence, high-speed interconnects have an important impact on a cluster's performance. While standard solutions exist for processing nodes, communication hardware is currently available only as special, expensive, non-portable solutions. ATOLL is a switched, high-speed interconnect that fulfills the current needs for user-level communication and for concurrency between computation and communication. ATOLL is a single-chip solution; no additional switching hardware is required.
Many implementations of the Message Passing Interface (MPI) build collective operations on top of point-to-point operations. This work examines IP multicast as a framework for collective operations. If a receiver is not ready when a message is sent via IP multicast, the message is lost. We examine two techniques for ensuring that a message is not lost due to a slow receiving process. The techniques are implemented and compared experimentally over both shared and switched Fast Ethernet. The average performance of collective operations improves with the number of participating processes and with message size on both networks.
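The loss problem described above can be sketched in a few lines. The abstract does not name its two recovery techniques, so the following shows one plausible approach only: the sender buffers every multicast message by sequence number, and a slow receiver that detects a gap sends a NACK to trigger a reliable point-to-point retransmission. The class and callback names (`MulticastSender`, `multicast`, `unicast_to_receiver`) are hypothetical, not from the paper.

```python
class MulticastSender:
    """Sketch of sender-side buffering for unreliable IP multicast.

    Messages are multicast best-effort; a copy is retained so that a
    receiver that was not ready (and so missed the datagram) can NACK
    the missing sequence number and get a point-to-point resend.
    """

    def __init__(self):
        self.seq = 0
        self.buffer = {}  # seq -> payload, held until acknowledged

    def send(self, payload, multicast):
        """Multicast `payload`; `multicast(seq, payload)` is unreliable."""
        self.buffer[self.seq] = payload
        multicast(self.seq, payload)
        self.seq += 1

    def on_nack(self, seq, unicast_to_receiver):
        """A receiver missed message `seq`: resend it reliably."""
        unicast_to_receiver(seq, self.buffer[seq])
```

In a real MPI collective layer the buffer would be trimmed once every participant has acknowledged a message; the sketch omits that bookkeeping for brevity.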
Parallel processing uses a group of processors to solve large problems faster than is possible on a single processor. To accomplish this, the processors must communicate and coordinate with one another through some type of network. However, the only function that most networks support is message routing; consequently, functions that involve data from a group of processors must be implemented on top of message routing. We propose treating the network switch as a function unit that can receive data from a group of processors, execute operations, and return the result(s) to the appropriate processors. This paper describes how each of the architectural resources typically found in a network switch can be better utilized in such a centralized function unit. A proof-of-concept prototype called ClusterNet has been implemented to demonstrate the feasibility of this concept.
In this paper we study the use of networks of PCs to handle the parallel execution of relational database queries. Our approach is based on a parallel extension, called a {\em parallel relational query evaluator}, working in a coupled mode with a sequential DBMS. We present a detailed architecture of the parallel query evaluator and introduce Enkidu, an efficient Java-based prototype built according to our concepts. We report a set of measurements conducted on Enkidu that highlight its performance. Finally, we discuss the merit and viability of the concept of a parallel extension in the context of relational databases and in the wider context of high-performance computing.
Keywords: Networks of workstations, Parallel DBMS, Java