The recent growth in popularity of commodity-based cluster computing has created a demand for low-latency, high-bandwidth interconnect technologies. Early cluster systems used expensive but fast interconnects such as Myrinet or SCI. Although these technologies provide low-latency, high-bandwidth communication, the cost of a single interface card can almost match that of an individual computer in the cluster. Despite the popularity of these specialist technologies, there is a growing demand for Ethernet, which offers a low-risk and upgradeable path for linking clusters together. In this paper we compare and contrast the low-level performance of a range of Giganet network cards under Windows NT using MPI and PVM. In the first part of the paper we discuss our motivation and rationale for undertaking this work. We then describe the systems we are using and our methods for assessing these technologies. In the second half of the paper we present our results and discuss our findings. In the final section we summarize our experiences and briefly outline the further work we intend to undertake.
One of the new research trends within the well-established area of cluster computing is the growing interest in using multiple workstation clusters as a single virtual parallel machine, in much the same way that individual workstations are today connected to build a single parallel cluster. In this paper we analyze several aspects of integrating different workstation clusters, such as Myrinet- and SCI-based systems, and propose a MultiCluster model as one way to achieve such an integrated architecture.
In this document we briefly review memory management and DMA considerations for common SCI hardware and for the Virtual Interface Architecture. On this basis we present our ideas for improved memory management in hardware that combines the positive characteristics of both base technologies, yielding a completely new design rather than one simply added to the other. The described memory-management concept enables true zero-copy transfers for Send-Receive operations while retaining the full flexibility and efficiency of a node's local memory-management system. From the resulting hardware we expect very good system throughput for message-passing applications, even when they use a wide range of message sizes.
This paper presents an efficient parallel information retrieval (IR) system that provides fast information service to Internet users in a low-cost, high-performance PC-NOW environment. The IR system is implemented on a PC cluster based on the Scalable Coherent Interface (SCI), a powerful interconnect that supports both shared-memory and message-passing models. In the IR system, the inverted-index file (IIF) is partitioned into pieces using a greedy declustering algorithm and distributed to the cluster nodes, where each piece is stored on a node's hard disk. For each incoming user query with multiple terms, the terms are sent to the nodes that hold the relevant pieces of the IIF, where they are evaluated in parallel. The IR system is developed using a distributed shared-memory programming technique based on SCI. In our experiments, the IR system outperforms an MPI-based IR system using Fast Ethernet as the interconnect. A speedup of up to 3.1 was obtained on an 8-node cluster when processing each query against a 500,000-document IIF.
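The declustering step above can be illustrated with a minimal sketch. The abstract does not give the paper's exact greedy algorithm, so the following assumes a common longest-processing-time heuristic: place each term's posting list (largest first) on the currently least-loaded node, so that the per-node evaluation work for a multi-term query is roughly balanced. The function name and the `term_sizes` input are illustrative, not from the paper.

```python
def greedy_decluster(term_sizes, num_nodes):
    """Assign each index term's posting list to a cluster node.

    term_sizes: dict mapping term -> posting-list length (proxy for work).
    Returns a dict mapping term -> node index in [0, num_nodes).
    Greedy heuristic: take terms in decreasing size order and give each
    one to whichever node currently has the smallest total load.
    """
    loads = [0] * num_nodes
    assignment = {}
    for term, size in sorted(term_sizes.items(), key=lambda kv: -kv[1]):
        node = loads.index(min(loads))  # least-loaded node so far
        assignment[term] = node
        loads[node] += size
    return assignment


# Example: four terms spread across two nodes.
placement = greedy_decluster({"apple": 10, "banana": 8, "cherry": 5, "date": 3}, 2)
```

With this placement, the two largest posting lists land on different nodes, so a query containing both terms can be evaluated in parallel on disjoint nodes.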
One of the challenges in large-scale distributed computing is to utilize the thousands of idle personal computers. In this paper we present a system that enables users to effortlessly and safely export their machines into a global market of processing capacity. Efficient resource allocation is performed based on statistical machine profiles, and leases are used to promote dynamic task placement. The system's basic programming primitives can be extended into class hierarchies that support different distributed-computing paradigms. Thanks to the object-oriented structuring of the code, developing a distributed computation can be as simple as implementing a few methods.
While standard processors now achieve supercomputer performance, a performance gap remains between the interconnects of MPPs and those of COTS clusters. Standard solutions such as Fast Ethernet cannot keep up with the communication demands of today's powerful CPUs. Hence, high-speed interconnects have an important impact on a cluster's performance. While standard solutions exist for processing nodes, communication hardware is currently available only as special, expensive, non-portable solutions. ATOLL is a switched, high-speed interconnect that fulfills the current needs for user-level communication and for concurrency between computation and communication. ATOLL is a single-chip solution; no additional switching hardware is required.
Many implementations of the Message Passing Interface (MPI) build collective operations on top of point-to-point operations. This work examines IP multicast as a framework for collective operations. If a receiver is not ready when a message is sent via IP multicast, the message is lost. We examine two techniques for ensuring that a message is not lost due to a slow receiving process. The techniques are implemented and compared experimentally over both shared and switched Fast Ethernet. The average performance of collective operations improves with the number of participating processes and with message size on both networks.
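The loss problem described above can be sketched in a few lines. The abstract does not name its two recovery techniques, so the following shows one plausible approach only: the sender buffers every multicast message by sequence number, and a slow receiver that detects a gap sends a NACK to trigger a reliable point-to-point retransmission. The class and callback names (`MulticastSender`, `multicast`, `unicast_to_receiver`) are hypothetical, not from the paper.

```python
class MulticastSender:
    """Sketch of sender-side buffering for unreliable IP multicast.

    Messages are multicast best-effort; a copy is retained so that a
    receiver that was not ready (and so missed the datagram) can NACK
    the missing sequence number and get a point-to-point resend.
    """

    def __init__(self):
        self.seq = 0
        self.buffer = {}  # seq -> payload, held until acknowledged

    def send(self, payload, multicast):
        """Multicast `payload`; `multicast(seq, payload)` is unreliable."""
        self.buffer[self.seq] = payload
        multicast(self.seq, payload)
        self.seq += 1

    def on_nack(self, seq, unicast_to_receiver):
        """A receiver missed message `seq`: resend it reliably."""
        unicast_to_receiver(seq, self.buffer[seq])
```

In a real MPI collective layer the buffer would be trimmed once every participant has acknowledged a message; the sketch omits that bookkeeping for brevity.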
Parallel processing uses a group of processors to solve large problems faster than is possible on a single processor. To accomplish this, the processors must communicate and coordinate with one another through some type of network. However, the only function that most networks support is message routing; consequently, functions that involve data from a group of processors must be implemented on top of message routing. We propose treating the network switch as a function unit that can receive data from a group of processors, execute operations, and return the result(s) to the appropriate processors. This paper describes how each of the architectural resources typically found in a network switch can be better utilized in such a centralized function unit. A proof-of-concept prototype called ClusterNet has been implemented to demonstrate the feasibility of this concept.
In this paper we study the use of networks of PCs to handle the parallel execution of relational database queries. Our approach is based on a parallel extension, called a {\em parallel relational query evaluator}, working in a coupled mode with a sequential DBMS. We present a detailed architecture of the parallel query evaluator and introduce Enkidu, an efficient Java-based prototype built according to our concepts. We report a set of measurements conducted on Enkidu that highlight its performance. Finally, we discuss the merit and viability of the concept of a parallel extension in the context of relational databases and in the wider context of high-performance computing.
Keywords: Networks of workstations, Parallel DBMS, Java