THE CACHE COHERENCE PROBLEM IN SHARED-MEMORY MULTIPROCESSORS

SOFTWARE SOLUTIONS


Igor Tartalja and Veljko Milutinovic


PREFACE


The problem of cache coherence (consistency) has been studied for almost two decades now, and most of the solutions introduced so far were on the hardware level. A representative set of papers dealing with hardware issues of the cache coherence problem, was selected and presented in a companion tutorial: The Cache Coherence Problem in Shared Memory Multiprocessors: Hardware Solutions, edited by M. Tomašević and V. Milutinovic.


Intensive work on the development of software mechanisms for coherence maintenance begins in the second half of the 80's. Almost all software solutions (except some trivial ones) are developed through academic research and implemented only in prototype machines. Because of that, we have found that the field of software techniques for maintaining the cache coherence is widely open for future research and development. This tutorial book is viewed as a selection of papers dealing with the "state of the art" software solutions for cache coherence maintenance in shared memory multiprocessors. This selection is almost complementary with the selection given in the previously mentioned Tomašević-Milutinović tutorial book, as well as with the selection of papers dealing with the closely related topic of distributed shared memory (DSM), given in the Protić-Tomašević-Milutinović IEEE tutorial book.


This tutorial is intended for an experienced reader in computer engineering, but probably novice in the topic of the cache coherence. In our opinion, the selection of papers in this tutorial will satisfy readers who wish to gain an in-depth understanding of the problem, as well as those interested in a comprehensive, rather wide overview. We think that this tutorial will be of the special interest for both categories of multicomputer investigators and designers: computer architects and compiler writers. We try to collect all representative approaches to the software coherence maintenance, as well as a number of efforts in the related performance evaluation field. The tutorial may also be used as a software coherence reference handbook for advanced undergraduate and typical graduate students in multiprocessing and multiprogramming areas.


The selected papers (27) for this tutorial are organized into five chapters. Each chapter is devoted to a separate topic, although we are aware of the fact that some of the papers could have been classified into two, or maybe even more, chapters. The short editors' introductions are given for the entire book, and then for each chapter separately.

Editors
November 1993



EDITORS' INTRODUCTION


Shared memory multiprocessors with private cache memories are widely used as natural and efficient parallel architectures to support the relatively simple programming model based on data sharing. Meanwhile, when two or more processors with private caches share changeable data, the problem of cache coherence (consistency) arises. After a processor changes the value of a variable in its private cache memory, copies of the same variable in other caches become stale, and should be invalidated or updated before consumption. One frequently cited definition of cache coherence was given in Censier and Feautrier paper [CENSI78].


In the beginning, the problem was solved at the hardware level. Most of the hardware schemes for coherence maintenance belong to one of the two large groups: "snoopy" schemes (surveyed and compared in [ARCHI86]) and "directory" schemes (surveyed and compared in [AGARW88]). The field of hardware solutions to cache coherence problem is properly covered with a number of surveys like those in the papers written by Stenstrom [STENS90], Tomašević and Milutinović [TOMAS93], and Lilja [LILJA93] (the paper by Lilja [LILJA93] is included in the first chapter of this tutorial). A representative selection of papers dealing with hardware solutions to the problem is given in the companion tutorial book "The Cache Coherence Problem in Shared Memory Multiprocessors: Hardware Solutions," edited by Tomašević and Milutinović.


During the second half of the previous decade, a considerable research effort was made in the direction of software solutions to the cache coherence problem. Although the transparency of the coherence problem, inherently provided by hardware schemes, is sacrificed if the software approach is applied, there is a number of reasons for software maintenance of coherence to be chosen.


First, software methods are relatively less expensive to implement. Although a number of software schemes needs some hardware support, the complexity of the hardware support is usually considerably less than the complexity of an appropriate hardware used to maintain coherence without any software help. Second, contemporary research (presented in the last two chapters) clearly indicates that the performance of a coherence scheme strongly depends on the multiprocessor workload characteristics. Consequently, either the hardware approach or the software approach can demonstrate better performance, depending on the environment and the application. Software schemes are potentially more efficient (than hardware schemes) for a class of applications characterized with a relatively low level of sharing. Third, all software schemes, according to many serious studies, are scalable. One general feature of software schemes is self-invalidation. Preventing the incoherence conditions (sometimes very conservatively), software schemes eliminate the need for processors to communicate, because of the permanent maintenance of coherence. This reduces the network traffic, and makes the software schemes well scalable.


In the most of software schemes, decisions about coherence actions are partially made statically (during the program compilation), and partially dynamically (during the program execution). Consequently, in this tutorial, software schemes for coherence maintenance will be classified as static or dynamic. The static schemes are primarily based on the compile-time program analysis and insertion of specialized coherence maintenance instructions into the code of the analyzed program. The dynamic schemes are, on the other hand, primarily based on the detection of incoherence at the execution time and immediate reinforcement of appropriate actions. Static schemes are incorporated into parallelizing optimizing compilers, while dynamic schemes are incorporated into operating system kernels (primitive operations on the kernel level, for synchronization and/or mutual exclusion, are now extended with the cache coherence maintenance related code).


Our intention for this tutorial book was to incorporate all representative approaches to the software coherence maintenance from the open literature, as well as a selection of representative techniques for performance evaluation. It should be noticed that some papers, dealing with analytic or simulation modeling of coherence schemes, clearly conclude that (in several application domains) software schemes are comparable in performance with hardware schemes, and become even better than hardware schemes, for the workloads characterized with migratory data and a relatively low level of sharing. Although the software solutions, in spite of their relatively low-cost implementation, have not been implemented in commercial systems so far, we think that the above conclusion will attract the future attention of multiprocessor architects and designers, not only in academia, but in industry, as well.


This tutorial book, containing a set of 27 selected papers, is organized as follows. The first chapter contains four introductory readings, to give a brief overview of the cache coherence problem, and particularly of the software solutions to the problem. Chapter two includes selected static software cache coherence schemes, while chapter three contains dynamic software cache coherence schemes. The last two chapters cover the area of performance evaluation. Chapter four introduces techniques for modeling and performance evaluation of cache memories and cache coherence maintenance mechanisms, while chapter five, finally, presents workload behavior and simulation techniques.


Each chapter is preceded by a concise editors' introduction. All of these introductions are organized in the same manner: the first paragraph introduces the topic, the second paragraph lists the selected papers for the chapter, and the following paragraphs briefly present the included papers -- each paper is described by a separate paragraph. Editors' introductions for the chapters two and three represent subsets of the surveys given in the originally developed survey paper for this tutorial book. This educational methodology is proven efficient for this type of tutorial text. Editors' introductions are recommended for the first and fast reading, while the related surveys of the schemes are for the later and detailed reading.


Some papers, originally scheduled for inclusion into this tutorial text, have been excluded, because of the space limitations. These papers have been moved into the category named suggestions for further reading, organized separately for each chapter.


References:

[AGARW88]

Agarwal,A.,Simoni,R., Hennessy,J., Horowitz,M., "An evaluation of directory schemes for cache coherence," Proceedings of the 15th Annual International Symposium on Computer Architecture, May 1988, pp.280-289.

[ARCHI86]

Archibald,J., Baer,J.-L., "Cache coherence protocols: evaluation using a multiprocessor simulation model," ACM Transactions on Computer Systems, Vol.4, No.4, November 1986, pp.273-298.

[CENSI78]

Censier,L.M., Feautrier,P., "A new solution to coherence problem in multicache systems," IEEE Transactions on Computers, Vol.C-27, No.12, December 1978, pp.1112-1118.

[LILJA93]

Lilja,D., "Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons," ACM Computing Surveys, Vol.25, No.3, September 1993, pp.303-338.

[STENS90]

Stenstrom,P., "A survey of cache coherence schemes for multiprocessors," IEEE Computer, Vol.23, No.6, June 1990, pp.12-24.

[TOMAS93]

Tomaševic,M., Milutinovic,V., "A survey of hardware solutions for maintenance of cache coherence in shared memory multiprocessors," Proceedings of the 26th Annual Hawaii International Conference on System Sciences, January 1993, Vol.1, pp.863-872


TABLE OF CONTENTS


Chapter 1: Introductory readings

Synchronization, Coherence, and Event Ordering in Multiprocessors

A Survey of Cache Coherence Schemes for Multiprocessors

Cache Coherence in Large-Scale Shared-Memory Multiprocessors: Issues and Comparisons

A Survey of Software Solutions for Maintenance of Cache Consistency in Shared Memory Multiprocessors

Chapter 2: Static software cache coherence schemes

Compiler-Directed Cache Management in Multiprocessors

C.mmp - a Multi-Mini Processor

The NYU Ultracomputer - Designing an MIMD, Shared Memory Parallel Computer

RP3 Processor-Memory Element

A Compiler-Assisted Cache Coherence Solution for Multiprocessors

Multiprocessor Cache Design Considerations

A Cache Coherence Scheme With Fast Selective Invalidation

Automatic Management of Programmable Caches

A Version Control Approach to Cache Coherence

Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps

Chapter 3: Dynamic software cache coherence schemes

Software-Controlled Caches in the VMP Multiprocessor

CPU Cache Consistency with Software Support and Using One Time Identifiers

An Approach to Dynamic Software Cache Consistency Maintenance Based on Conditional Invalidation

Adaptive Software Cache Management for Distributed Shared Memory Architectures

Chapter 4: Analytic Modeling

Modeling Bus Contention and Memory Interference in a Multiprocessor System

Analysis of Multiprocessors with Private Cache Memories

Effectiveness of Private Caches in Multiprocessor Systems with Parallel-Pipelined Memories

Evaluating the Performance of Software Cache Coherence

Comparison of Hardware and Software Cache Coherence Schemes

A Model of Workloads and Its Use in Miss-Rate Prediction for Fully Associative Caches

Chapter 5: Workload Behavior Characterization and Simulation Techniques

Multiprocessor Cache Simulation Using Hardware Collected Address Traces

On the Validity of Trace-Driven Simulation for Multiprocessors

Benchmark Characterization for Experimental System Evaluation

Characterization the Caching and Synchronization Performance of a Multiprocessor Operating System

A Performance Comparison of Directory-Based and Timestamp-Based Cache Coherence Schemes


Previous slide


Next slide


Back to first slide