SOFTWARE SOLUTIONS
Igor Tartalja and Veljko Milutinovic
PREFACE
The problem of cache coherence (consistency) has been studied for almost
two decades now, and most of the solutions introduced so far were on the
hardware level. A representative set of papers dealing with hardware issues
of the cache coherence problem, was selected and presented in a companion
tutorial: The Cache Coherence Problem in Shared Memory Multiprocessors:
Hardware Solutions, edited by M. Tomašević and V. Milutinovic.
Intensive work on the development of software mechanisms for coherence
maintenance begins in the second half of the 80's. Almost all software
solutions (except some trivial ones) are developed through academic research
and implemented only in prototype machines. Because of that, we have found
that the field of software techniques for maintaining the cache coherence
is widely open for future research and development. This tutorial book
is viewed as a selection of papers dealing with the "state of the
art" software solutions for cache coherence maintenance in shared
memory multiprocessors. This selection is almost complementary with the
selection given in the previously mentioned Tomašević-Milutinović tutorial
book, as well as with the selection of papers dealing with the closely
related topic of distributed shared memory (DSM), given in the Protić-Tomašević-Milutinović
IEEE tutorial book.
This tutorial is intended for an experienced reader in computer engineering,
but probably novice in the topic of the cache coherence. In our opinion,
the selection of papers in this tutorial will satisfy readers who wish
to gain an in-depth understanding of the problem, as well as those interested
in a comprehensive, rather wide overview. We think that this tutorial will
be of the special interest for both categories of multicomputer investigators
and designers: computer architects and compiler writers. We try to collect
all representative approaches to the software coherence maintenance, as
well as a number of efforts in the related performance evaluation field.
The tutorial may also be used as a software coherence reference handbook
for advanced undergraduate and typical graduate students in multiprocessing
and multiprogramming areas.
The selected papers (27) for this tutorial are organized into five chapters.
Each chapter is devoted to a separate topic, although we are aware of the
fact that some of the papers could have been classified into two, or maybe
even more, chapters. The short editors' introductions are given for the
entire book, and then for each chapter separately.
Editors
November 1993
Shared memory multiprocessors with private cache memories are widely used
as natural and efficient parallel architectures to support the relatively
simple programming model based on data sharing. Meanwhile, when two or
more processors with private caches share changeable data, the problem
of cache coherence (consistency) arises. After a processor changes the
value of a variable in its private cache memory, copies of the same variable
in other caches become stale, and should be invalidated or updated before
consumption. One frequently cited definition of cache coherence was given
in Censier and Feautrier paper [CENSI78].
In the beginning, the problem was solved at the hardware level. Most of
the hardware schemes for coherence maintenance belong to one of the two
large groups: "snoopy" schemes (surveyed and compared in [ARCHI86])
and "directory" schemes (surveyed and compared in [AGARW88]).
The field of hardware solutions to cache coherence problem is properly
covered with a number of surveys like those in the papers written by Stenstrom
[STENS90], Tomašević and Milutinović [TOMAS93], and Lilja [LILJA93] (the
paper by Lilja [LILJA93] is included in the first chapter of this tutorial).
A representative selection of papers dealing with hardware solutions to
the problem is given in the companion tutorial book "The Cache Coherence
Problem in Shared Memory Multiprocessors: Hardware Solutions," edited
by Tomašević and Milutinović.
During the second half of the previous decade, a considerable research
effort was made in the direction of software solutions to the cache coherence
problem. Although the transparency of the coherence problem, inherently
provided by hardware schemes, is sacrificed if the software approach is
applied, there is a number of reasons for software maintenance of coherence
to be chosen.
First, software methods are relatively less expensive to implement. Although
a number of software schemes needs some hardware support, the complexity
of the hardware support is usually considerably less than the complexity
of an appropriate hardware used to maintain coherence without any software
help. Second, contemporary research (presented in the last two chapters)
clearly indicates that the performance of a coherence scheme strongly depends
on the multiprocessor workload characteristics. Consequently, either the
hardware approach or the software approach can demonstrate better performance,
depending on the environment and the application. Software schemes are
potentially more efficient (than hardware schemes) for a class of applications
characterized with a relatively low level of sharing. Third, all software
schemes, according to many serious studies, are scalable. One general feature
of software schemes is self-invalidation. Preventing the incoherence conditions
(sometimes very conservatively), software schemes eliminate the need for
processors to communicate, because of the permanent maintenance of coherence.
This reduces the network traffic, and makes the software schemes well scalable.
In the most of software schemes, decisions about coherence actions are
partially made statically (during the program compilation), and partially
dynamically (during the program execution). Consequently, in this tutorial,
software schemes for coherence maintenance will be classified as static
or dynamic. The static schemes are primarily based on the compile-time
program analysis and insertion of specialized coherence maintenance instructions
into the code of the analyzed program. The dynamic schemes are, on the
other hand, primarily based on the detection of incoherence at the execution
time and immediate reinforcement of appropriate actions. Static schemes
are incorporated into parallelizing optimizing compilers, while dynamic
schemes are incorporated into operating system kernels (primitive operations
on the kernel level, for synchronization and/or mutual exclusion, are now
extended with the cache coherence maintenance related code).
Our intention for this tutorial book was to incorporate all representative
approaches to the software coherence maintenance from the open literature,
as well as a selection of representative techniques for performance evaluation.
It should be noticed that some papers, dealing with analytic or simulation
modeling of coherence schemes, clearly conclude that (in several application
domains) software schemes are comparable in performance with hardware schemes,
and become even better than hardware schemes, for the workloads characterized
with migratory data and a relatively low level of sharing. Although the
software solutions, in spite of their relatively low-cost implementation,
have not been implemented in commercial systems so far, we think that the
above conclusion will attract the future attention of multiprocessor architects
and designers, not only in academia, but in industry, as well.
This tutorial book, containing a set of 27 selected papers, is organized
as follows. The first chapter contains four introductory readings, to give
a brief overview of the cache coherence problem, and particularly of the
software solutions to the problem. Chapter two includes selected static
software cache coherence schemes, while chapter three contains dynamic
software cache coherence schemes. The last two chapters cover the area
of performance evaluation. Chapter four introduces techniques for modeling
and performance evaluation of cache memories and cache coherence maintenance
mechanisms, while chapter five, finally, presents workload behavior and
simulation techniques.
Each chapter is preceded by a concise editors' introduction. All of these
introductions are organized in the same manner: the first paragraph introduces
the topic, the second paragraph lists the selected papers for the chapter,
and the following paragraphs briefly present the included papers -- each
paper is described by a separate paragraph. Editors' introductions for
the chapters two and three represent subsets of the surveys given in the
originally developed survey paper for this tutorial book. This educational
methodology is proven efficient for this type of tutorial text. Editors'
introductions are recommended for the first and fast reading, while the
related surveys of the schemes are for the later and detailed reading.
Some papers, originally scheduled for inclusion into this tutorial text,
have been excluded, because of the space limitations. These papers have
been moved into the category named suggestions for further reading, organized
separately for each chapter.
References:
|
[AGARW88] |
Agarwal,A.,Simoni,R., Hennessy,J., Horowitz,M., "An evaluation of directory schemes for cache coherence," Proceedings of the 15th Annual International Symposium on Computer Architecture, May 1988, pp.280-289. |
|
[ARCHI86] |
Archibald,J., Baer,J.-L., "Cache coherence protocols: evaluation using a multiprocessor simulation model," ACM Transactions on Computer Systems, Vol.4, No.4, November 1986, pp.273-298. |
|
[CENSI78] |
Censier,L.M., Feautrier,P., "A new solution to coherence problem in multicache systems," IEEE Transactions on Computers, Vol.C-27, No.12, December 1978, pp.1112-1118. |
|
[LILJA93] |
Lilja,D., "Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons," ACM Computing Surveys, Vol.25, No.3, September 1993, pp.303-338. |
|
[STENS90] |
Stenstrom,P., "A survey of cache coherence schemes for multiprocessors," IEEE Computer, Vol.23, No.6, June 1990, pp.12-24. |
|
[TOMAS93] |
Tomaševic,M., Milutinovic,V., "A survey of hardware solutions for maintenance of cache coherence in shared memory multiprocessors," Proceedings of the 26th Annual Hawaii International Conference on System Sciences, January 1993, Vol.1, pp.863-872 |
TABLE OF CONTENTS
Chapter 1: Introductory readings
Synchronization, Coherence, and Event Ordering in Multiprocessors
M.Dubois, C.Scheurich, and F.A.Briggs (IEEE Computer, Vol.21, No.2,
February 1988, pp.9-21)
A Survey of Cache Coherence Schemes for Multiprocessors
P. Stenstrom (IEEE Computers, Vol. 23, No. 6 June 1990, pp. 12-24)
Cache Coherence in Large-Scale Shared-Memory Multiprocessors: Issues and Comparisons
D.Lilja (ACM Computing Surveys, Vol.25, No.3, September 1993, pp.303-338)
A Survey of Software Solutions for Maintenance of Cache Consistency in Shared Memory Multiprocessors
I.Tartalja and V.Milutinović (originally developed for this tutorial, 1993.)
Chapter 2: Static software cache coherence schemes
Compiler-Directed Cache Management in Multiprocessors
H.Cheong and A.V.Veidenbaum (IEEE Computer, Vol.23, No.6, June 1990, pp.39-47)
C.mmp - a Multi-Mini Processor
W.A.Wulf and C.G.Bell (Proceedings of the Fall Joint Computer Conference, Montvale, N.Jersey, December 1972, pp.765-777)
The NYU Ultracomputer - Designing an MIMD, Shared Memory Parallel Computer
A. Gottlieb, R. Grishman, C.P. Kruskal, K.P. McAuliffe, L. Rudolph, and M. Snir ( IEEE Transakcions on Computers, Vol. C-32, No. 2, September 1983, pp. 175-189)
RP3 Processor-Memory Element
W.C.Brantley, K.P.McAuliffe, and J.Weiss (Proceedings of the 1985 International
Conference on Parallel Processing, August 1985, pp.782-789)
A Compiler-Assisted Cache Coherence Solution for Multiprocessors
V.A.Veidenbaum (Proceedings of the 1986 International Conference on Parallel
Processing, 1986, pp.1029-1036)
Multiprocessor Cache Design Considerations
R.L.Lee, P.-C.Yew, and D.H.Lawrie (Proceedings of the 14th Annual International Symposium on Computer Architecture, June 1987, pp.253-262)
A Cache Coherence Scheme With Fast Selective Invalidation
H.Cheong and A.V.Veidenbaum (Proceedings of the 15th Annual International Symposium on Computer Architecture, May 1988, pp.299-307)
Automatic Management of Programmable Caches
R.Cytron, S.Karlovsky, and K.P.McAuliffe (Proceedings of the 1988 International Conference on Parallel Processing, 1988, pp.229-238)
A Version Control Approach to Cache Coherence
H.Cheong and A.V.Veidenbaum (Proceedings of the International Conference on Supercomputing 89, June 1989, pp.322-330)
Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps
S.L.Min and J.-L.Baer (IEEE Transactions on Parallel and Distributed Systems, Vol.3, No.1, January 1992, pp.25-44)
Chapter 3: Dynamic software cache coherence schemes
Software-Controlled Caches in the VMP Multiprocessor
D.R.Cheriton, G.A.Slavenburg, and P.D.Boyle (Proceedings of the 13th Annual International Symposium on Computer Architecture, June 1986, pp.366-374)
CPU Cache Consistency with Software Support and Using One Time Identifiers
A.J.Smith (Proceedings of the Pacific Computer Communication Symposium, Seoul, Korea, October 1985, pp.142-150)
An Approach to Dynamic Software Cache Consistency Maintenance Based on Conditional Invalidation
I.Tartalja and V.Milutinović (Proceedings of the 25th Annual Hawaii International Conference on System Sciences, January 1992, pp.457-466)
Adaptive Software Cache Management for Distributed Shared Memory Architectures
J.K.Bennett, J.B.Carter, and W.Zwaenepoel (Proceedings of the 17th Annual International Symposium on Computer Architecture, June 1990, pp.125-134)
Chapter 4: Analytic Modeling
Modeling Bus Contention and Memory Interference in a Multiprocessor System
M.A. Marsan, G. Balbo, G. Conte, and F. Gregoretti (IEEE Transaction on Computers, Vol. C-32, No. 1, January 1983, pp. 60-72)
Analysis of Multiprocessors with Private Cache Memories
J.H.Patel (IEEE Transactions on Computers, Vol.C-31, No.4, April 1982, pp.296-304)
Effectiveness of Private Caches in Multiprocessor Systems with Parallel-Pipelined Memories
F.A.Briggs and M.Dubois (IEEE Transactions on Computers, Vol.C-32, No.1, January 1983, pp.48-59)
Evaluating the Performance of Software Cache Coherence
S.Owicki and A.Agarwal (Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, April 1989, pp.230-242)
Comparison of Hardware and Software Cache Coherence Schemes
S.V.Adve, V.S.Adve, M.D.Hill, and M.K.Vernon (Proceedings of the 18th Annual International Symposium on Computer Architecture, May 1991, pp.298-308)
A Model of Workloads and Its Use in Miss-Rate Prediction for Fully Associative Caches
J.P.Singh, H.S.Stone, and D.F.Thiebaut (IEEE Transactions on Computers, Vol.41, No.7, July 1992, pp.811-825)
Chapter 5: Workload Behavior Characterization and Simulation Techniques
Multiprocessor Cache Simulation Using Hardware Collected Address Traces
A.W.Wilson (Proceedings of the 23rd Annual Hawaii International Conference on System Sciences, 1990., Vol.1, pp.252-260)
On the Validity of Trace-Driven Simulation for Multiprocessors
E.J.Koldinger, S.J.Eggers, and H.M.Levy (Proceedings of the 18th Annual International Symposium on Computer Architecture, May 1991, pp.244-253)
Benchmark Characterization for Experimental System Evaluation
T.M.Conte and W.W.Hwu, (Proceedings of the 23rd Annual Hawaii International Conference on System Sciences, 1990, Vol.1, pp.6-18)
Characterization the Caching and Synchronization Performance of a Multiprocessor Operating System
J. Torrelas, A. gupta, and J. Hennessy (Proceeding of the 5th International Conference on Architecktural Support for Programming Languages and Operating Systems, October 1992, pp. 162-174)
A Performance Comparison of Directory-Based and Timestamp-Based Cache Coherence Schemes
S.L.Min and J.-L.Baer (Proceedings of the 1990 International Conference on Parallel Processing, August 1990, Vol.I, pp.305-311)