Paradise Postponed?

The late 1980s seemed to herald the dawn of a golden age of parallelism. Let’s call it Paradise!

There were companies with names like Cambridge Parallel Processing, Parsys, MasPar and Parsytec. Every other project had “par” in its acronym. Most computer centres had “parallel” in their title. There were languages such as Par C. The Transputer was in the vanguard of a horde of processors supporting parallelism. There was a wide choice of closely coupled, fine-grained, massively parallel SIMD (Single Instruction Multiple Data) and MIMD (Multiple Instruction Multiple Data) machines available (for example the Intel Paragon), an extensive range of programming techniques and tools, and a significant pool of parallel programming expertise. Systems might come with eight, sixteen, 256 or perhaps thousands of processors. The name “Massively Parallel” was well deserved. With the exception of a few large-scale supercomputers, these sorts of configurations no longer exist. Where did it all go wrong? How was Paradise Lost?

The answer lies in the rise of the “killer micros”, which just became faster and faster in response to a seemingly endless demand from a voracious user base. Moore’s Law ensured that most applications didn’t need to run on more than one processor to go fast enough. For those few applications that needed more power than a single processor could offer, emerging cluster technologies would usually solve the problem. These used parallelism, but not the closely coupled parallelism of the 1980s. For those applications which wouldn’t run on clusters, there were still the rare and expensive vector and parallel machines, but these certainly weren’t mainstream and were largely bought by organisations with government backing and significant programming resources.

Where do we go, now that Moore’s Law is taking us into uncharted waters? The days of a fixed architecture which gave twice the performance every two years are over. Processor clock speeds are no longer rising because the laws of physics impose limits. The response by manufacturers has been to put multiple cores on the same piece of silicon. Dual cores are now commonplace. Quad cores have just arrived. Embedded devices with several tens of cores have been deployed for some time. Thousand-core devices are predicted before the end of the next decade.

How will these so-called Terascale devices be programmed? If the experience of the days of closely coupled, massive parallelism is any guide, new techniques and languages will be needed. Automatic compilation won’t cut the mustard. Threads and shared-memory models won’t scale. Programmers will have to learn new skills that are currently in short supply. Maybe this will flush the 1980s champions of parallel processing out of the woodwork. Who knows? Will it be Paradise Regained? Or will it be re-invented? If so, using which models? And how many of history’s lessons will be relearnt?