Historical Background
The traditional perception of a computer is of a single processor, the central processing unit (CPU), processing information. Because there was a single CPU, applications were written with a single path of execution through them. In order to run multiple applications at the same time, operating systems had to use multi-tasking techniques to share time on the CPU amongst all the concurrently executing applications. As applications got more complicated, and people wanted to run more and more applications at the same time, processor and memory speeds had to get faster, and memory capacity had to get bigger. But we had Moore's Law on our side. (Moore's Law is an empirical observation that the density of components on a single chip doubles roughly every two years; it is popularly quoted as chip performance doubling every 18 months.) Chip manufacturers made bigger and faster chips despite all the concerns that we were getting close to physical limits.
So what is the problem? Heat dissipation. The faster chips run, the hotter they get. You can run machines very hot if you have liquid cooling systems, and this is fine for servers in big data centres. However, average desktop computers, and certainly all laptops, have to rely on air cooling. So the limit on CPU speed is not so much the physical limit of how big and fast chips can be made, but how much heat can successfully be removed from around the chips. The chip industry seems to have decided that 3.5 GHz x86 processors are at the heat dissipation limit.
Despite this limit having been reached, game playing (which has been the major driver of desktop computing hardware over the last 15 years) continues to demand more processing resource. Graphics cards have become complex computing sub-systems in their own right, and yet games require more processing and memory resource than is currently available. The solution to this problem is to put more processors into the computing system. Instead of building systems in which one CPU is shared amongst all the applications the user wishes to run, computer system manufacturers are putting multiple, multi-core processors in their systems. Computers are now all multi-processor systems that can undertake multiple streams of computation in parallel.
Parallel processing has been part of the computing arena almost since the beginning of computers. Researchers have been studying parallelism for many years, and there is a lot of experimental data and many known programming techniques for harnessing parallelism successfully. However, parallelism has generally been seen as the domain of research because chip manufacturers have, until now, always been able to make CPUs faster.
The one area in which parallelism has been put to practical use is high performance computing (HPC). In areas such as weather forecasting, computational fluid dynamics, computational physics and chemistry, there has always been vastly more demand for processing resource than single CPU systems have been able to provide. This led to the development of ‘supercomputers’, which are invariably multi-processor systems. Various different architectures have been tried, along with many different ways of programming them. A wealth of knowledge has been accumulated about how to work with parallel computing systems. A major problem has always been that such supercomputers are hugely expensive. It is not unusual for a supercomputer to cost £20,000,000. The huge cost of a single supercomputer is partly because the machines take considerable effort to design and build, but mostly because the market is so small: popular supercomputer lines might sell 100 units.
Because of the huge cost of supercomputers, many people use multiple, off-the-shelf workstations connected together in clusters, using high bandwidth interconnects, to replicate the computing capabilities of supercomputers. The success of these ‘cluster parallel’ approaches is part of the reason that workstation manufacturers now put multiple processors into every computer. Moreover, the chip manufacturers are putting multiple processors onto a single chip (multi-core processors) so that a single chip provides multiple processors. The average high-end workstation can now have what seems to be 8, 16, ... processors.
So in the drive to build bigger, faster, cheaper systems using low-cost, commodity components that can be air cooled, all computers are now parallel processors. Operating systems have evolved so that all this processing resource can be managed, but applications are generally not being written to harness the parallelism available: applications still tend to be written as single paths of execution, relying on the operating system to manage the processor resources. In effect, the multiple processors on the workstation are a resource for the operating system to work with rather than for applications to work with.
Of course, programmers have been working with concurrency in their single CPU systems for a long time. The problems of doing input–output have always tended to require applications to appear to be able to do more than one thing at once. To support this, the idea of threads came about, enabling programmers to write multi-threaded applications. Considerable effort has gone into creating good multi-threading support in programming languages. There are various libraries for C and C++ programming, for example, and newer programming languages, such as Java and Python, have built-in features for creating multi-threaded systems. Unfortunately, multi-threading is one of the biggest problem areas for programmers – the average programmer is generally not good at designing and programming complex multi-threaded systems. The issue is not so much that language tools are sometimes difficult to work with as that multi-threading is intrinsically hard for people to think about. Sometimes, though, the problem is a lack of education and training regarding the way in which threaded and parallel systems have to be programmed. The knowledge is there, but sometimes it is not part of the programmer's armoury.
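As a small illustration of the kind of built-in thread support mentioned above, the following is a minimal sketch using Python's standard threading module. The function name run_counter and its parameters are hypothetical, chosen only for this example. Several threads increment a shared counter, and a lock protects each update; without the lock, the read-modify-write on the counter is not guaranteed to be atomic, so updates could be lost. This is exactly the sort of detail that makes multi-threaded code hard to reason about.

```python
import threading


def run_counter(num_threads: int, increments: int) -> int:
    """Increment a shared counter from several threads, guarded by a lock.

    Hypothetical helper for illustration only.
    """
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(increments):
            # Hold the lock so that the read-modify-write of `total`
            # cannot interleave with the same update in another thread.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    # Wait for every thread to finish before reading the result.
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(run_counter(4, 10_000))  # prints 40000
```

The lock makes the result deterministic, but it also serialises the updates: the threads take turns rather than working in parallel on the counter. Deciding what must be protected, and how little, is a large part of the design difficulty.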
Application software needs to harness the processing capabilities offered by the multiple multi-core processor systems that are now becoming commonplace (the trend is towards commodity devices with tens or even hundreds of cores). This means there is a need for knowledge and skills transfer from the pool of knowledge that has been gained over the last 30 years on how to make use of parallelism successfully. To realise this potential there has to be a combination of hardware design, skilful programming and advanced programming tools. The successful deployment of multi-core, multi-processor systems will involve the application of existing parallel processing techniques and the development of new methodologies tailored to these new devices.
