AMD and NVIDIA Try to Reinvent Parallel Processing
The last few weeks have seen both AMD and NVIDIA announce that the GPU is the new CPU: AMD announced ‘stream processors’ before SuperComputing'06, and NVIDIA announced CUDA at it. Clearly the announcements were timed to compete with one another, but which market they are actually addressing is not entirely clear. The announcements were made at a high-performance computing (HPC) event, but they do not really address HPC issues directly.
Both AMD and NVIDIA are offering GPUs as co-processor add-ins, yet their approaches are different, which is perhaps not surprising given their different backgrounds. AMD, having just acquired ATI with its graphics processor range, is approaching things from a general-purpose processor perspective, where there is always an upward pressure on performance. NVIDIA is approaching things from a graphics rendering perspective, where the games market's demand for more and better performance is the primary, possibly sole, driver. In the end, though, both are actually just offering multi-core, co-processor-based, add-in approaches. The processors being offered may be new, but there isn't actually anything new in the overall approach of having add-in co-processors. It has been tried before, but has never really caught on. Can it now?
So why is now a new opportunity? The recent development and deployment of multi-core processors (MCPs [1]) has opened up a whole new perspective on the processor market. With MCPs, performance comes from coupling small, specialised processors on a single chip, each of which consumes a fraction of the power of a comparable traditionally designed processor. Players such as Intel, IBM and others are moving into the MCP space, offering from 2 to 512 processors on a chip. In effect, AMD and NVIDIA are just joining this MCP space by bringing specially amended/focused GPUs to market as MCPs. So, really, it is a question of joining the bandwagon now or missing out.
‘Why at SuperComputing'06?’ is an easy question to answer of course: SuperComputing'06 attracts players who purchase large amounts of compute power, many of whom have very deep pockets (to absorb the initially higher cost of early systems). These would-be purchasers are also far more forgiving of bugs that arise in early versions of systems.
Is there really a market for the AMD and NVIDIA offerings? In a sense, undoubtedly. The world is different now to when add-in co-processors were last tried, and the MCP bandwagon is strong. The desire for higher performance is ongoing. The rate of increase of power consumption by leading-edge processors is now becoming a major issue for the average user. MCPs offer a way forward, adapting the lessons of the history of massively parallel processing. Do AMD and NVIDIA have the right approaches?
AMD’s approach, dubbed the AMD Stream Processor, is a graphics chip (GPU) intended for processing non-graphical data. It comes as a PCI Express add-in card with 1GB of on-card GDDR3 memory. (Ironically, GDDR3 first appeared on NVIDIA's products.) This is consistent with either ATI’s Radeon X1900 or X1950 chips being the core processor, as has been surmised by various sources. The Stream Processor offers what AMD describe as a ‘Close to the Metal’ interface, which provides a set of low-level APIs and direct access to commands for the GPU processors. These, it claims, will allow the chip’s graphics-oriented components to perform non-graphics processes up to eight times faster than graphics-focused code that tries to manipulate the processors’ bit-handling capabilities to perform general purpose tasks. It uses existing processors and the software library to provide access to the chip’s parallel capabilities.
NVIDIA's solution, NVIDIA CUDA (‘Compute Unified Device Architecture’), offers a software system which interfaces to their new MCP GPUs with tens of specialised cores per chip. The architecture is described as ‘fundamentally new’ for the GPU, enabling it to solve complex problems across consumer, business and technical industries. CUDA will be implemented on NVIDIA's GeForce 8800 and Quadro processors. The system is programmed through an interface to a C compiler, of which no further details are given, but which gives access to a range of standard compute-intensive routines. The processors that use this must be ‘CUDA-enabled’.
The question is whether or not these approaches actually offer a genuine way forward for the average user. That GPUs are extremely fast is demonstrably true. Can this technology translate into broader applications? GPUs are examples of parallel processing engines on a chip, with specific processors available for various functions, the whole carefully architected for the graphics drawing and rendering process. These stream architectures are not in themselves that new; they have been discussed for well over twenty years. Dataflow architectures are well known to offer a very good model for certain classes of application, particularly in parallel systems, but not for all. Where they work, they often work well and they also scale well. As the number of cores in MCPs grows, we could expect that, again for the right classes of application, performance would scale.
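The stream model described above can be sketched in a few lines. The following is an illustration only, not either vendor's API: the names `kernel` and `run_stream` are hypothetical, and the kernel itself is an arbitrary choice. The point is that each element is processed with no reference to any other, so the same work can be spread over one worker or many without changing the result. In standard CPython, threads will not actually speed up arithmetic like this; the sketch shows the structure of the computation, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(x):
    """Hypothetical per-element operation: depends on no other element."""
    return 2.0 * x + 1.0


def run_stream(data, workers):
    """Apply the kernel to every element, sharing the work among `workers`.

    pool.map returns results in input order, so the output does not
    depend on how the elements were distributed across workers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, data))


data = [float(i) for i in range(8)]
serial = [kernel(x) for x in data]       # one element at a time
parallel = run_stream(data, workers=4)   # same kernel, four workers
```

Because the kernel carries no state from one element to the next, adding workers is a change of configuration rather than of code, which is why this class of application can be expected to scale with core count.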
About twenty years ago, Texas Instruments and others attempted to use Digital Signal Processor (DSP) chips for general purpose applications. The first reason that DSP chips didn’t get taken up was the lack of adequate tools. Secondly, they only worked well for certain types of application, unsurprisingly mostly ones that were very similar to signal processing. In addition, the lack of integration between DSP processors and their hosts made them, in the opinion of those who used them, difficult to use and no alternative to dedicated high-performance architectures. This point is relevant here because, in a recent survey conducted by us, most respondents were less than enthusiastic about hybrid architectures, particularly those using GPUs as add-in co-processors.
Then, about fifteen years ago, vector co-processors (VCPs) were going to be ‘all the rage’. A lot of work was done on creating chips and on the compilers required to perform the ‘vectorization’. The single instruction multiple data (SIMD) style of parallelism works well for certain classes of problem, such as imaging, signal processing and networking, but it does require an appreciation of parallelism to exploit correctly. Although VCPs were relatively successful in niche markets, they did not become the norm. Problems with the silicon technology of the time and a lack of appreciation of the programming style required were probably the main issues.
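What ‘an appreciation of parallelism’ means in practice can be shown with two small loops. This is a minimal sketch in plain Python with hypothetical function names, not the output of any vectorizing compiler. In the first loop every output depends only on the inputs at the same index, so a SIMD unit could compute several outputs with one instruction. In the second, each output depends on the one before it, so the loop cannot simply be split across lanes as written and would need restructuring (for example as a parallel scan) before vector hardware could help.

```python
def scale_add(a, b, k):
    """Independent per-index work: out[i] = k * a[i] + b[i].

    No iteration reads a value produced by another iteration, so this
    loop is a natural candidate for SIMD execution.
    """
    return [k * x + y for x, y in zip(a, b)]


def running_sum(a):
    """Loop-carried dependency: out[i] = out[i - 1] + a[i].

    Each iteration needs the total from the previous one, so the
    iterations cannot be executed in lockstep as written.
    """
    out = []
    total = 0
    for x in a:
        total += x
        out.append(total)
    return out
```

Telling these two cases apart, and knowing how to recast the second, is the kind of knowledge that the programmer or the compiler has to supply.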
Today, things are different. Parallelism appears to be the only way forward for improved performance for commodity computers. The silicon technology is up to the task of producing a wide range of different types of parallel processor. It is very clear that the processors being produced are technically very competent and will doubtless make substantial inroads in the advanced graphics market and contribute healthily to their makers’ bottom lines. The real question seems to be: is the programming and compilation technology there to make use of the hardware? Currently, we believe not. Our survey indicates that system developers are not really aware of all the knowledge about varieties of parallelism, and how to exploit them, that has been gained over the last 30 years. The parallel hardware has been reinvented; it seems that parallel software now has to be reinvented too.
So whether or not AMD and NVIDIA will really make an impact here remains to be seen. They may not be in control of their success.
[1] NVIDIA refer to some of their products as ‘MCPs’, meaning media and communications processors, which is clearly not the same as the more generally accepted meaning of multi-core processors.
