PicoChip – A Suitable Case for Emulation?
One of the unsung heroes of the multicore revolution is UK-based PicoChip. Self-described as “a startup in 2001”, the company has been supplying multicore processors for a number of years and is currently shipping at the rate of about one hundred thousand processors per year. In its PC2xx series each processor has up to 296 cores, with on-chip hardware resources to assist with specific functions required for cryptography, Fast Fourier Transforms (FFTs), correlations, and the like. It is also possible to chain processors together to meet throughput requirements. Some variants also include an integrated ARM processor. It is, thus, very much an example of what we have hitherto called an HFC (high functionality computing) processor.
This is not a case of a technology seeking an application, as happens with many early-stage companies. As an example to many startups, PicoChip is highly focused and has built its technology very much to target its primary market, which is long-range wireless networking. The 140-man company has become a world leader in the production of processors targeting the picocell (short range) and femtocell (even shorter range) market. With sales wins spread fairly uniformly across the world from the USA to China (where it has established a substantial local design capability, as well as collaboration and representation), it has achieved a global presence in about eight years. It has also created alliances with an impressive list of partners from Samsung down.
The design that handles this is remarkably simple and well thought out. The 16-bit fixed-point cores are connected by a switched grid structure. The network allows for interconnection of all processors within a very few clock cycles and is specifically designed to handle communication issues including bus contention. The architecture seems admirably suited to its market. Although the processor is actually a DSP engine with broader potential, the company has identified and targeted its market, eschewing others.
Nor does it run at ultra-fast speeds. The Array Elements (AEs), the main cores, are clocked at a comparatively slow 160 MHz. The various classes of AE have memory tailored to their functional needs. This is not an engine which is meant for general-purpose processing, in contrast to most embedded systems. As a result the power consumption is relatively low: it is capable of 150 GOPS/W (giga-operations per second per Watt) at a power consumption of about 3 W. Compare this with, say, Tilera's Tile64, which delivers 20-30 GOPS/W at 0.7 GHz and 15-22 W. For comparison, Intel's more general-purpose 80-core Terascale floating-point processor achieves 16 GFLOPS/W at 3 GHz and 62 W.
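To see what these efficiency figures imply in absolute terms, multiply each efficiency by its quoted power draw. The short Python sketch below does only that arithmetic. The function and table names are ours, and the pairing of the Tile64 extremes (30 GOPS/W with 15 W, 20 GOPS/W with 22 W) is an assumption about how the two quoted ranges line up. Bear in mind that the operations are not like for like: 16-bit fixed-point operations on one side, floating-point operations on the other.

```python
# Figures quoted in the text: (efficiency in giga-operations per second per watt,
# power in watts). The two Tile64 rows pair the ends of the quoted ranges, which
# is an assumption made for this illustration only.
QUOTED = {
    "PicoChip": (150.0, 3.0),
    "Tile64 (low power)": (30.0, 15.0),
    "Tile64 (high power)": (20.0, 22.0),
    "Intel Terascale": (16.0, 62.0),  # this entry is in GFLOPS/W
}


def implied_throughput(efficiency_per_watt, watts):
    """Total giga-operations per second implied by efficiency times power."""
    return efficiency_per_watt * watts


def summarize(table):
    """Map each processor name to its implied total throughput."""
    return {name: implied_throughput(eff, watts) for name, (eff, watts) in table.items()}


if __name__ == "__main__":
    for name, total in summarize(QUOTED).items():
        print(f"{name}: about {total:.0f} giga-operations per second")
```

On these figures the devices land in a broadly similar range of raw throughput; what separates them is the power budget needed to get there.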
The reasons for this level of performance are manifold: the array is large but not clocked at high speeds; both the core designs and the operations that they are required to execute are relatively simple; and the technology is only 90 nm. Together these mean that the system is not power hungry in the way that many non-embedded processors are.
Applications for these systems are necessarily low-level and the system is programmed in C with specialized interfaces.
As a result of its architecture and relatively small footprint, the processor is also able to include a degree of built-in redundancy. This is not fault tolerance, but the ability to address yield issues by switching in an extra row of processors at fabrication if errors are detected in test, thus giving an effective yield increase of about 7%. This is an issue that will arise in the implementation of MCPs (multicore processors) as higher core counts impact economic yield.
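The effect of a spare row on yield can be illustrated with a simple binomial model. This is a sketch under stated assumptions, not PicoChip's actual yield model: it assumes each row fails independently with the same probability, and the row count and defect probability in the usage example are hypothetical values chosen only for illustration.

```python
from math import comb


def yield_without_spare(rows, p_row_defect):
    """Probability that all `rows` rows are good (no redundancy)."""
    return (1.0 - p_row_defect) ** rows


def yield_with_spares(rows, spares, p_row_defect):
    """Probability that at most `spares` of the rows + spares built rows are defective.

    Assumes independent row failures with equal probability (a simplification).
    """
    total = rows + spares
    good = 1.0 - p_row_defect
    return sum(
        comb(total, k) * p_row_defect**k * good ** (total - k)
        for k in range(spares + 1)
    )


if __name__ == "__main__":
    rows, p = 16, 0.01  # hypothetical values, not taken from the article
    base = yield_without_spare(rows, p)
    spared = yield_with_spares(rows, 1, p)
    print(f"no spare: {base:.3f}, one spare row: {spared:.3f}")
```

Under this model a single spare row multiplies the yield by a factor of (1 + rows × p), so the benefit grows with both the row count and the defect rate, which is why the technique becomes more attractive as core counts rise.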
One of the other concerns that is frequently raised about MCP designs is robustness. PicoChip believes that high-count MCP designs can be both cheap and robust. As most of its systems are deployed in the field, if not in positively hostile environments, this view is evidently shared by the market.
PicoChip isn't unique in this or in many of the other issues that we have highlighted. What is really interesting about it is its focus on using these technologies to address the needs of a particular market (in this case mobile communications) and the lessons that we can take from this to the wider arena as core counts grow. Of course, there is far more to PicoChip's processors and their significant technical achievements than we can discuss in depth here.
With several hundred cores likely to be routinely placed on processor dies in the next few years for more than “just” mobile telecoms, a lot can be learnt by looking at the way that the PicoChip architecture and processor are built. Simplicity of architecture is going to be essential to keep raising the core count as we reach the limits of Moore's Law implied by the basic physics of electronic devices. By raising core count we will increase the level of parallelism and with it throughput. At the same time core complexity will have to diminish in order to keep the total device count down, although perhaps not as far as fixed-point processors (there have been hardware/software designs that have proposed variants on fixed point as a way forward). PicoChip offers us a hint at a resolution of some of the issues that, while addressed regularly in embedded processors, aren't being readily embraced (with some notable exceptions) outside that world.
There may well be a lot about PicoChip that start-ups in the industry could do worse than build upon.
