Tilera – a great little deal?

Late August is generally the doldrums of the annual news cycle. Unusual then to find a newcomer to the market making new announcements that may turn out to be of great importance to the micro-processor industry. Whatever the reasoning, Tilera chose the tail end of this month to announce its TILE64 family of sixty-four element multi-core processors.

If it was a market splash that they wanted to make then it has worked well for them – the current Google count is 1,040,000 entries. For the TILE64 alone it is well over half a million.

Derived from work done at MIT by founder and CTO Anant Agarwal and others, the TILE64 is very firmly targeted at the embedded systems market. It makes some interesting strides forward in the management and manipulation of data on a complex chip. The architecture owes debts to systolic arrays and to switched and MIMD architectures going back at least to the 1970s. The executive team, too, has a long pedigree in processor technology, which shows in many of the design decisions, as we shall see.

The basic element of the TILE64 is the tile, of which there are 64 interconnected copies, arranged in a 2-D grid. A tile consists of a three-way VLIW (Very Long Instruction Word) processor with an associated two-level hierarchical cache and a switch (i.e. processor + memory + switch = tile). The five-way switch is the centrepiece of the chip's structure. Each tile is connected to five networks. Two are under hardware management for data movement to and from tiles and to memory. The other three networks are for application use, enabling cores to communicate among themselves and with I/O devices.
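To make the grid arrangement concrete, here is a minimal sketch of 64 tiles laid out as an 8 x 8 array. It is an illustration only: the row-major tile numbering, the plain mesh with no wrap-around links, and the shortest-path (Manhattan) hop count are all assumptions made for the example, not details taken from Tilera's documentation, and the function names are hypothetical.

```python
GRID = 8  # 8 x 8 = 64 tiles; row-major numbering is an assumption of this sketch


def coords(tile_id):
    """Map a tile id (0..63) to its (row, col) position in the grid."""
    return divmod(tile_id, GRID)


def neighbours(tile_id):
    """Ids of the tiles directly linked to this one, assuming a plain mesh
    (up, down, left, right) with no wrap-around at the edges."""
    row, col = coords(tile_id)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < GRID and 0 <= c < GRID:
            result.append(r * GRID + c)
    return result


def hops(src, dst):
    """Switch-to-switch hops between two tiles under the assumed
    shortest-path (Manhattan distance) routing."""
    (r1, c1), (r2, c2) = coords(src), coords(dst)
    return abs(r1 - r2) + abs(c1 - c2)
```

Under these assumptions a corner tile has two neighbours, an interior tile has four, and a message between opposite corners crosses 14 links, which gives a feel for why the switch design matters as the array grows.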

Data movement within multi-core chips increases at least as fast as the number of cores and in many applications grows more rapidly. Management of the flow(s) of data around the chip becomes critical in realising – or not – the potential performance of the multiple cores. Inter-device communication is sufficiently important that several companies, including Luxtera, Primarion and Intel, are focussing their attention on delivering on-chip optical interconnects for high-density processors.

On the TILE64, each processor can see all the other processors' caches, so opportunities for data movement abound. Multi-way switches represent one of several strategies for marshalling data in such complex systems, and the five networks (as Tilera refers to them) of the TILE64 are configured so as to separate the different requirements for data movement. Tilera don't present independent data as to the efficacy of their strategy in practice, although the company's publicity makes reference to “a number of patented innovations that enhance the performance and flexibility of the [network]”.

The chip also integrates a number of interfaces including XAUI, gigabit Ethernet, UARTs, Flexible I/O and four DDR2 Controllers.

The complexity of the management of data influences the choice of processor design philosophy. Tilera has opted for VLIW, the same choice as Intel made with their experimental 80-core processor – which has still to see commercial light of day. VLIW processors execute operations in parallel based on a schedule determined by the compiler. Since the order of execution (including which operations can execute simultaneously) is handled by the compiler, the processor can save on scheduling hardware. VLIW-based CPUs offer increased computational power over superscalar CPUs at the cost of greater compiler complexity, and can be built using larger feature sizes (90nm for the TILE64), with lower power consumption (170–300 mW/core) and lower clock speeds (600MHz––Ghz in the present case).
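The idea of a compiler-determined schedule can be sketched in a few lines. The example below is a simple greedy packer, written under stated assumptions: it groups operations into bundles of at most three (matching the three-way issue width described for each tile), it treats every operation as taking one cycle, and the function name and data layout are hypothetical. It is not Tilera's compiler algorithm, only an illustration of deciding at compile time which operations may issue together.

```python
def schedule(ops, deps, width=3):
    """Pack operations into bundles of at most `width` operations.

    ops  : list of operation names in program order
    deps : dict mapping an operation to the set of operations it depends on
    An operation may issue only after all of its dependencies have issued
    in an earlier bundle (every operation is assumed to take one cycle).
    """
    done = set()
    remaining = list(ops)
    bundles = []
    while remaining:
        # Take the first `width` operations whose inputs are already available.
        bundle = [op for op in remaining if deps.get(op, set()) <= done][:width]
        if not bundle:
            raise ValueError("unsatisfiable or cyclic dependency")
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles


# Usage: d needs a and b; e needs c and d.
example = schedule(
    ["a", "b", "c", "d", "e"],
    {"d": {"a", "b"}, "e": {"c", "d"}},
)
```

Here the three independent operations share the first bundle and the dependent ones follow in later bundles. The hardware never has to discover this ordering for itself, which is where the saving in scheduling logic comes from.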

Only a few years ago VLIW was deemed inefficient, but increases in device density have meant that many of its “perceived” inefficiencies have been reduced, and VLIW has consequently grown in popularity. The use of VLIW simplifies the physical architecture at the expense of a more complex and sophisticated compiler. This has the advantage of easier fabrication and a lower device count.

Interestingly, each tile can support a kernel or run under the control of a supervisor system, and as a result (using Tilera's “Hardwall” technology) the chip can provide kernel-level protection. This technology enables the array to be partitioned to support different operating systems and/or multiple instances of the same or different applications.
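The partitioning idea can be pictured with a small model. The sketch below carves an 8 x 8 array into rectangular regions and refuses communication across a region boundary. Everything in it is an assumption made for illustration: the rectangular regions, the region names, and both function names are hypothetical, and the code models only the concept of isolation, not how Hardwall is actually implemented.

```python
GRID = 8  # 8 x 8 = 64 tiles


def make_partition_map(regions):
    """Build a {(row, col): region_name} map.

    regions: dict mapping a region name to (row0, col0, row1, col1),
    the inclusive corners of a rectangle of tiles.
    Raises ValueError if a region leaves the grid or overlaps another.
    """
    owner = {}
    for name, (r0, c0, r1, c1) in regions.items():
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                if not (0 <= r < GRID and 0 <= c < GRID):
                    raise ValueError(f"{name} extends outside the grid")
                if (r, c) in owner:
                    raise ValueError(f"{name} overlaps {owner[(r, c)]}")
                owner[(r, c)] = name
    return owner


def may_communicate(owner, src, dst):
    """Allow traffic only between tiles that belong to the same region."""
    a, b = owner.get(src), owner.get(dst)
    return a is not None and a == b


# Usage: two hypothetical OS instances, each given half of the array.
owner = make_partition_map({
    "os_a": (0, 0, 3, 7),
    "os_b": (4, 0, 7, 7),
})
```

With this map, tiles (0, 0) and (3, 7) may exchange data because both belong to "os_a", while (3, 0) and (4, 0) may not, even though they are physical neighbours.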

VLIW has obtained a very substantial following in the embedded market, with a number of companies offering embedded VLIW products. These include Fujitsu, STMicroelectronics and NXP (formerly Philips). None, however, is anywhere near as sophisticated as Tilera's offerings.

Tilera appears to be offering an appropriate and well thought out set of software tools to go with the new architecture, advocating a “Gentle Slope Programming” approach. Using Tilera's ISO-standard C compiler and Linux to start development, the user can then exploit data parallelism by using a set of libraries that enable multiple-tile use. The final stage is the use of tools to debug, profile and optimise the code. There is also an Eclipse-based IDE. The compiler itself derives from SGI's MIPSpro, which was designed to support parallel programming. Since we have not yet had access to the tools, nor benchmarked them against specific metrics, we are unable to comment on them further.

So, Tilera has a sophisticated product with a seemingly appropriate set of tools. Is it just another start-up? As far as can be told from the publicly available information, Tilera is targeting its products at very appropriate and important embedded markets: next generation networking products, spam detection, deep-level intrusion detection and similar applications. Tilera claim that a dozen customers, including 3Com, GobackTV and Codian, are taking product. The first commercial products using the TILE64 are expected to appear next year.

Where does this place Tilera? This new family of processors ticks most of the boxes for an efficient, true multi-core embedded processor. The TILE64 is an interesting and possibly important piece of hardware. Clearly the company is generating a lot of interest, including important commercial interest, and it has both product and adoption a substantial time before Intel is likely even to demonstrate its eighty-processor beast. Whether this will translate into longer-term market advantage is difficult to say at this stage. Could Tilera be the next big thing? We shall have to watch the market over the next few years.