The Concertant Blog

Sat, 26 Feb 2011

Tools for the future?

by Peter Dzwig

At present there are some very good tools for supporting parallel program development, from language extensions and compilers to software architecture and design tools. Yet none of them actually delivers the holy grail of parallel programming: to take an arbitrary piece of sequential code and transmute it, in the modern equivalent of the Alchemists' Dream, into code capable of running on any platform and delivering anywhere near ideal performance.

In fact this particular dream is highly unlikely to be realised because, in general, parallel code, and in particular parallel algorithms and hardware, are substantially different in form from their sequential counterparts.
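To make that difference in form concrete, here is a minimal sketch comparing a running total written sequentially with the same result computed as a step-wise scan (the Hillis-Steele scheme). It is an illustration chosen for this point, not anything taken from a particular tool; the function names are hypothetical, and the "parallel" version only simulates independent per-element updates with a list comprehension rather than using real cores.

```python
from typing import List


def prefix_sum_sequential(values: List[int]) -> List[int]:
    """Running total: each step depends on the one before it."""
    total = 0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


def prefix_sum_scan(values: List[int]) -> List[int]:
    """Inclusive scan in the Hillis-Steele style.

    Within one pass, every element is computed only from the previous
    pass, so the updates are independent of each other and could in
    principle be handed to separate cores. Here they are simulated
    with an ordinary list comprehension.
    """
    current = list(values)
    offset = 1
    while offset < len(current):
        current = [
            current[i] + current[i - offset] if i >= offset else current[i]
            for i in range(len(current))
        ]
        offset *= 2
    return current
```

Both functions return the same list, but the second has a different shape: about log2(n) passes of independent updates instead of one chain of n dependent steps, at the cost of more additions in total. Recovering that shape automatically from the first function is the kind of transformation the paragraph above argues is hard in general.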

Parallel programming, in many diverse forms, has been a commercial reality since at least the 1970s, when ICL (now part of Fujitsu) launched the DAP (Distributed Array Processor) as an attached processor for its mainframes; you can push that date back further if you include academic exercises and multiple-CPU systems. Yet to date the dream goal hasn't been reached. Technologies as diverse as the DAP, SuperNode and its relatives from Meiko and Parsytec, and everything from supercomputers to modern multicores, have sought to solve the problem, or at least to address it, through specialised compilers, extensions to existing languages, or completely new languages. This worked adequately at the time because the user base for each system was limited in one way or another, and many of the would-be users were in research facilities. They therefore had the time to work out the problems and modify their code appropriately.

The adoption of multicores by the preponderance of manufacturers as the way to deliver cost-effective performance (by a wide variety of metrics) has increased the market penetration of processors with two, four or eight cores. The demographic of the user base has broadened dramatically as a result. Users are no longer prepared to deal with arcana in order to get the promised performance; they want it delivered simply. They don't want to see any change in the way that they program, and retraining should be minimal, if any is needed at all. Up to now the details of a processor have been hidden from the user (except for the specialist) by the operating system and other layers.

As the number of cores on a chip grows – and it will – the problem of how to realise the performance on offer will become increasingly complex. The industry cannot expect the user or programmer to learn specific languages or language extensions in order to be able to program company X's laptops. It will get more complex still because it will be possible to create highly customised specialist installations. While potentially important where there are particular requirements for high performance, these will reduce the potential for code portability.

Then there are all those different architectures...

What users will want is to develop their application once and once only, thereby preserving their software investment. Whereas nowadays a modern application can still run on a 1.2 GHz Pentium (albeit slowly), such backward code deployment will become more complex and eventually downright impossible.

How are we to address this? The simple answer is that we don't know at present. Yes, we could point to a few technologies around at present; but perhaps it is better to ask what the user is likely to want. If portability (i.e. maintaining the value of software investments) is to be the principal criterion, then surely we need to hide hardware changes from the user. If we assume the existence of some sort of operating-system level, then we are presumably interposing an additional layer between the user and the operating system. One would anticipate that this would detract from raw performance, which would be detrimental for those who demand it.

However, while this might be important for certain user communities, we must accept that the vast majority of users, and indeed of developers too, don't care – provided that they don't lose “a lot” of performance. This is particularly important as performance improves. What we should be looking at is the proportion that is lost. Provided that this can be limited to a low proportion of the overall figure, the vast majority probably won't care. Indeed there is some suggestion, if history is a guide, that such overheads may reduce over time.
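The distinction between absolute and proportional loss can be shown with a little arithmetic. The sketch below is purely illustrative: the function name and the performance figures (in arbitrary units) are hypothetical and are not measurements of any real system.

```python
def fraction_lost(raw: float, delivered: float) -> float:
    """Proportion of raw performance lost to an intervening layer."""
    if raw <= 0:
        raise ValueError("raw performance must be positive")
    return (raw - delivered) / raw


# Hypothetical figures in arbitrary units, chosen only for illustration.
small_system = fraction_lost(raw=100.0, delivered=90.0)     # loses 10 units
large_system = fraction_lost(raw=1600.0, delivered=1440.0)  # loses 160 units
```

In this made-up case the larger system gives up sixteen times as much performance in absolute terms, yet the proportion lost is the same in both cases, about one tenth. It is that proportion, rather than the absolute figure, that the argument above suggests most users will judge.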

What tools we might run on a system with a large core count, we don't yet know. It may well be that the tools we will need don't yet exist. It would be an extremely worthwhile programme of research to step back and take a long hard look at what we really need. The present assumption, in almost all sectors of our industry, is that they will be like what we have already. What justifies that assumption? If you look at that question in some depth – and that is a part of Concertant's activities – then the evidence that we know how to deal with even 64-core systems (due around 2015) is fairly scarce. There is certainly a paucity of consensus. It takes a good few years to get from the research lab to the market, so work had better get underway soon.

The tools industry is quite probably set to change, conceivably beyond all recognition.