interview with antony unwin

Computational Statistics (2005) 20:1-5

Interview with Antony Unwin

Antony Unwin is Professor of Computer Oriented Statistics and Data Analysis at the University of Augsburg (Germany). He was Joint Editor of the journal Computational Statistics until 2003 and is currently Associate Editor of the Journal of Computational and Graphical Statistics. He has directed many software projects in the field of interactive statistical graphics, exploratory analysis and analysis of large data sets. The interview was conducted by the managing editor, Prof. Dr. W. Härdle, during the CompStat 2004 conference in August 2004 in Prague (Czech Republic).

Statistical science has many roots. Where do you see the roots of Computational Statistics?

Computational Statistics has its roots in the need to analyse data, to evaluate models. We have had some very sophisticated theoretical developments through the 20th century, but many of those models could never be evaluated. If you look at early papers on multivariate statistics you find that the authors would love to know how their models would work out, but they weren't able to do it in those days. Nowadays we can evaluate not only those models, but a whole range of new models and this has led to further developments in Computational Statistics.

What was the most prominent development for Computational Statistics in the last five, ten, twenty, thirty years?

Over the last thirty years there have been many dramatic changes in computing, both in software and in hardware. Starting furthest back, probably the improvements in matrix computations had the most effect in statistics, enabling us to evaluate linear models, although correct evaluation took some time initially. Then the arrival of desktop computing enabled us to put statistics on everybody's desktop. Initially it was only on a few people's desktops, but today it's not even on desktops, it's on laptops. This has really spread the use of statistics, even if it hasn't quite yet spread the use of correct statistics. And in my own field I think the increasing graphics capabilities that are available now have made dramatic improvements in how we analyse data. Look at old papers from something like the late fifties, how people were delighted just to be able to put a few marks on paper with the computer. The capabilities we have now allow us unbelievable sophistication, though it may take a while before we use them to full effectiveness. This has certainly made a great difference to statistics and in particular Computational Statistics, because all these graphics have to be computed; it's not just a matter of drawing them. On the more traditional computational side I think a major influence has been the development of statistical languages. Initially we had individual models, then we had collections of models, if one can describe some of the big packages that way. But now we have genuine languages, whether we are talking about S, XploRe or R. These allow experienced statisticians a tremendous degree of extra flexibility which more traditional packages haven't given us. I think this also hasn't been fully exploited yet and it is something we can look forward to in the future. When you are talking about computers you have to think of the power that is now at our disposal. Increases in speed and capacity enable us to calculate models that we never dreamed of considering before. It's fascinating for someone like myself who was educated in the days when these models were a nice theoretical idea, but at that time totally impractical.

Figure 1: Antony Unwin with his former students Heike Hofmann (left), Student's Paper Award winner of the ASA Section Computational Statistics in 2000, and Sylvia Winkler (right), winner of the John M. Chambers Statistical Software Award in 2000.

Why would you classify yourself as a computational statistician?

I would classify myself more as an applied statistician. Of course if you are an applied statistician you are working with real data sets, you are analysing data, you have to use computers and you have to use them effectively and for that Computational Statistics is essential.

Some people say that as a computational statistician one needs mathematics and C++. Is this reflected in today's education?

I hope not, and I don't agree that you need C++! You certainly need knowledge of mathematics and computer science, but possibly at different levels. Computational statisticians have to be in a position to write their own models. This is where today's statistical languages come in, and it would be better for a computational statistician to be an expert in R than in C++.

How is Computational Statistics distinguished from a very focused field like bio-informatics?

In any specific field of applications there are important, specialised models that have particular structures. Computational Statistics endeavours to support the whole range of applied statistics and this means broader classes of models.

Is there a future for Computational Statistics, Antony?

I hope so, but it may be called something else. Academic disciplines have a habit of reinventing themselves and sometimes related work is done in quite different fields. One can think of data mining for instance, something that seems to be more associated with computer science than statistics despite the major role statistics plays there. The aims of Computational Statistics, to provide statisticians with sufficient and effective means to model data, will always be with us, and I think there always will be a substantial number of new problems which Computational Statistics will have to solve.

How should new developments in Computational Statistics be published?

This is a big problem in my own field of interactive graphics, where you cannot represent what you are doing at all easily on a printed page. It's also a problem for those developing algorithms, where they may be able to achieve a substantial speed improvement, but that is difficult to appreciate from a printed paper unless you use the algorithm yourself. With new developments in multimedia there is a real hope that we can publish results in different ways in future. Whether that will be via e-books or via client-server systems is at this stage a little difficult to tell, but there is a lot of experimentation going on. I am optimistic we will have a stronger position for Computational Statistics in five or ten years' time, once the publication issue has been sorted out.

Figure 2: Software packages developed at the Dept. of Computer Oriented Statistics and Data Analysis of the University of Augsburg, see http://stats.math.uni-augsburg.de/software/

How did you become a computational statistician?

I took time out from university at one point to spend a sabbatical with a firm and I discovered that their problems were not necessarily sophisticated statistical models, but simply to collate and represent their data in an effective way. What became apparent was that the software tools available were not good enough to do this. And so I got interested in new ways of collating data and representing data, and in particular in working directly with the results, which is where my interests in interactivity developed. Later on, other aspects of Computational Statistics became interesting as well. I actually got a job with the title Computational Statistics when I moved to Germany in 1993, but I had already set up a group to develop statistical software in Dublin for a few years before that and I think I had become a computational statistician by the end of the eighties.

What do you think of today's statistical software packages?

My starting principle is that all statistical software, indeed all software packages, are not nearly as good as I'd like them to be. This applies to word processing packages, internet browsers, mail programs and, of course, statistical software. That is perhaps unfair when you see the huge improvements that have taken place in the last few years. We have much better numerical reliability, we have much more flexible packages, we have much more sophisticated graphics packages, but the interfaces to use these packages are still clumsy. There is an old paper that I looked up a few days ago, reporting on a discussion meeting in the sixties on statistical software packages. Some participants were worried about making programs too easy to use because then people would misuse them. My attitude is that we should make them as easy as possible, so that the experts can use them more easily. Others are going to misuse statistical packages no matter what we do.

Where do you see the big job chances and job market for statisticians?

I'd like to say that the big job opportunities lie in the analysis of large data sets, whether they are the huge data sets that central statistical offices now have, or those of the weather bureaus or of financial organisations or of any large companies. But it could be that computer scientists are going to be ahead of computational statisticians in the queue. That would be a great pity, because although computer scientists bring considerable technical skills to these tasks, the problems of statistical interpretation remain, and indeed have gradually changed with the analysis of large data sets. This is something that I find important. The classical statistics we teach is very well developed for small data sets, but doesn't apply in that way to the large data sets we have to analyse today. I am optimistic that as these problems emerge it will be the role of computational statisticians to solve them.
