
International Conference

APPLIED STATISTICS 2008

PROGRAM and ABSTRACTS

September 21 – 24, 2008

Ribno (Bled), Slovenia

Organized by
Statistical Society of Slovenia

Supported by

Slovenian Research Agency (ARSS)
Statistical Office of the Republic of Slovenia
ALARIX d.o.o.
RESULT d.o.o.
VALICON d.o.o.
Johannes van Kessel Publishing
Wiley-Blackwell
Slovenian Tourist Board
statistics.com


CIP - Katalozni zapis o publikaciji
Narodna in univerzitetna knjiznica, Ljubljana

311(063)

INTERNATIONAL Conference Applied Statistics (2008 ; Ribno)
Program and abstracts / [Elektronski vir] / International Conference Applied Statistics 2008, September 21–24, 2008, Ribno (Bled), Slovenia ; [organized by Statistical Society of Slovenia ; edited by Lara Lusa and Janez Stare]. - Ljubljana : Statistical Society of Slovenia, 2008
Nacin dostopa (URL): http://conferences.nib.si/AS2008/AS2008-Abstracts.pdf

ISBN 978-961-92487-1-3
1. Applied Statistics 2. Lusa, Lara 3. Statisticno drustvo Slovenije
240964096


Scientific Program Committee

Janez Stare (Chair), Slovenia
Tomaz Banovec, Slovenia
Vladimir Batagelj, Slovenia
Jaak Billiet, Belgium
Maurizio Brizzi, Italy
Brendan Bunting, Northern Ireland
Anuska Ferligoj, Slovenia
Herwig Friedl, Austria
Dario Gregori, Italy
Katarina Kosmelj, Slovenia
Dagmar Krebs, Germany
Irena Krizman, Slovenia
Lara Lusa, Slovenia
Mihael Perman, Slovenia
John O’Quigley, France
Joze Rovan, Slovenia
Tamas Rudas, Hungary
Willem E. Saris, The Netherlands
Albert Satorra, Spain
Vasja Vehovar, Slovenia
Hans Waege, Belgium

Organizing Committee

Andrej Blejec (Chair)
Bogdan Grmek
Lara Lusa
Katja Rostohar
Irena Vipavc Brvar

Published by: Statistical Society of Slovenia
Vozarski pot 12
1000 Ljubljana, Slovenia

Edited by: Lara Lusa and Janez Stare
Printed by: Statistical Office of the Republic of Slovenia, Ljubljana


Program


Program Overview

                 Hall 1                                  Hall 2

Sunday, 21 September
10.30 – 11.00    Registration
11.00 – 11.10    Opening of the Conference
11.10 – 12.00    Invited Lecture
12.00 – 12.20    Break
12.20 – 13.40    Social Science Methodology I            Statistical Applications - Economics I
13.40 – 15.00    Lunch
15.00 – 16.20    Social Science Methodology II           Statistical Applications - Biostatistics I
16.20 – 16.40    Break
16.40 – 18.00    Network Analysis I                      Statistical Applications - Biostatistics II
19.00            Reception

Monday, 22 September
9.10 – 10.00     Invited Lecture
10.00 – 10.20    Break
10.20 – 11.40    Biostatistics and Bioinformatics I      Statistical Applications - Social Sciences
11.40 – 12.00    Break
12.00 – 13.20    Biostatistics and Bioinformatics II     Modeling and Simulation
13.20 – 14.30    Lunch
14.30            Excursion

Tuesday, 23 September
9.10 – 10.00     Invited Lecture
10.00 – 10.20    Break
10.20 – 12.00    Mathematical Statistics                 Design of Experiments
12.00 – 12.20    Break
12.20 – 13.40    Econometrics                            Data Mining I
13.40 – 15.00    Lunch
15.00 – 16.20    Statistical Applications - Economics II

Wednesday, 24 September
9.10 – 10.30     Clustering                              Teaching Statistics
10.30 – 10.50    Break
10.50 – 12.10    Measurement                             Network Analysis II and Data Mining II
12.10 – 12.30    Closing of the Conference


SUNDAY, September 21, 2008

10.30 – 11.00 REGISTRATION

11.00 – 11.10 OPENING OF THE CONFERENCE

11.10 – 12.00 INVITED LECTURE (Hall 1) Chair: Vasja Vehovar
Jaak Billiet (Belgium)
Non-Response Bias in Cross-National Surveys: Designs for Detection and Adjustment of Bias in the European Social Survey

12.00 – 12.20 BREAK

12.20 – 13.40 Social Science Methodology I (Hall 1) Chair: Jaak Billiet

1. Effect of Background, Attitudinal and Social Network Variables on PhD Students’ Academic Performance. A Multimethod Approach
Lluís Coromina, Aina Capo, Jaume Guia and Germa Coenders (Spain)

2. How to Valuate Attained Levels of Education?
Jakub Fischer and Petr Mazouch (Czech Republic)

3. Dis/similarities of Students Gradings Distributions
Matevz Bren and Darko Zupanc (Slovenia)

4. How Many Clicks Do We Need to Create a Web Survey?
Nejc Berzelak, Vasja Vehovar and Tina Horvat (Slovenia)

12.20 – 13.40 Statistical Applications - Economics I (Hall 2) Chair: Joze Rovan

1. A Nonlinear Mixed Effects Model for Prediction of Natural Gas Consumption by Individual Customers
Marek Brabec, Ondrej Konar, Emil Pelikan and Marek Maly (Czech Republic)

2. Modelling Time Series of Wheat Prices in Serbia
Emilija Nikolic-Djoric and Beba Mutavdzic (Serbia)

3. Statistic Model for DSS in Construction Industry
Elena Posdarie (Romania)

4. Empirical Analysis of Bivariate GARCH Models in Serbian Emerging Financial Market
Jelena Minovic (Serbia)

13.40 – 15.00 LUNCH


15.00 – 16.20 Social Science Methodology II (Hall 1) Chair: Lluís Coromina

1. What Can We Achieve With 5 Euros? Optimization of Survey Data Quality Using Mixed-Mode Approaches
Nejc Berzelak, Vasja Vehovar and Katja Lozar Manfreda (Slovenia)

2. Well-being in the Slovenian Municipalities
Lea Bregar, Kaja Malesic and Joze Rovan (Slovenia)

3. Clustering of Population Pyramids
Simona Korenjak-Cerne, Natasa Kejzar and Vladimir Batagelj (Slovenia)

4. Comparison of Cluster Analysis and Kohonen Neural Network Clustering in the Exploration of Relationships Between Crime Rates and Social Conditions in European Countries
Miran Mitar and Igor Belic (Slovenia)

15.00 – 16.20 Statistical Applications - Biostatistics I (Hall 2) Chair: Gaj Vidmar

1. Reading and Interpreting Point Variations in Old Ages Death Probabilities with a Comparison Between Recent Italian and Slovenian Data
Maurizio Brizzi, Rosella Rettaroli and Giulia Roli (Italy)

2. Detection of Outliers and Influential Observations in the Linear Model
Semra Turkan and Oniz Toktamis (Turkey)

3. Statistical Evaluation of Earthquake Data Using Point Process
Umay Uzunoglu Kocer and Esin Firuzan (Turkey)

4. An Investigation on Differentials of Factors Affecting Fertility Rate Among Low, Middle and High Income Countries
Dilip Kumar Mondol and Paul S.F. Yip (Hong Kong)

16.20 – 16.40 BREAK

16.40 – 18.00 Network Analysis I (Hall 1) Chair: Andrej Mrvar

1. Measuring Ties on Online Forums
Ales Ziberna, Vasja Vehovar and Aleks Jakulin (Slovenia)

2. Risk Behaviour and Party Networks of Young Adults - A Cross-Sectional Study in Nine European Countries
Luka Kronegger and Matej Kosir (Slovenia)

3. Using Social Network Methods in Analysis of Other Types of Data Organized As 2–Mode Network
Nino Rode, Jelka Skerjanc and Ales Ziberna (Slovenia)


4. Consistency of Social Support Sources Assessed by the Name Generator Approach and Role Relation Approach
Valentina Hlebec and Tina Kogovsek (Slovenia)

16.40 – 18.00 Statistical Applications - Biostatistics II (Hall 2) Chair: Maurizio Brizzi

1. The Unemployment Structure of Turkey: Survival Models with Nonproportional Hazards
Erengul Ozkok, Nihal Ata and Ugur Karabey (Turkey)

2. The Comparison of Partial Least Squares Regression and Principal Component Regression with Multiple Linear Regression on An Air Pollution Data
Esra Polat and Suleyman Gunay (Turkey)

3. Reliability Properties Related to Friday and Patil’s Bivariate Exponential Model
Juana-María Vivo and Manuel Franco (Spain)

19.00 RECEPTION


MONDAY, September 22, 2008

9.10 INVITED LECTURE (Hall 1) Chair: Janez Stare
Ørnulf Borgan (Department of Mathematics, University of Oslo, Norway)
Dynamic Models for Survival and Event History Data

10.00 – 10.20 BREAK

10.20 – 11.40 Biostatistics and Bioinformatics I (Hall 1) Chair: Ørnulf Borgan

1. Evaluation of Reduced Rank Semiparametric Models to Assess Excess of Risk in Cluster Analysis
Marco Geraci (United Kingdom) and Andrew B. Lawson (U.S.A.)

2. Weighted Estimation in Cox Regression
Samo Wakounig, Georg Heinze and Michael Schemper (Austria)

3. Goodness-of-fit of Semiparametric Additive Regression Models in Relative Survival
Giuliana Cortese (Italy) and Thomas H. Scheike (Denmark)

4. A Simulation Study on Power Comparisons for Group Sequential Tests of Non-Parametric Statistics in Case of Non-Proportional Hazards
Yaprak Parlak Demirhan, Haydar Demirhan and Sevil Bacanli (Turkey)

10.20 – 11.40 Statistical Applications - Social Sciences (Hall 2) Chair: Luka Kronegger

1. The Changes of the Working Realization of the Graduates from Bari University According to Time
Francesco Campobasso and Annarita Fanizzi (Italy)

2. Statistical Analysis of Marital Status of the Population of Vojvodina
Katarina Cobanovic, Valentina Sokolovska and Slobodan Nicin (Serbia)

3. The Elliptical Model of Multicollinearity and the Petres Red Indicator
Peter Kovacs and Tibor Petres (Hungary)

11.40 – 12.00 BREAK


12.00 – 13.20 Biostatistics and Bioinformatics II (Hall 1) Chair: Lara Lusa

1. Power of Resequencing Studies: A Group-Wise Association Test for Rare Disease Susceptibility Mutations
Bo Eskerod Madsen (Denmark)

2. Multinomial Classification Models in Biostatistics
Voicu Boscaiu, Daniela Bratosin and Manuela Sidoroff (Romania)

3. Subgroup Discovery in Data Sets with Two Sets of Variables Using a Combination of Clustering and Classification Techniques
Lan Umek, Uros Petrovic and Blaz Zupan (Slovenia)

12.00 – 13.20 Modeling and Simulation (Hall 2) Chair: Matevz Bren

1. Introduction to GNSS-SIM Algorithms
Jana Heckenbergerova and Hana Bohacova (Czech Republic)

2. Using Partial Standard Deviations in Partial Least Squares Regression
Aylin Alin (Turkey)

3. A Comparative Study on the Efficiency of Closed Multiple Testing Applied to Ordinal Data
Kamon Budsaba, Penkhae Siriwan and Tipaval Phatthanangkul (Thailand)

4. Application of Response Surface Methodology in the Optimization of Dust Product Packaging
Elizabeth Díaz-Castellanos, C. Díaz-Ramos, L. C. Flores-Avila, S. Heyser-Fregoso and M. Arrioja-Rodríguez (Mexico)

13.20 – 14.30 LUNCH

14.30 EXCURSION


TUESDAY, September 23, 2008

9.10 – 10.00 INVITED LECTURE (Hall 1) Chair: Andrej Blejec

William S. Cleveland (U.S.A.)
Visualization Databases for Lossless Analysis of Complex Data Sets

10.00 – 10.20 BREAK

10.20 – 12.00 Mathematical Statistics (Hall 1) Chair: William S. Cleveland

1. Long Term Behaviour of Imprecise Birth-Death Processes
Richard Crossman, Pauline Coolen-Schrijner, Frank Coolen (United Kingdom) and Damjan Skulj (Slovenia)

2. Generalized Chinese Restaurant Construction of Exchangeable Gibbs Partitions and Related Results
Annalisa Cerquetti (Italy)

3. The Probability Functions for the Neyman Type Processes and Thomas Process: an Application on Traffic Accidents
Gamze Ozel (Turkey)

4. Application of Katz Family of Distributions for Detecting and Testing Overdispersion in Poisson Models
Mohammad Ali Baradaran Ghahfarokhi (Iran)

10.20 – 12.00 Design of Experiments (Hall 2) Chair: Gaj Vidmar

1. Some Notes About Efficiency Balanced Block Designs With Repeated Blocks
Bronisław Ceranka and Małgorzata Graczyk (Poland)

2. Some Remarks About Optimum Chemical Balance Weighting Design for p = v + 1 Objects
Bronisław Ceranka and Małgorzata Graczyk (Poland)

3. A Comparison of Type I Error and Power of the Pairwise Comparisons Test under Unequal Small Difference Variance.
Krongkaew Wangniwetkul (Thailand)

4. Application of Statistical Program R for Development of Sampling Schemes for Ensuring Coexistence Measures in Maize Fields
Katja Rostohar, Andrej Blejec, Vladimir Meglic and Jelka Sustar Vozlic (Slovenia)

5. Simulation Study to Check the Performance of Various Unequal Probability Sampling Estimators
Nadeem Shafique Butt and Muhammad Qasier Shahbaz (Pakistan)


12.00 – 12.20 BREAK

12.20 – 13.40 Econometrics (Hall 1) Chair: Alesa Lotric Dolinar

1. Insurance Consumption in Italy: a Sub-Regional Panel Data Analysis
Giovanni Millo and Gaetano Carmeci (Italy)

2. Time Series Model for Paired Comparisons
M.R. Sjolander and I.N. Litvine (South Africa)

3. Prediction of Turkish Bank Failures via Multivariate Statistical Analysis of Financial Structures
Yuksel Akay Unvan (Turkey)

12.20 – 13.40 Data Mining I (Hall 2) Chair: Natasa Kejzar

1. Principal Ellipsoid Analysis
Thierry Dhorne (France)

2. Modelling Traffic Flow on Network of Slovenian Roads
Tine Porenta, Kurt Kalcher, Franc Svegl and Igor Grabec (Slovenia)

3. Addition of Documents’ Representations in the Latent Semantic Space
Jasminka Dobsa (Croatia)

4. Mining Association Rules from Transactional Databases and Apriori Multiple Algorithm
Predrag Stanisic and Savo Tomovic (Montenegro)

13.40 – 15.00 LUNCH

15.00 – 16.20 Statistical Applications – Economics II (Hall 1) Chair: Gaetano Carmeci

1. Evaluating the Usefulness of Information from Forecasts
Eric S. Lin, Ping-Hung Chou and Ta-Sheng Chou (Taiwan)

2. Application of Multiple Correspondence Analysis in Business Research
Christine Duller (Austria)

3. Data Collection in Fast-Growing Companies: The Case of Slovenian Gazelles
Mojca Bavdaz and Mateja Drnovsek (Slovenia)

4. Incentives for Industry-Science Collaboration
Lavoslav Caklovic and Sonja Radas (Croatia)


WEDNESDAY, September 24, 2008

9.10 – 10.30 Clustering (Hall 1) Chair: Anuska Ferligoj

1. Cluster Analysis of Phytoplankton with Similar Temporal and Spatial Patterns
Sangdao Wongsai and Kehui Luo (Australia)

2. Agglomerative Hierarchical Methods: Introduction and Problem setting
Kristijan Breznik, Branka Golob and Mojca Cizek Sajko (Slovenia)

3. Simulated Data Structures
Rok Blagus, Nusa Erman and Emil Polajnar (Slovenia)

4. The Performance of Selected Hierarchical Agglomerative Methods
Ales Korosec, Sanja Filipic, Tina Ostrez and Jana Suklan (Slovenia)

9.10 – 10.30 Teaching Statistics (Hall 2) Chair: Andrej Blejec

1. Improvements in Teaching Statistics in Slovenian Secondary Schools
Andreja Drobnic Vidic and Simona Pustavrh (Slovenia)

2. Statistics101: an Extended Implementation of the Resampling Stats Language
John Grosberg (U.S.A.) and Gaj Vidmar (Slovenia)

3. How to Research the Effectiveness of Constructivist Statistics Education? An Approach based on Reproducible Computing.
Patrick Wessa (Belgium)

4. Generating Tests Using R and LaTeX
Lara Lusa (Slovenia)

10.30 – 10.50 BREAK

10.50 – 12.10 Measurement (Hall 1) Chair: Mojca Bavdaz

1. Inclusion of Capital Services to the Productivity Measurement
Jaroslav Sixta and Jakub Fischer (Czech Republic)

2. Quality of the Measurement of Media Use on Political Issues in the ESS
Lluís Coromina and Willem E. Saris (Spain)

3. Achieving Cross-National Equivalence in Survey Measurement of Tourist Satisfaction: Methodological Challenges and an Empirical Investigation
Vesna Zabkar, Irena Ograjensek, Tanja Dmitrovic and Maja Makovec Brencic (Slovenia)

4. Measuring the Dynamics of the Digital Divide: an Integrative Approach
Vesna Dolnicar and Vasja Vehovar (Slovenia)


11.10 – 12.10 Network Analysis II and Data Mining II (Hall 2) Chair: Ales Ziberna

1. Stability of Typologies on the Basis of Repeated Measurement with the Role Relation(ship) and the Name Generator Approach
Tina Kogovsek and Valentina Hlebec (Slovenia)

2. Cancer Classification Using Various Partitioning Clustering Techniques
Parvesh Kumar and Siri Krishan Wasan (India)

3. Absolute Maximum Entropy Principle and Self-Organization of Memory Cells
Igor Grabec (Slovenia)

12.10 – 12.30 CLOSING OF THE CONFERENCE


Abstracts


Invited lecture Sunday, September 21

Invited lecture

Non-Response Bias in Cross-National Surveys: Designs for Detection and Adjustment of Bias in the European Social Survey
Jaak Billiet (K.U. Leuven and Central Coordination Team of European Social Survey, Belgium; [email protected])

Quality assessment of the obtained data in the ESS is based on the conceptual framework of a pragmatic approach to data quality assessment (Loosveldt et al., 2004). It combines aspects of the total survey error approach, which focuses on output evaluation, with the total quality management approach, which concentrates on process evaluation. This paper focuses on process and output aspects of the obtained sample. It deals with the measurement of non-response and the study of non-response bias from the viewpoint of comparative research, in which the concept of equivalence in measurement is central (Jowell et al., 2007). The paper starts with a theoretical reflection on several designs for the detection of non-response bias: comparing sample statistics with population statistics; using information from reluctant respondents based on converted refusals; asking a small set of crucial questions on the occasion of first contact (and refusal) or in a period after the main survey; and collecting information on the living environment of the sampling units. Each of these methods was used in the past three rounds of the ESS. In the second part, problems related to each of these methods are considered (e.g. obstacles from privacy rules), and the application of each procedure is evaluated using empirical data from past ESS surveys. In the third part of the paper, methods that can be used for data-based adjustment of the sample measures are considered. The presentation concludes with a cost-benefit analysis of the discussed methods.

References
Billiet, J., Philippens, M., Fitzgerald, R. and Stoop, I. (2007). Estimation of Nonresponse Bias in the European Social Survey: Using Information from Reluctant Respondents. Journal of Official Statistics, vol. 23 (2), pp. 135-162.
Jowell, R., Kaase, M., Fitzgerald, R. and Eva, G. (2007). The European Social Survey as a measurement model. Pp. 1-32 in: Jowell, R., Roberts, C., Fitzgerald, R. and Eva, G. (Eds.). Measuring Attitudes Cross-Nationally. Lessons from the European Social Survey. London: Sage.
Loosveldt, G., Carton, A. and Billiet, J. (2004). Assessment of survey data quality: a pragmatic approach focused on interviewer tasks. International Journal of Market Research, vol. 46 (1), pp. 65-82.


Sunday, September 21 Social Science Methodology I

Social Science Methodology I

Effect of Background, Attitudinal and Social Network Variables on PhD Students’ Academic Performance. A Multimethod Approach
Lluís Coromina, Aina Capo (Department of Economics, Faculty of Economics, University of Girona, Girona, Spain; [email protected], [email protected])
Jaume Guia (Department of Organization, Business Management and Product Design, Faculty of Tourism, Girona, Spain)
Germa Coenders (Department of Economics, Faculty of Economics, University of Girona, Girona, Spain)

The aim of this paper is to predict the academic performance of PhD students at the University of Girona (Spain) using a multimethod approach. First, we use a quantitative study to identify which variables predict performance at the University of Girona. Then, we use a qualitative study to attempt to understand some unexpected results of the quantitative research. This study belongs to a wider project designed to predict PhD students’ academic performance, carried out by INSOC (International Network on Social Capital and Performance). The INSOC research group is composed of the universities of Girona (Spain), Ljubljana (Slovenia), Giessen (Germany) and Gent (Belgium).

In the quantitative study, the explanatory variables are characteristics of the PhD students’ research groups, understood as social networks, and background and attitudinal characteristics of the PhD students and their supervisors. Academic performance was measured by the weighted number of publications and conference presentations. We specify a separate regression model for each type of variable. When we combine them, we find that only background variables are useful predictors of PhD student academic performance. In particular, high seniority and a high-performing supervisor increase student performance, while having to teach and having children decrease it. The literature on PhD student success stresses the importance of the research group and social networks. We expect that a qualitative study can uncover the reasons why the quality of the network fails to translate into the quality of the student’s work.

The goal of the qualitative study is to understand the PhD students’ point of view, their feelings and perspectives about their performance, and to learn what or who helped or hindered their research performance within their network. We collected data using in-depth interviews. We used extreme/deviant case sampling and typical case sampling. Both techniques are designed to find cases that best illuminate the research question at hand. We used the program Atlas.ti to analyze the interviews.

The main results confirm the importance of the background and situation of the PhD student. For instance, not having the PhD as a main task hindered students from publishing, because they devoted too much time to teaching or to other non-academic work. As regards the network variables, the qualitative research shows that they are important for the students. Of the 24 aspects helping to publish that students mentioned in the interviews, 19 had to do with their supervisors, their research group or their network as a whole. Similarly, of the 14 mentioned hindrances, 6 had to do with the network. The most commonly mentioned network-related topics are belonging to a sufficiently large research group, research group members pushing them to publish, meeting researchers outside the research group, the existence of other PhD students in the group, and frequent contact with the supervisor and research group members.

How to Valuate Attained Levels of Education?
Jakub Fischer and Petr Mazouch (Faculty of Informatics and Statistics, University of Economics, Prague, Czech Republic; [email protected], [email protected])

Modern society increasingly demands the measurement, quantification and valuation of many social and economic processes, their results and their mutual relationships. One of the most important factors influencing the level of, and changes in, indicators such as gross domestic product, labour productivity, unemployment and wages is educational attainment.

In many analytical studies focused on the relationship between education and these indicators, education is represented only by the highest attained level of education and by the structure of the population by attained level. This approach has at least two disadvantages. Firstly, educational systems differ greatly among countries. For example, some tertiary education institutions in the U.S.A. (colleges) are practically at the same level as some secondary education institutions in the Czech Republic. Secondly, some advanced statistical methods (e.g. regression analysis and some other multivariate methods) need quantitative indicators and variables.

For these reasons, the paper focuses on the possibilities of valuing educational attainment as a quantitative indicator. In our preceding research we made some experimental calculations of the valuation of educational levels, as well as of its relationship with other indicators, on both a qualitative and a quantitative basis. The result of our methodology is a quantitative variable. In this paper, we present our valuation methodology in a comprehensive form and compare it with other methodologies. The comparison is based on calculations using empirical data from the Czech Republic.

Dis/similarities of Students Gradings Distributions
Matevz Bren (Faculty of Organizational Sciences, University of Maribor and Institute of Mathematics, Physics and Mechanics, Ljubljana, Slovenia; [email protected])
Darko Zupanc (National Examinations Centre, Ljubljana, Slovenia; [email protected])

In our talk we will discuss methods for comparing students’ achievements, i.e. appropriately defined dis/similarity measures of students’ grading distributions, to compare

• students’ achievements in one subject or in one class/school with the other(s), and

• students’ grading distributions with a pre-selected one (symmetric or...)

We intend these comparisons to be applied in the ALA Tool, i.e. the Assessment of/for Learning Analytic Tool for gathering and analysing grades in upper secondary education in Slovenia.

References
Rubner, Y., Tomasi, C. and Guibas, L.J. (2000). The Earth Mover’s Distance as a Metric for Image Retrieval. International Journal of Computer Vision, 40(2), 99-121.
Zupanc, D. and Bren, M. (2007). Public Examination System as a Support for Teaching and Learning Improvement in Slovenia: Assessment for Learning Analytic Tool. Paper presented at the 33rd Annual Conference of the IAEA (International Association for Educational Assessment), Baku, Azerbaijan.
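
The abstract does not say which dis/similarity measure is used. The reference list points to the Earth Mover's Distance, so the sketch below shows, purely as an illustration, how that distance could be computed between two grade distributions on the same ordinal scale. The grade counts, the five-point scale and the function names are assumptions made for this example, not data or code from the ALA Tool. For histograms over equally spaced grades, the distance reduces to the summed absolute difference of the cumulative distributions.

```python
from itertools import accumulate


def normalize(counts):
    """Turn raw grade counts into relative frequencies."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive total")
    return [c / total for c in counts]


def earth_movers_distance(counts_a, counts_b):
    """Earth Mover's Distance between two grade histograms.

    Both histograms must refer to the same ordered, equally spaced
    grades (e.g. 1..5). The result is measured in grade steps.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must cover the same grades")
    cdf_a = accumulate(normalize(counts_a))
    cdf_b = accumulate(normalize(counts_b))
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


# Hypothetical counts of grades 1..5 in two classes and a symmetric target.
class_a = [2, 5, 10, 8, 5]
class_b = [5, 10, 8, 5, 2]
symmetric_target = [1, 2, 4, 2, 1]

print("class A vs class B:", earth_movers_distance(class_a, class_b))
print("class A vs target: ", earth_movers_distance(class_a, symmetric_target))
```

Because the distance accounts for how far mass has to move along the grade scale, shifting students by one grade counts less than shifting them by four, which a bin-by-bin comparison would not capture.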

How Many Clicks Do We Need to Create a Web Survey?
Nejc Berzelak, Vasja Vehovar and Tina Horvat (Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia; [email protected], [email protected])

Software applications for the preparation and implementation of web surveys have become an increasingly important component of survey research. However, feature-rich software requires a relatively large amount of learning and usually involves complex procedures to generate survey questions, while easy-to-use applications usually offer too few features to satisfy the needs of advanced users.

We evaluate web survey tools from the perspective of a scientific user. For this purpose, a variety of usability and benchmarking procedures is applied to the most widespread software tools, including the OneClickSurvey solution developed at the University of Ljubljana (http://1ka.si). This includes the evaluation of a) the features required for various research purposes (such as simple and complex questionnaires, experimental designs and mixed-mode research), b) the availability and accessibility of these features in the selected tools, c) the ease and intuitiveness of their use and d) the estimated time needed to master the software to a required degree. Two standardized surveys have been implemented for this purpose in all selected software tools.

The paper provides (1) an insight into the state of the art in the field, (2) guidance for selecting software for different needs, together with a discussion of some possible directions of future development, and (3) a comparison of results arising from different strategies for evaluation and testing.


Statistical Applications - Economics I Sunday, September 21

Statistical Applications - Economics I

A Nonlinear Mixed Effects Model for Prediction of Natural Gas Consumption by Individual Customers
Marek Brabec, Ondrej Konar, Emil Pelikan and Marek Maly (Institute of Computer Science, Academy of Sciences of the Czech Republic, Praha, Czech Republic; [email protected])

In this paper, we propose a nonlinear statistical model useful for the description and prediction of daily natural gas consumption at the level of individual customers. Unlike traditional approaches (which typically concentrate on consumption sums for rather large groups of customers), we have to deal with irregularities in individual consumption series caused by inter-individual heterogeneity. Heterogeneity is substantial even within the customer classes that gas companies create in order to “homogenize” by grouping individuals with similar “structural” time-invariant characteristics available from routine databases (e.g. the medium-size commercial, i.e. non-residential, customers with declared gas heating that we use in this work). Even within the same customer, there are substantial irregularities, including zeros, missing data, abrupt changes in consumption pattern, etc.

Our model is of the nonlinear regression type, with individual customer-specific parameters that nevertheless have a common distribution. In other words, we work within the nonlinear mixed effects (NLME) model framework. This means that we do have “systematic” and “autocorrelated” parts, but the latter is highly structured, nonlinear and non-stationary, i.e. of quite a different nature than is typical in standard time series modeling. The structure of both fixed and random effects is specified so that we capture large individual discrepancies in the temperature-related and temperature-free parts of total consumption. We use the daily average temperature and the type of day of the week as covariates (since these are readily available and correspond to the most important predictors). It turns out to be advantageous to build the model conditionally: first we condition on whether a particular customer consumed or not (i.e. had zero consumption on the particular day), then we model (and predict) the consuming/non-consuming status in an individual (and obviously time-varying) fashion.

After describing the model formally, we test its predictive ability on a randomly selected group of Slovak customers whose daily measurements are available. In particular, we predict up to one calendar month ahead (in relation to a practical problem arising at SPP, the Slovak Gas Distribution Company). To see the advantages and weak points of our approach, we compare the model results with two more traditional approaches, ARIMAX and ARX (both fitted for each customer separately but with a common structure for all individuals, and both using temperature and type of day of the week as external covariate information). As the performance comparison of all three yields interesting general insights, it is discussed in detail.

Modelling Time Series of Wheat Prices in Serbia
Emilija Nikolic-Djoric and Beba Mutavdzic (Faculty of Agriculture, University of Novi Sad, Novi Sad, Serbia; [email protected], [email protected])

The paper discusses the issue of modelling univariate time series of the average, selling and ransoming wheat prices in Serbia. The analysis was done on the basis of monthly data for the period 1999-2006. The modelling and forecasting of wheat prices is of great importance, as it is well known that wheat prices influence other corn prices and the prices of other agricultural products.

Several types of parametric models which can deal with seasonality and nonstationarity were considered. Seasonality was modelled as deterministic (dummy variables and periodic with unchanged periodicity), as a stationary stochastic process, and as nonstationary by imposing seasonal unit roots. In order to find out whether the seasonality is deterministic, stochastic or a mixture of the two, the Hylleberg-Engle-Granger-Yoo (HEGY) and Canova-Hansen (CH) tests were applied.

Evaluation of the models was done by means of diagnostic tests, information criteria and checks of forecast reliability.

References
Canova, F., Hansen, B. E. (1995). Are seasonal patterns constant over time? A test for seasonal stability. Journal of Business & Economic Statistics, 13(3), 237-252.
Heij, C., de Boer, P., Franses, P. H., Kloek, T., van Dijk, H. K. (2004). Econometric Methods with Applications in Business and Economics. Oxford University Press Inc., New York.
Nikolic-Djoric, E., Novkovic, N., Rodic, V., Aleksic, Lj. (1993). Izbor adekvatnog modela u predvidjanju pariteta cena svinje-kukuruz. Agroekonomika, 22, 111-122.
Wang, D., Tomek, W. G. (2007). Commodity prices and unit root tests. American Journal of Agricultural Economics, 89(4), 873-889.

Statistic Model for DSS in Construction Industry
Elena Posdarie (Academy of Economic Studies, Bucharest, Romania; [email protected])

The paper focuses on the advantages of Decision Support Systems (DSS) and on the opportunity and need to design such a system for the Romanian construction sector. The first part presents specific problems to be solved in an information system supporting construction processes nowadays, and the importance of construction price indices in the decision-making process in construction. Finally, a statistical model for a DSS and its advantages are presented.

The term construction covers a wide variety of activities; these include the construction of dwellings, non-residential buildings, and civil engineering works such as roads, bridges, dams, etc. Construction activity also encompasses repair, renovation, rehabilitation and maintenance of existing structures, etc. The demand for price indices for construction activity arises from the need to assess real changes in the output from these activities (i.e. to create a constant value series), which cannot be derived solely through reference to regular building and construction statistics.

Given the complexity of specific economic processes connected with the construction industry, such as rental, leasing and insuring, it seems appropriate to develop a DSS to assist processes like evaluation, estimation, investment decisions and so on. The DSS which best fits the requirements is data driven, statistical model based and web based (a communication-driven DSS). The system involves managing large amounts of data, and many organizations would be interested in using such a system: construction contractors, banks, insurance companies and, why not, the Government.

Two institutes compile construction price indices for Romania: the National Institute of Statistics and the National Institute for Building Research. In this paper I present possible ways to improve the currently used models.

The statistical model I suggest is basically a Standard Factors Breakdown Method. As a breakdown method, its starting point is a list of carefully specified factors or components, from which the total input or output costs of a building or construction project are built up. For any given year a representative construction (or a small number of projects) is selected and the quantities of each factor used to build it (e.g. materials, labour, transport, machinery, etc.) are evaluated. Changes in the costs of construction are determined by monitoring the cost of each factor. The representative building or construction chosen initially is used only to establish the weights.

I have also built a test case system for construction price indices for urban residential buildings.
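The factor breakdown idea can be illustrated with a small fixed-weight index. The sketch below is only an illustration: the abstract does not give the exact formula, so a Laspeyres-type form is assumed here, and the factor names, quantities and prices are invented.

```python
def factor_breakdown_index(base_quantities, base_prices, current_prices):
    """Fixed-weight cost index (base period = 100).

    Quantities come from the representative construction and serve only
    as weights; the index moves only because the factor prices change.
    """
    base_cost = sum(base_quantities[f] * base_prices[f] for f in base_quantities)
    current_cost = sum(base_quantities[f] * current_prices[f] for f in base_quantities)
    return 100.0 * current_cost / base_cost


# Hypothetical representative building (all numbers are made up).
quantities = {"materials": 100.0, "labour": 50.0, "transport": 10.0}
prices_base = {"materials": 2.0, "labour": 4.0, "transport": 10.0}
prices_now = {"materials": 2.2, "labour": 4.4, "transport": 10.0}

index = factor_breakdown_index(quantities, prices_base, prices_now)
print(round(index, 2))
```

With these made-up numbers, materials and labour become 10 percent more expensive while transport is unchanged, so the index rises by less than 10 percent.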

Empirical Analysis of Bivariate GARCH Models in Serbian Emerging Financial Market
Jelena Minovic (Belgrade Banking Academy, Faculty for Banking, Insurance and Finance, Union University, Belgrade, Serbia; [email protected])

The goal of this paper is to present an empirical analysis of modelling multivariate volatility processes. To reduce the computation to a manageable scale, we will consider bivariate time series models. The bivariate GARCH models covered in this paper are the restricted version of the BEKK model (named after Baba, Engle, Kraft and Kroner, 1995), the diagonal VEC model (DVEC, initially due to Bollerslev, Engle and Wooldridge, 1988) and the Constant Conditional Correlation model (CCC, Bollerslev, 1990). In order to estimate the bivariate GARCH models with EViews, we will have to write new subprograms for each of the analyzed models. We illustrate our approach by applying it to daily returns (measured by log-differences of closing prices) of the BELEX15 index and Hemofarm stock, report the results obtained, and compare them with the restricted version of BEKK, DVEC and CCC representations. The parameter estimation methods we intend to use are maximum log-likelihood (for the BEKK and DVEC models) and a two-step approach (for the CCC model). We want to test how the covariances between the chosen securities and the volatilities of each security move over time in the bivariate case.
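Two building blocks mentioned above can be sketched briefly: daily returns as log-differences of closing prices, and the conditional covariance implied by a constant conditional correlation. The prices and variances below are made up, and the function names are illustrative; this is not the EViews code referred to in the abstract.

```python
import math


def log_returns(closing_prices):
    """Daily returns measured as log-differences of consecutive closing prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(closing_prices, closing_prices[1:])]


def ccc_covariance(variance_1, variance_2, rho):
    """Conditional covariance in a CCC model: h12 = rho * sqrt(h11 * h22).

    variance_1 and variance_2 are the conditional variances of the two
    series at a given time; rho is the constant conditional correlation.
    """
    return rho * math.sqrt(variance_1 * variance_2)


# Made-up closing prices and conditional variances, for illustration only.
returns = log_returns([100.0, 110.0, 99.0])
covariance = ccc_covariance(4.0, 9.0, 0.5)
print(returns, covariance)
```

In the CCC setting only the two variances change over time, so the covariance path is fully determined by them and by the single correlation parameter.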

Social Science Methodology II Sunday, September 21

Social Science Methodology II

What Can We Achieve With 5 Euros? Optimization of Survey Data Quality Using Mixed-Mode Approaches
Nejc Berzelak, Vasja Vehovar and Katja Lozar Manfreda (Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia; [email protected]; [email protected])

The contemporary survey industry faces two highly challenging problems: declining response rates and growing costs of data collection. The (re-)balancing of data quality and costs is thus one of the central research efforts of modern survey methodology. This became even more apparent with the rise of web surveys, which offered new potential for cost reductions. However, the problem of non-coverage and response rates even lower than in traditional survey modes are the key factors preventing a higher penetration of the web mode into official and academic research on the general population. Investigations of mixed-mode designs and mechanisms for improving response rates (like incentives) are thus very important for cost-effective and high-quality survey data collection.

This paper presents the results of a study on the optimal integration of the web mode into a probability sample survey of the general population in Slovenia. For the purpose of the study, we conducted an experiment in the Slovenian implementation of an official Eurostat survey on ICT use in households, using a mixed-mode design. A probability sample of Slovenian citizens was divided into several experimental groups, manipulating different survey modes (web, telephone and mail) and different types of incentives (a 5 euro cash incentive, a small gift or no incentive). Individuals were initially invited to complete the questionnaire on the web. Those who did not respond were subsequently contacted using one of the traditional modes (telephone or mail survey).

In order to identify the optimal survey design, we developed an optimization model that takes into account response rates, errors and costs at different stages of each experimental design. Errors are estimated using the Mean Squared Error (MSE) approach. The paper evaluates the results obtained by the model and discusses the effectiveness of mixed-mode surveys and incentives for achieving the optimal balance between errors and costs.

Well-being in the Slovenian Municipalities
Lea Bregar, Kaja Malesic and Joze Rovan (Faculty of Economics, University of Ljubljana, Ljubljana, Slovenia; [email protected], [email protected], [email protected])

Well-being is a complex multidimensional concept defined as a state of being happy, healthy and prosperous. It consists of many components. A significant part of well-being is the standard of living, such as disposable income and access to goods and services. These components can be objectively measured. The evaluation of some other components, like health, freedom, happiness, security, quality of environment, innovation, etc., is subjective to some extent.

Generally speaking, well-being is not evenly distributed geographically within a country. Major differences in well-being among territorial units at the subnational level impede the progress of society and may cause economic, social and political problems. The aim of this paper is to identify geographical differences in well-being in Slovenia. Municipalities were selected as basic units since at present they are the only type of subnational government in Slovenia. They represent the level of government closest to the people, which provides local public services.

In order to define and measure well-being we selected 49 quantitative social, economic, demographic and environmental indicators. The indicators were selected on the basis of relevance for well-being and data availability at the municipality level.

The overall well-being of 193 Slovenian municipalities was evaluated by three approaches. First, principal components analysis was used to construct a composite indicator of well-being of Slovenian municipalities. Second, in order to verify the results of the component analysis and to provide deeper insight into regional differences in Slovenia, cluster analysis was performed on the basis of all indicators. The third approach is a combination of both techniques, where clustering is performed on the basis of the major principal components.

All three approaches clearly show substantial differences in the level of well-being of municipalities, with a prevailing higher level in the western and lower level in the eastern part of Slovenia. Besides that, cluster analysis revealed the dual nature of the top well-being group of municipalities. On one side, there is a subgroup of a few economically most developed urban municipalities. On the other side, the well-being of the second group of municipalities is characterised by high standards of living and pleasant environmental conditions, while being a step behind in the economic field.
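A composite indicator based on principal components can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say how the composite was built, so the score on the first principal component of the standardized indicators is assumed here, the eigenvector is found by plain power iteration, and the tiny data set is invented.

```python
import math


def standardize(rows):
    """Z-score each indicator (column) across the units (rows)."""
    n, m = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - mu) ** 2 for v in c) / (n - 1)) for c, mu in zip(cols, means)]
    return [[(row[j] - means[j]) / sds[j] for j in range(m)] for row in rows]


def first_component_loadings(z, iterations=500):
    """Leading eigenvector of the correlation matrix, by power iteration."""
    n, m = len(z), len(z[0])
    corr = [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1) for b in range(m)]
            for a in range(m)]
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def composite_scores(rows):
    """Score of each unit on the first principal component."""
    z = standardize(rows)
    loadings = first_component_loadings(z)
    return [sum(l * x for l, x in zip(loadings, zr)) for zr in z]


# Four hypothetical municipalities, three hypothetical indicators.
data = [[10, 1.0, 50], [20, 2.5, 60], [30, 2.0, 80], [40, 4.0, 90]]
scores = composite_scores(data)
print([round(s, 3) for s in scores])
```

In the third approach described above, scores on several leading components (not only the first) would be passed on to the clustering step.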

Clustering of Population Pyramids
Simona Korenjak-Cerne (Faculty of Economics, University of Ljubljana, Ljubljana, Slovenia; [email protected])
Natasa Kejzar (Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia; [email protected])
Vladimir Batagelj (Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia; [email protected])

A population pyramid is a very popular presentation of the age-sex distribution of the human population of a particular region. The shape of the pyramid shows many demographic, social, and political characteristics of the time and the region.

In our work, the results of hierarchical clustering of the world's countries based on population pyramids are presented. Special attention is given to the shapes of the pyramids. The changes in the pyramids' shapes, and also the changes of countries within the main clusters, are examined for the years from 1996 to 2006.

Smaller territorial units of a country can also be observed through clusters. To illustrate this, clusters of 3111 mainland US counties in the year 2000, obtained by hierarchical clustering with a relational constraint, are examined.

References
Andreev, L. and Andreev, M. (2004): Analysis of Population Pyramids by a New Method for Intelligent Pattern Recognition. Matrixreasonong, Equicom, Inc.
Batagelj, V., Ferligoj, A., and Mrvar, A. (2008): Hierarchical clustering in large networks. Presented at Sunbelt XXVIII, 22-27 January 2008, St. Pete Beach, Florida, USA.
Kaufman, L. and Rousseeuw, P.J. (1990): Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
Pressat, R. (1988): The Dictionary of Demography (Edited by Wilson, C.). Basil Blackwell.
IDB: International Data Base. http://www.census.gov/ipc/www/idbnew.html.
R Development Core Team (2008): R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org.

Comparison of Cluster Analysis and Kohonen Neural Network Clustering in the Exploration of Relationships Between Crime Rates and Social Conditions in European Countries
Miran Mitar and Igor Belic (Faculty of Criminal Justice and Security, University of Maribor, Maribor, Slovenia; [email protected])

The main focus is on testing the hypothesis that there are four types of countries in Europe with regard to the relationship between crime rates and social conditions (A: rich countries, high level of inequality, high crime; B: rich countries, low level of inequality, low crime; C: poor countries, high level of inequality, high crime; D: poor countries, low level of inequality, low crime).

The initial hypothesis is tested using existing secondary data (e.g. the European Sourcebook of Crime Statistics and data from the Human Development Reports) and several methods (cluster analysis and Kohonen neural network clustering).

The results show that different methods give different results, which are not always complementary. Also, the hypothesis of four types of European countries (A, B, C, D) is an oversimplified picture of a more complex reality.

Sunday, September 21 Statistical Applications - Biostatistics I

Statistical Applications - Biostatistics I

Reading and Interpreting Point Variations in Old Ages Death Probabilities with a Comparison Between Recent Italian and Slovenian Data
Maurizio Brizzi, Rosella Rettaroli and Giulia Roli (Department of Statistical Sciences Paolo Fortunati, University of Bologna, Bologna, Italy; [email protected])

The development of human longevity is certainly a relevant and widespread topic that has drawn the attention of researchers in recent years (Kannisto, 1994, 1997; Robine and Caselli, 2005). The growing interest in the matter is mainly due to the great increase in the numbers of octogenarians and centenarians during the last decades, as until recently they were unusual or even rare (Vaupel and Jeune, 1995). A substantial part of recent research on old-age mortality aims to develop models for the way in which the probability of dying increases with age; these can also be used to predict a probability distribution for the highest age which will be attained in given historical or contemporary populations (Thatcher et al., 1998; Thatcher, 1999).

The basic idea is that the probability of dying starts to increase soon after 30-35 years of age. Thatcher asserts that this probability varies from 0.001 at age 30 to 0.1 at age 80 for modern males. By age 80 the rate of increase diminishes, and there is controversy about what happens at still higher ages. The different models used to estimate mortality curves at old ages refer to the various theories on the existence of a natural limit to the length of human life, from which the probable shape of the curve is derived. As stated by Thatcher (1999), if there is a virtually fixed upper limit to the length of human life, then further falls in death rates will make the survival curve even more rectangular and deaths will eventually be compressed into a narrower band of ages. In contrast, if there is no fixed upper limit, or if a fixed limit exists but is far higher than the ages which have so far been attained, then there will be reduced rectangularization and compression of the ages at death, but people may live even longer. Among others, we recall three relevant theories: the so-called ‘fixed frailty’ model (Beard, 1971; Vaupel et al., 1979), the stochastic process model of Le Bras (1976) and the genetic theory of Mueller and Rose (1996).

Given the interest posed by research on the variation of the probability of dying at old ages (namely after 65-70 years), the aim of this paper is to describe the trends of the first and second order variations of q(x), i.e. the death probabilities at age x, within two particular geographical contexts (the Republic of Slovenia and Emilia-Romagna, a Northern Italian region having almost the same area). We considered in particular the last decades, starting from 1981; as stated before, we focused our analysis on two orders of variation:

$$q'(x) = q(x) - q(x-1)$$
$$q''(x) = q'(x) - q'(x-1) = q(x) - 2\,q(x-1) + q(x-2)$$

We studied these variations as a function of age x, and we noticed an almost regular pattern for each sex, year and geographical zone: point variations show a slightly oscillatory trend up to a critical age, after which the oscillations increase dramatically.

The change in variation at the critical age seems considerably sharper in Emilia-Romagna; moreover, there are also differences between years. We think that these results deserve attention in order to explain the reason for such behaviour of death probabilities. The first question to be addressed concerns the existence of a common pattern of the so-called ‘rate of aging’ over time, between sexes and across regions. Next we shall investigate the trend of the second order variation of the probability of dying, which we can interpret as the force of acceleration of the curve. As a final remark, we shall offer a first attempt at interpreting the peculiar path followed by these two measures.
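The first and second order variations of q(x) are simply first and second differences over age, and can be computed as in the sketch below. The death probabilities used here are invented for illustration and are not taken from the Slovenian or Emilia-Romagna data.

```python
def first_variation(q):
    """q'(x) = q(x) - q(x-1), for every age x whose predecessor is available."""
    return {x: q[x] - q[x - 1] for x in sorted(q) if x - 1 in q}


def second_variation(q):
    """q''(x) = q'(x) - q'(x-1) = q(x) - 2 q(x-1) + q(x-2)."""
    d1 = first_variation(q)
    return {x: d1[x] - d1[x - 1] for x in sorted(d1) if x - 1 in d1}


# Hypothetical death probabilities by age (illustrative values only).
q = {80: 0.10, 81: 0.11, 82: 0.13, 83: 0.16}

d1 = first_variation(q)
d2 = second_variation(q)
print(d1)
print(d2)
```

Plotting the two dictionaries against age, separately by sex, year and region, would give the kind of point-variation curves discussed in the abstract.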

References
Beard, R.E. (1971), Some aspects of theories of mortality, cause of death analysis, forecasting and stochastic processes, in W. Brass (ed.), Biological Aspects of Demography, Taylor & Francis Ltd., London.
Kannisto, V. (1994), Development of Oldest-old Mortality, 1950-1990: Evidence from 28 Developed Countries, Odense Monographs on Population Aging, N. 1, Odense University Press, Odense (Denmark).
Kannisto, V. (1997), The advancing frontier of survival, Odense Monographs on Population Aging, N. 3, Odense University Press, Odense.
Le Bras, H. (1976), Lois de la mortalite et age limite, Population, 31, 655-692.
Manton, K. G., Corder, L. and Stallard, E. (1997), Chronic disability trends in elderly United States populations: 1982-1994, Proc. Natn. Acad. Sci., 94, 2593-2598.
Robine, J.M. and Caselli, G. (2005), An unprecedented increase in the number of centenarians, Genus, LXI (1), 57-82.
Thatcher, A. R., Kannisto, V. and Vaupel, J. W. (1998), The Force of Mortality at Ages 80 to 120, Odense Monographs on Population Aging, N. 5, Odense University Press, Odense (Denmark).
Thatcher, R.A. (1999), The long-term pattern of adult mortality and the highest attained age, J. Royal Stat. Soc. (A), 162, 5-43.
Thatcher, R.A. (2001), The Demography of Centenarians in England and Wales, Population: An English Selection, INED, 13(1), 1-139.
Vaupel, J. W., Manton, K. G. and Stallard, E. (1979), The impact of heterogeneity in individual frailty on the dynamics of mortality, Demography, 16, 439-454.
Vaupel, J.W. and Jeune, B. (1995), The emergence and proliferation of centenarians, in B. Jeune and J.W. Vaupel (eds), Exceptional Longevity: From Prehistory to the Present, Monographs on Population Aging No. 2, Odense University Press, Odense, Denmark, pp. 109-116.
Vaupel, J.W. (199), The remarkable improvements in survival at older ages, Phil. Trans. Royal Soc. London, Series B, 352, 1799-1804.

Detection of Outliers and Influential Observations in the Linear Model
Semra Turkan and Oniz Toktamis (Hacettepe University, Turkey; [email protected], [email protected])

Linear mixed models are considerably sensitive to outliers and influential observations. It is well known that outliers and influential observations substantially affect the results of an analysis, so it is very important to be aware of them. For linear mixed models, several diagnostics have been developed to detect outliers and influential observations. In this paper, we examine these diagnostics and apply them to detect outliers and influential observations in a data set obtained from the Central Anatolia Forest Directorate of the Ministry of Environment and Forestry of the Republic of Turkey.

Statistical Evaluation of Earthquake Data Using Point Process
Umay Uzunoglu Kocer and Esin Firuzan (Department of Statistics, Dokuz Eylul University, Turkey; [email protected], [email protected])

Beyond the magnitude of an earthquake and geophysical features such as longitude, latitude and depth, geological features of the seismic region also determine the succession of earthquake occurrences. Since stochastic point processes allow us to evaluate the spatio-temporal interactions between earthquakes, these models have recently become essential for modeling earthquake incidence. The purpose of this study is to present a spatial point pattern analysis of seismic data gathered from Western Anatolia from the year 1900 to the year 2007. An inhomogeneous Poisson process model, which is the best-known point process model, will be fitted to our earthquake data, and the parameters of the model will be estimated by the maximum likelihood method. Finally, the results obtained will be compared with a homogeneous Markov renewal process.

An Investigation on Differentials of Factors Affecting Fertility Rate Among Low, Middle and High Income Countries
Dilip Kumar Mondol (Centre of Asian Studies, The University of Hong Kong, Hong Kong; [email protected])
Paul S.F. Yip (Department of Social Work and Social Administration, The University of Hong Kong, Hong Kong)

This study determines the most significant factors affecting fertility differentials among 117 selected countries, divided equally into low income countries (LICs), middle income countries (MICs) and high income countries (HICs) according to the World Bank Report (2006). It examines the fertility differentials among LICs, MICs and HICs by infant mortality rate (IMR), life expectancy at birth (LEB), female activity rate (FAR), female literacy rate (FLR), unemployment rate (UER), population below poverty line (PBP), urban population percentage (UPP), GDP per capita, contraceptive prevalence rate (CPR), age at first marriage (AAM) and religious influences (RI). Based on the World Factbook (2006) database, descriptive, correlation, factor and regression analyses are used to investigate the factors of fertility differentials. Descriptive analysis found mean total fertility rates (TFR) of 4.72 and 2.29, with standard deviations of 1.48 and 0.70, among LICs and MICs respectively, while the mean TFR in HICs was 1.82 with a standard deviation of 0.59. High fertility is more likely among countries which have high IMR, low FLR, low GDP, low LEB and strong RI. Regression analysis shows that IMR, FLR, CPR and LEB are the most significant factors of fertility differentials among all selected variables, with an R² of 0.805. Therefore, these four variables alone explain more than 80 percent of the variation in TFR.

Sunday, September 21 Network Analysis I

Network Analysis I

Measuring Ties on Online Forums
Ales Ziberna, Vasja Vehovar and Aleks Jakulin (Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia; [email protected], [email protected])

In this paper we discuss the analysis of social networks among participants of online forums (or online discussion boards). We focus on the usual type of online forum, where there is almost no direct information about which post a given post responds to. Usually, all the information we have about a post is its author, the time of posting and the thread in which it was posted. There exists some “direct information” on ties between authors, namely citations, but they are relatively rarely used in online forums. We will, however, argue that we can still infer connections between authors based on their posts.

In the paper we discuss assumptions that allow us to infer ties between authors without using direct information, and the networks that we obtain using them. We discuss three options for defining a tie. Two persons/users are in a tie if:

1. posts are assumed to reply to the k posts before them in a given thread (topic). Of course, the probability that a post replies to some previous post decreases with the distance in time and the number of posts between them (i.e. k).

2. all posts are assumed to reply only to the first post in the thread (topic).

3. all persons in a certain thread (topic) are in a tie.

We will also test our approach on an empirical forum and support our argument that ties can be inferred without direct information by showing the similarity of the obtained networks to the citation network on the subset of authors that use citations the most. We will also show some possible uses of networks obtained in this manner.
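The three tie definitions can be sketched on a toy forum as follows. This is a simplified illustration, not the authors' procedure: ties are treated as undirected and unweighted, the decay of reply probability with distance is ignored, and the rule names, the `infer_ties` function and the example posts are all hypothetical.

```python
from itertools import combinations


def infer_ties(posts, rule, k=1):
    """Infer undirected ties between authors from posts.

    posts: list of (thread, author) pairs in chronological order.
    rule:  "previous_k"  - a post replies to the k posts before it,
           "first_post"  - every post replies to the thread's first post,
           "same_thread" - all authors in a thread are tied.
    """
    threads = {}
    for thread, author in posts:
        threads.setdefault(thread, []).append(author)

    ties = set()
    for authors in threads.values():
        if rule == "previous_k":
            pairs = [(authors[i], authors[j])
                     for i in range(len(authors))
                     for j in range(max(0, i - k), i)]
        elif rule == "first_post":
            pairs = [(a, authors[0]) for a in authors[1:]]
        elif rule == "same_thread":
            pairs = list(combinations(sorted(set(authors)), 2))
        else:
            raise ValueError("unknown rule: " + rule)
        # Drop self-ties (an author following up on their own post).
        ties.update(frozenset(p) for p in pairs if p[0] != p[1])
    return ties


# Hypothetical forum with two threads.
posts = [("t1", "ana"), ("t1", "bor"), ("t1", "cene"), ("t1", "dana"),
         ("t2", "ana"), ("t2", "dana")]

ties_k = infer_ties(posts, "previous_k", k=1)
ties_first = infer_ties(posts, "first_post")
ties_thread = infer_ties(posts, "same_thread")
print(len(ties_k), len(ties_first), len(ties_thread))
```

The third rule always yields the densest network, since it contains every pair produced by the other two.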

Risk Behaviour and Party Networks of Young Adults - A Cross-Sectional Study in Nine European Countries
Luka Kronegger (Faculty of Social Sciences, University of Ljubljana, Slovenia; [email protected])
Matej Kosir (Institute for Research and Development “Utrip”, Slovenia; [email protected])

A lot of research has been done on the topic of young adults and their peers. But given the pace at which new fashions and new drugs emerge, and the overall changes in the principles and way of living of young people, we believe that the questions of who is shaping all these changes and what exactly indicates them have to be addressed. In the article we will present the relationship between risk behaviours of young adults and characteristics of their friendship ties. First, we present some general information on the respondents and then explain the differences in network characteristics with regard to their demography. Secondly, we focus on the networks of people who have carried a weapon or have been involved in fights, and continue with the networks of drug, tobacco and alcohol users. At the end we perform a regression analysis to test the influences of demography and risk behaviour on network characteristics. The structure of network ties was measured on ego-centered networks and analysed with factor analysis, which was used to extract two contextual dimensions of the networks: the presence of deviance-related ties and the presence of socialising and help-related ties.

The research was performed on data gathered from a representative sample of frequent users of recreational weekend nightlife locations in nine European cities (Athens, Berlin, Brno, Lisbon, Ljubljana, Liverpool, Palma, Venice and Vienna) and is part of the project Recreational Culture as a Tool to Prevent Risk Behaviours led by IREFREA (http://www.irefrea.org/).

Using Social Network Methods in Analysis of Other Types of Data Organized As 2-Mode Network
Nino Rode, Jelka Skerjanc (Faculty of Social Work, University of Ljubljana, Ljubljana, Slovenia; [email protected] [email protected])
Ales Ziberna (Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia; [email protected])

Data on social networks typically concern relations between some actors or events. Nevertheless, there are other sorts of data that have the same network structure. In such cases the concepts and methods appropriate for social network analysis could be useful for illuminating some aspects of the data at hand. In our paper we use data on the tasks that were performed in order to achieve the goals set by 81 people with mental health problems, learning or physical disabilities in their individual plans for dealing with their problems and improving their lives. The data were collected as part of an evaluation of the implementation of these individual plans and were subsequently analyzed to provide a guideline for determining the competences needed by social workers in order to implement this sort of individual plan in practice. Therefore the data are not proper social network data in the sense of representing actors, or actors and events. Nevertheless, they certainly are organized as a two-mode network.

In our paper we will explore some possibilities of using social network concepts in analyzing this sort of data. We will consider both methods for 1-mode networks, applied to the derived networks of tasks and goals respectively, and methods for 2-mode networks. When considering different methods we will also discuss their appropriateness for the kind of data we have.
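Deriving 1-mode networks from a 2-mode network can be illustrated with a small sketch. It assumes a binary task-by-goal incidence matrix (1 if a task was performed for a goal) and uses the common shared-neighbour projection; the matrix below is invented and the abstract does not state which projection the authors used.

```python
def project(incidence, onto="rows"):
    """Weighted 1-mode projection of a 0/1 two-mode incidence matrix.

    The weight between two row units is the number of column units they
    share (and vice versa for onto="columns"). The diagonal is set to 0.
    """
    m = incidence if onto == "rows" else [list(col) for col in zip(*incidence)]
    n = len(m)
    return [[sum(a * b for a, b in zip(m[i], m[j])) if i != j else 0
             for j in range(n)]
            for i in range(n)]


# Hypothetical data: rows are tasks, columns are goals.
task_by_goal = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]

task_network = project(task_by_goal, onto="rows")     # tasks tied by shared goals
goal_network = project(task_by_goal, onto="columns")  # goals tied by shared tasks
print(task_network)
print(goal_network)
```

Methods for 1-mode networks can then be applied to either derived matrix, while 2-mode methods work on the incidence matrix directly.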


Sunday, September 21 Network Analysis I

Consistency of Social Support Sources Assessed by the Name Generator Approach and Role Relation Approach
Valentina Hlebec (Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia; [email protected])
Tina Kogovsek (Faculty of Arts, University of Ljubljana, Ljubljana, Slovenia; [email protected])

As shown in recent research (e.g., Hlebec and Kogovsek, 2005; Kogovsek and Hlebec, 2005; Kogovsek and Hlebec, 2008), the name generator approach for measuring social support provision in ego-centered networks can be compared to the role relation approach at least to some degree. The overall evaluation of support network composition is shown to be partially comparable in both approaches.
Many studies in the field of social support (e.g., Hobfoll, 1985; Thoits, 1985; Cutrona and Russell, 1990; Kienan, 1997) show that the effectiveness of a certain type of provided support, and the mechanisms by which the support functions, are often highly dependent on the specific situation in which support is needed. For instance, emotional support may be provided in a situation (e.g., an accident) where the affected person needs or expects help of a more practical kind. The unsuitable type of support provided may thus cause additional stress, dissatisfaction, and feelings of being misunderstood, controlled or alienated. Therefore, the context of a specific situation conditions how effective a certain type of support can be.
In this paper we compare and analyze the composition of the reported social support network assessed by Antonucci's hierarchical approach and by the role relation approach based on enacted support in 15 major life changes that happened during the last three years. The composition of the social support network (overall and partial across four types of social support) is compared to the overall composition of enacted support.


Statistical Applications - Biostatistics II Sunday, September 21

Statistical Applications - Biostatistics II

The Unemployment Structure of Turkey: Survival Models with Nonproportional Hazards
Erengul Ozkok (Department of Actuarial Sciences, Hacettepe University, Beytepe, Ankara, Turkey; [email protected])
Nihal Ata (Department of Statistics, Hacettepe University, Beytepe, Ankara, Turkey; [email protected])
Ugur Karabey (Department of Actuarial Sciences, Hacettepe University, Beytepe, Ankara, Turkey; [email protected])

The unemployment problem in developing countries has received great attention. As in many other countries, unemployment is one of the biggest problems of Turkey because of its negative socioeconomic effects. To investigate the effects on unemployment duration, survival models can be used. Duration data analyses have traditionally been based on the Cox proportional hazards model, which is widely used for the analysis of treatment and prognostic effects with censored survival data and which assumes a constant hazard ratio. When this assumption is violated, different survival models with nonproportional hazards should be used to investigate the effects of covariates on survival time.
In this study, we attempt to determine the factors that affect unemployment duration in Turkey after the 2001 economic crisis using survival models under nonproportional hazards. The panel data used in this study cover the first two quarters of the 2003 Turkish Household Labour Force Survey (HLFS), which is conducted quarterly by the Turkish Statistical Institute (TURKSTAT).
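What a constant hazard ratio means, and how it can fail, can be shown with a short Python sketch. The Weibull hazards and all parameter values below are illustrative assumptions only; they are not the models or estimates used in the study.

```python
def weibull_hazard(t, shape, scale):
    """Hazard of a Weibull distribution: h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def hazard_ratio(t, group_a, group_b):
    """Ratio of the hazards of two groups at time t; each group is (shape, scale)."""
    return weibull_hazard(t, *group_a) / weibull_hazard(t, *group_b)


times = (1.0, 5.0, 10.0)

# Equal shapes: the ratio is the same at every time (proportional hazards).
proportional = [hazard_ratio(t, (1.5, 2.0), (1.5, 4.0)) for t in times]

# Different shapes: the ratio drifts with time (nonproportional hazards).
nonproportional = [hazard_ratio(t, (0.8, 2.0), (1.5, 2.0)) for t in times]
```

In the second case a single Cox coefficient cannot summarise the group effect without further assumptions, which is the situation the abstract addresses.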

References
Addison, J.T. and Portugal, P. (2003), Unemployment Duration: Competing and Defective Risks, Journal of Human Resources, Vol. 38, pp. 156-191.
Collett, D. (1994), Modelling Survival Data in Medical Research, Chapman & Hall, London.
Cox, D.R. (1972), Regression models and life-tables, Journal of the Royal Statistical Society Series B, Vol. 34, pp. 187-220.
Foley, M.C. (1997), Determinants of Unemployment Duration in Russia, Economic Growth Center: Center Discussion Paper no. 779, New Haven: Yale University.
Grogan, L. and van den Berg, G.J. (2001), The Duration of Unemployment in Russia, Journal of Population Economics, Vol. 14, pp. 549-568.
Klein, J.P. and Moeschberger, M.L. (1997), Survival Analysis Techniques for Censored and Truncated Data, Springer, New York.
Kupets, O. (2006), Determinants of Unemployment Duration in Ukraine, Journal of Comparative Economics, Vol. 34, pp. 228-247.
Lubyova, M. and van Ours, J.C. (1997), Work Incentives and the Probability of Leaving Unemployment in the Slovak Republic, Tinbergen Institute Discussion Papers 97-071/3, Tinbergen Institute.
Schemper, M. (1992), Cox analysis of survival data with nonproportional hazards functions, The Statistician, Vol. 41, pp. 455-465.
State Institute of Statistics (SIS) (2001), Household Labour Force Survey: Concepts and Methods, Ankara, SIS, Publication No. 2484.
Tansel, A., Tasci, H.M. (2004), Determinants of Unemployment Duration for Men and Women in Turkey, ERC Working Paper in Economic 04/04.
Tunali, I. and Asaad, R. (1992), Market Structure and Spells of Employment and Unemployment: Evidence from the Construction Sector in Egypt, Journal of Applied Econometrics, Vol. 7, pp. 339-367.
Van den Berg, G.J. and Van Ours, J.C. (1999), Duration dependence and heterogeneity in French youth unemployment durations, Journal of Population Economics, Vol. 12, pp. 273-28.

The Comparison of Partial Least Squares Regression and Principal Component Regression with Multiple Linear Regression on An Air Pollution Data
Esra Polat and Suleyman Gunay (Department of Statistics, Faculty of Science, The University of Hacettepe, Beytepe-Ankara, Turkey; [email protected],[email protected])

In Partial Least Squares Regression (PLSR) the goal is to predict one or more dependent variables from a set of independent variables. A basic problem in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and an increase in the variance of these parameters. PLSR, like Principal Component Regression (PCR), is a biased regression method used to overcome the multicollinearity problem.
MLR seeks a single factor that best correlates the independent variables with the dependent variables. PCR finds factors which capture the largest amount of variance in the independent variables. PLSR, however, attempts to find factors which both capture variance and achieve correlation. Consequently, PLSR can be considered a technique that generalizes and combines features of PCR and MLR.
Air pollution is a serious problem in big cities in Turkey, especially in winter. Modelling and prediction of air pollution are of great importance in preventing air pollution episodes, because they provide sufficient time to take the necessary precautions. In this study, the city of Ankara is taken as the study area and December 2006 as the study period. MLR, PCR and PLSR are used in modelling and predicting air pollution on the basis of various meteorological parameters. The meteorological parameters used in this study are daily average pressure, daily average solar radiation, daily average cloudiness, daily average humidity, daily average wind speed, daily minimum temperature, daily maximum temperature, daily average temperature and daily total rainfall. The results show that, in terms of model fit, MLR seems to fit the data best. However, PCR and PLSR yield somewhat better results in terms of prediction when compared to MLR. The data set used in this paper also showed that the regression models constructed by PLSR and PCR have higher predictive ability with a smaller number of factors than MLR. This is advantageous in that there are fewer factors to interpret.
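The contrast between MLR and PCR under multicollinearity can be sketched with NumPy. The data below are synthetic and the variable names are hypothetical stand-ins for meteorological predictors; this is not the Ankara data set, and PLSR is left out to keep the sketch short.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares slopes on centred data (intercept handled by centring)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return beta


def fit_pcr(X, y, n_components):
    """Principal component regression: regress y on the leading principal
    component scores of X, then map the result back to the original predictors."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:n_components].T                  # loadings of the retained components
    scores = Xc @ V_k                          # component scores (mutually orthogonal)
    gamma = (scores.T @ yc) / s[:n_components] ** 2
    return V_k @ gamma


# Synthetic example: two nearly collinear temperature-like predictors plus wind.
rng = np.random.default_rng(0)
n_days = 31
temp_mean = rng.normal(size=n_days)
temp_max = temp_mean + 0.05 * rng.normal(size=n_days)  # almost a copy of temp_mean
wind = rng.normal(size=n_days)
X = np.column_stack([temp_mean, temp_max, wind])
y = 2.0 * temp_mean - wind + 0.1 * rng.normal(size=n_days)

beta_mlr = fit_mlr(X, y)
beta_pcr = fit_pcr(X, y, n_components=2)  # drops the unstable third direction
```

Keeping every component makes PCR coincide with MLR, so the number of retained components is the tuning choice that trades bias against variance.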

References
Wise, B.M., Gallagher, N.B., Bro, R., Shaver, J.M., Windig, W., Koch, J.S. (2006). PLS Toolbox 4.0 for use with Matlab, 3905 West Eaglerock Drive, Wenatchee, WA, Eigenvector Research Inc.
Naes, T., Isaksson, T., Fearn, T. and Davies, T. (2002). A User-Friendly Guide to Multivariate Calibration and Classification. NIR Publications Chichester, UK, 344 p.
Wold, S., Sjostrom, M., Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58, 109-130.

Reliability Properties Related to Friday and Patil’s Bivariate Exponential Model
Juana-María Vivo (Department of Quantitative Methods for Economy, University of Murcia, Spain; [email protected])
Manuel Franco (Department of Statistics and Operations Research, University of Murcia, Spain; [email protected])

The lifetime of a system is determined by its components and structure. Most real-life systems are series or parallel systems, which work when all or at least one of their components work, respectively. In general, the lifetimes of the components are dependent, i.e. the aging of a component affects the rest. Examples are paired organ lifetimes in biological systems, or the two motors of a plane in the reliability field, where a component is overloaded when the other fails. In these situations, it is important to consider different bivariate distributions and know their properties for modelling lifetime data.
Friday and Patil’s (1977) bivariate exponential (FPBVE) model is one of the most flexible bivariate exponential distributions in the literature; among others, it contains the bivariate exponential models due to Freund, Marshall-Olkin, Block-Basu and Proschan-Sullo as particular cases. This model was derived from three physical scenarios with components exposed to shocks, and recently it has attracted application in hydrological sciences (Nadarajah and Gupta, 2006).
Notice that the extreme statistics from some bivariate exponential models are predominantly generalized mixtures of exponentials, i.e. exponential mixtures allowing negative weights (Baggs and Nagaraja, 1996). Several authors have studied reliability properties related to the extreme statistics from some of these, such as the log-concavity of their survival or density functions. For instance, Baggs and Nagaraja (1996) and Franco and Vivo (2002) discuss the reliability properties for generalized mixtures of three or fewer exponential components, and Franco and Vivo (2006) establish the case of four exponentials.
Nevertheless, the marginals of the FPBVE model are either generalized mixtures of exponentials or mixtures of gamma and exponential components. Thus, its maximum statistic is either a generalized mixture of exponentials or a generalized mixture of gamma and exponentials. Recently, Franco and Vivo (2007) analyzed the log-concavity of the survival function of the maximum statistic from the FPBVE model.
In this work, we present parametric restrictions for generalized mixtures of gamma and one or two exponential components that yield legitimate probability models, and then the classification of the maximum statistic from the FPBVE model according to the log-concavity of its density function, i.e. in the increasing or decreasing likelihood ratio classes (ILR or DLR).
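Why parametric restrictions are needed once negative weights are allowed can be seen in a small Python sketch. It covers only mixtures of exponentials (the gamma component treated in this work is omitted), and the grid-based check is a numerical illustration, not a proof and not the authors' characterization. The example uses the known fact that the maximum of two independent exponential lifetimes with rates 1 and 2 has survival function exp(-t) + exp(-2t) - exp(-3t), a generalized mixture with weights (1, 1, -1).

```python
import math


def mixture_density(t, weights, rates):
    """Density of a generalized mixture of exponentials: sum of w_i * r_i * exp(-r_i * t)."""
    return sum(w * r * math.exp(-r * t) for w, r in zip(weights, rates))


def mixture_survival(t, weights, rates):
    """Survival function of the same mixture: sum of w_i * exp(-r_i * t)."""
    return sum(w * math.exp(-r * t) for w, r in zip(weights, rates))


def looks_legitimate(weights, rates, grid):
    """Numerical check on a grid: weights sum to one and the density is non-negative."""
    sums_to_one = abs(sum(weights) - 1.0) < 1e-12
    non_negative = all(mixture_density(t, weights, rates) >= -1e-12 for t in grid)
    return sums_to_one and non_negative


grid = [0.1 * i for i in range(201)]  # t from 0 to 20

# Maximum of independent Exp(1) and Exp(2) lifetimes: a valid model.
valid = looks_legitimate([1.0, 1.0, -1.0], [1.0, 2.0, 3.0], grid)

# Weights still sum to one, but the density turns negative for large t.
invalid = looks_legitimate([2.0, -1.0], [1.0, 0.5], grid)
```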

References
Baggs, G.E., Nagaraja, H.N. (1996). Reliability properties of order statistics from bivariate exponential distributions. Commun. Statist. Stochastic Models, 12, 611–631.
Franco, M., Vivo, J.M. (2002). Reliability properties of series and parallel systems from bivariate exponential models. Commun. Statist. Theory Methods 31, 2349–2360.
Franco, M., Vivo, J.M. (2006). On log-concavity of the extremes from Gumbel bivariate exponential distributions. Statistics 40, 415–433.
Franco, M., Vivo, J.M. (2007). Generalized mixtures of gamma and exponentials and reliability properties of the maximum from Friday and Patil Bivariate Exponential Model. Commun. Statist. Theory Methods 36, 2011–2025.
Friday, D.S., Patil, G.P. (1977). A bivariate exponential model with applications to reliability and computer generation of random variables. In: Tsokos, C.P., Shimi, I. ed., Theory and Applications of Reliability. Vol. 1. New York: Academic Press, pp. 527–549.
Nadarajah, S., Gupta, A.K. (2006). Friday and Patil’s bivariate exponential distribution with application to drought data. Water Res. Manag. 20, 749–759.


Invited Lecture Monday, September 22

Invited Lecture

Dynamic Models for Survival and Event History Data
Ørnulf Borgan (Department of Mathematics, University of Oslo, Norway; [email protected])

Counting processes provide a natural framework for dynamic modelling of survival and event history data, and for studying the properties of statistical methods for analysing such data. In the talk I will start by describing how survival and event history data may conveniently be described by counting process models (e.g. Aalen et al., 2008). I will then concentrate on situations where information on fixed as well as internal time-dependent covariates is available for the subjects under study. For such situations, a regression analysis including all covariates will give insight into the importance of the internal time-dependent covariates, but may underestimate the effect of the fixed covariates, including treatment. On the other hand, an analysis without the internal time-dependent covariates will give a correct estimate of the effect of the fixed covariates, but it will offer no information on the effects of the internal time-dependent covariates. By using a generalization of classical path analysis, denoted dynamic path analysis, one may reconcile the two approaches (Fosen et al., 2006; Aalen et al., 2008). In particular, dynamic path analysis makes it possible to obtain a detailed picture of how treatment and other fixed covariates partly have a direct effect and partly have an indirect effect mediated through the internal time-dependent covariates.

References
Aalen, O.O., Borgan, Ø. and Gjessing, H.K. (2008). Survival and Event History Analysis: A Process Point of View. Springer-Verlag, New York.
Fosen, J., Ferkingstad, E., Borgan, Ø., and Aalen, O.O. (2006). Dynamic path analysis – a new approach to analyzing time-dependent covariates. Lifetime Data Analysis 12, 143-167.


Monday, September 22 Biostatistics and Bioinformatics I

Biostatistics and Bioinformatics I

Evaluation of Reduced Rank Semiparametric Models to Assess Excess of Risk in Cluster Analysis
Marco Geraci (University of Manchester, Manchester, UK; [email protected])
Andrew B. Lawson (University of South Carolina, Columbia, SC, USA; [email protected])

The spatial variability of the risk of a certain disease may be related to one or many environmental hazards. From a statistical point of view, the modelling and the detection of disease clusters potentially related to those hazards offer challenging tasks.
In this study, we consider a semiparametric approach to focused disease clustering for small area health data when a pre-specified putative source of health hazard and a background spatial correlation concur to determine the intensity of a Poisson distribution. The modelling framework is that of a low rank mixed model representation of the thin plate spline.
We investigate some issues related to the identification of the random effects arising from this approach. Under different simulated scenarios, we assess the proposed models using conditional Akaike’s weights and tests for variance components, providing a comprehensive model selection methodology easy to implement. We propose the significance and the number of distinct variance components as a rule of thumb in choosing between models with similar weight and different random structures when using the conditional Akaike Information Criterion.
The motivating example is offered by data consisting of lung cancer deaths observed in Ohio state counties between 1987 and 1988. These data were analyzed on several occasions to investigate the risk associated with a nuclear installation located in Hamilton county. In our analysis, we found a strong south-eastward spatial trend which is confounded with a significant radial distance effect decreasing between 0 and 150 kilometres from the point source.

Weighted Estimation in Cox Regression
Samo Wakounig, Georg Heinze and Michael Schemper (Section of Clinical Biometrics, Core Unit for Medical Statistics and Informatics, Medical University of Vienna, Vienna, Austria; [email protected])

The weighted Cox regression (WCR) model has been proposed as an alternative to the standard Cox model (Schemper, 1992; Sasieni, 1993), particularly for the situation of nonproportional hazards. While there are several methods to appropriately cope with nonproportional hazards, in particular the inclusion of parameters for time-dependent effect terms, weighted estimation in Cox regression is a parsimonious alternative. The parameters of the WCR can be interpreted as average hazard ratios, also under nonproportional hazards. Therefore we will discuss the concept of an average hazard ratio (cf. Kalbfleisch and Prentice, 1981), its connection to the effect size $P(X < Y)$, and its most elementary definition by the odds $OC = P(T_0 < T_1)/P(T_0 > T_1)$, where $T_0$ and $T_1$ denote the survival times in two groups. $OC$ is valid independently of proportional hazards.
When estimating these quantities by a Cox-type regression, the contributions to the partial likelihood need to be weighted by $S(t)G(t)^{-1}$. The overall survival function $S(t)$ reflects the relative importance attached to the contributions in the estimating equations at different times. Using these weights extends the test of Prentice to multiple covariates, as the standard Cox model does for Mantel's test. The estimator of the censoring distribution, $G(t)$, is applied to compensate for the attenuation in observed events due to censoring. Replacing $S(t)$ by the number of individuals at risk would lead to the generalization of the Breslow test. Inference is based on the Wald method, where the sandwich covariance matrix of Lin (1991) is compared to an adapted version of Lin and Wei's (1989) robust covariance matrix and a jackknife variance.
The WCR model is exemplified on a lung cancer data set and compared to standard Cox analysis. A Monte Carlo study provides results on efficiency and bias. Application of the WCR model is facilitated by an R package coxphw and a SAS macro WCM available at www.muw.ac.at/msi/biometrie/programs.
The project is supported by grant P18553-N13 of the Austrian Science Fund.
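The odds OC defined in this abstract can be illustrated with a short Python sketch. It assumes fully observed (uncensored) survival times, so it does not implement the weighted partial likelihood or the coxphw package; the function names and toy data are hypothetical. The second function uses the standard result that for two independent exponential groups with hazards r0 and r1, P(T0 < T1) = r0 / (r0 + r1), so OC reduces to the ordinary hazard ratio r0 / r1.

```python
def odds_of_concordance(times_0, times_1):
    """Empirical OC = P(T0 < T1) / P(T0 > T1) over all cross-group pairs.

    Assumes no censoring; ties contribute to neither count.
    """
    group_0_first = sum(1 for a in times_0 for b in times_1 if a < b)
    group_1_first = sum(1 for a in times_0 for b in times_1 if a > b)
    if group_1_first == 0:
        raise ValueError("OC is undefined: group 1 never fails first")
    return group_0_first / group_1_first


def exponential_oc(rate_0, rate_1):
    """OC for two independent exponential survival times with the given hazards."""
    p_0_first = rate_0 / (rate_0 + rate_1)
    p_1_first = rate_1 / (rate_0 + rate_1)
    return p_0_first / p_1_first


# Hypothetical toy data: survival times in two groups.
oc_toy = odds_of_concordance([1.0, 2.0], [3.0, 1.5])
```

With censored data the naive pair count is biased, which is what the weights built from S(t) and G(t) are meant to correct.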

References
Kalbfleisch, J.D. and Prentice, R.L. (1981): Estimation of the average hazard ratio, Biometrika, 68, 1, 105-112.
Lin, D.Y. (1991): Goodness-of-fit analysis for the Cox regression model based on a class of parameter estimators. Journal of the American Statistical Association, 86, 415, 725-728.
Lin, D.Y. and Wei, L.J. (1989): The robust inference for the Cox proportional hazards model. Journal of the American Statistical Association, 84, 408, 1074-1078.
Sasieni, P. (1993): Maximum weighted partial likelihood estimators for the Cox model. Journal of the American Statistical Association, 88, 421, 144-152.
Schemper, M. (1992): Cox analysis of survival data with non-proportional hazard functions. The Statistician, 41, 455-465.


Goodness-of-fit of Semiparametric Additive Regression Models in Relative Survival
Giuliana Cortese (Department of Statistical Sciences, University of Padova, Padova, Italy; [email protected])
Thomas H. Scheike (Department of Biostatistics, University of Copenhagen, Denmark; [email protected])

Additive regression models for relative survival under the semiparametric approach are discussed and a method for checking the model assumptions is presented. In order to show these aspects, we analyzed data from the TRACE study, where the aim was to assess the effect of risk factors on the excess mortality of patients with acute myocardial infarction.
There is often interest in checking whether the sub-model for the excess hazard is adequate. Lack of fit might be due to misspecification of the functional form of covariates, to a wrong form of the link function of the excess hazard, or to non-proportionality of the excess hazard when it is modelled in multiplicative form. We demonstrate that cumulative sums of martingale-based residuals, over follow-up time or covariate values, are a useful instrument for investigating and detecting these critical problems in additive models for relative survival. Using cumulative martingale residuals, graphical plots and tests of hypotheses can be performed by a resampling approach. We compare this new approach with recent goodness-of-fit methods in the literature.
We show how the goodness-of-fit analysis based on cumulative martingale residuals reveals a very poor fit of the proportional excess hazards model (with excess hazards in multiplicative form) to the TRACE data. The serious lack of fit of this model is due to violation of the proportionality assumption of the excess hazards, which is strictly related to regression coefficients being time-constant in this model. Therefore, we illustrate the advantages of an alternative semiparametric additive model, where the excess hazards are in additive form, and some covariate effects are allowed to be time-varying while the remaining regression coefficients are constant with time. Finally we provide examples of how cumulative martingale residuals can also be used to check the goodness-of-fit of this latter semiparametric model.

A Simulation Study on Power Comparisons for Group Sequential Tests of Non-Parametric Statistics in Case of Non-Proportional Hazards
Yaprak Parlak Demirhan, Haydar Demirhan and Sevil Bacanli (Hacettepe University, Beytepe, Ankara, Turkey; [email protected], [email protected], [email protected])

In clinical trials, patients enter the study in sequence, and they are allocated to the treatment arms randomly. In these trials, one may want to compare the time-to-failure data that come from different treatments. In survival analysis, non-parametric statistics are frequently used to analyze the accumulated data coming from different treatments. Analyzing the accumulated data in groups is the most convenient way to perform this comparison. For ethical and economic reasons, group sequential test procedures of non-parametric statistics are used to test hypotheses on these data. Group sequential tests provide the advantage of early stopping of the trial.
The aim of this work is to make power comparisons of non-parametric statistics, which are used for analysis of survival data, in the group sequential setting when the proportional hazards assumption is violated. It should be noted that censored observations are generally seen in survival data. Therefore, if power calculations are made irrespective of censoring, reliable results might not be achieved, due to the lack of information about the censoring structure. A wide simulation study, covering different non-proportional hazards scenarios, different censoring rates and tied observations, is conducted to make the power comparisons of group sequential tests of non-parametric statistics for survival data. The non-parametric statistics considered are the log-rank, Tarone-Ware, Gehan-Wilcoxon, Peto-Peto, asymptotically weighted log-rank and Fleming-Harrington family.
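Three of the statistics listed in this abstract belong to the weighted log-rank family and differ only in the weight applied at each event time. The Python sketch below computes the standardized two-sample statistic for a single analysis, using the usual textbook weights (1 for log-rank, the number at risk for Gehan-Wilcoxon, its square root for Tarone-Ware). It is a minimal illustration under those assumptions: group sequential boundaries and the other listed statistics are not implemented, and the function name and toy data are hypothetical.

```python
import math


def weighted_logrank(times, events, groups, weight="logrank"):
    """Standardized two-sample weighted log-rank statistic.

    times  : observed times
    events : 1 if the event was observed, 0 if censored
    groups : 0 or 1 (group membership)
    weight : "logrank", "gehan" or "tarone-ware"
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    numerator = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                # at risk, both groups
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)    # at risk, group 1
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)     # events at t
        d1 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        if weight == "logrank":
            w = 1.0
        elif weight == "gehan":
            w = float(n)
        elif weight == "tarone-ware":
            w = math.sqrt(n)
        else:
            raise ValueError("unknown weight: " + weight)
        numerator += w * (d1 - d * n1 / n)                        # observed minus expected
        if n > 1:
            variance += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return numerator / math.sqrt(variance)


# Hypothetical toy data: group 1 fails earlier than group 0, no censoring.
toy_times = [5, 6, 7, 1, 2, 3]
toy_events = [1, 1, 1, 1, 1, 1]
toy_groups = [0, 0, 0, 1, 1, 1]
z_logrank = weighted_logrank(toy_times, toy_events, toy_groups, "logrank")
z_gehan = weighted_logrank(toy_times, toy_events, toy_groups, "gehan")
```

Because the Gehan weight emphasizes early event times and the log-rank weight treats all event times equally, their relative power depends on where the hazards differ, which is what a simulation study under non-proportional hazards examines.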

References
Pocock, S.J. (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika, 64, 2, 191-199.
O’Brien, P.C., Fleming, T.R. (1979). A multiple testing procedure for clinical trials. Biometrics, 35, 549-556.
Lee, J.W. (1996). Some versatile tests based on the simultaneous use of weighted log-rank statistics. Biometrics, 52, 721-725.
Slud, E.V., and Wei, L.J. (1982). Two-sample repeated significance tests based on the modified Wilcoxon statistics. Journal of The American Statistical Association, 77, 862-868.
Peto, R., and Peto, J. (1972). Asymptotically efficient rank invariant test procedures (with discussion). Journal of Royal Statistical Society, A., 135, 185–206.
Gehan, E. (1965). A generalized Wilcoxon test for comparing arbitrary single censored samples. Biometrika, 52, 203-223.
Tarone, R.E., and Ware, J. (1977). On distribution free tests of the equality of survival distributions. Biometrika, 64, 156-160.
Moreau, T., Maccario, J., Lellouch, J., Huber, C. (1992). Weighted log rank statistics for comparing two distributions. Biometrika, 79, 195–198.
Jennison, C., and Turnbull, B.W. (1999). Group sequential methods with Applications to clinical trials, Chapman & Hall, London.


Monday, September 22 Statistical Applications - Social Sciences

Statistical Applications - Social Sciences

The Changes of the Working Realization of the Graduates from Bari University According to Time
Francesco Campobasso and Annarita Fanizzi (Dipartimento di Scienze Statistiche "Carlo Cecchi", University of Bari, Bari, Italy; [email protected], [email protected])

The aim of this paper is to analyze how the working realization reached by the graduates of the 2002 Summer session of Bari University changes over time. First, in order to estimate such a realization by means of a fuzzy algorithm, we matched the importance attributed (before getting the degree) to each of eight aspects of the job sought (possibility of acquiring professional skills, of gaining advancement, of developing cultural interests and of promoting prospects of larger earnings, coherence with studies, economic stability, independence and free time) with the corresponding satisfaction perceived 1, 3 and 5 years later.
The graduates could allot a score from 1 to 5 to the importance of the different aspects of the job and a score from 1 to 10 to the corresponding perceived satisfaction. These scores do not represent an objective measure of personal opinions, which are distributed along an ideal continuum; rather, they correspond to accumulation values on the considered scale.
In particular, the importance scores are classified in three different levels (low, medium and high), while the satisfaction scores, more numerous than the previous ones, are classified in five different levels (low, medium-low, medium, medium-high and high); in both cases the membership of a generic score in the different levels is represented by triangular functions, which are well suited to natural numbers like the ones we manage. By applying logic rules of the form If (situation) then (action), we estimated the realization connected with each aspect of the job, whose scores are classified in seven different levels (in order to obtain a more accurate estimation). At equal importance, the level of realization does not decrease as satisfaction increases; moreover, if the satisfaction is low (high), the realization does not increase (decrease) as importance increases.
The final score of the estimated realization derives from a defuzzification procedure called the centroid method, which is one of the most efficient. In particular, this score is approximately located in the middle of the corresponding range of the fuzzy set and reflects the satisfaction perceived, combined with the importance attributed to the single aspect of the job. At this point the overall working realization is estimated by the average of the eight different scores derived from the defuzzification procedures described above.
Finally, we compared the overall working realization of the interviewed graduates over time; in particular we analysed the differences between the graduates in terms of gender, marks at finals, length of university education, academic course attended, training experiences and so on.
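The two building blocks named in this abstract, triangular membership functions and centroid defuzzification, can be sketched in Python as follows. All breakpoints, the two rules and the 0 to 10 output scale are hypothetical choices made for illustration; the paper's actual level definitions and its full rule base (three importance levels, five satisfaction levels, seven realization levels) are not given in the abstract.

```python
def triangular(x, left, peak, right):
    """Triangular membership function; requires left < peak < right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def centroid(xs, memberships):
    """Centroid defuzzification on a discretized output scale."""
    total = sum(memberships)
    if total == 0:
        raise ValueError("no rule fired; the centroid is undefined")
    return sum(x * m for x, m in zip(xs, memberships)) / total


# Hypothetical breakpoints (left, peak, right) for a few levels.
IMPORTANCE_HIGH = (3, 5, 7)          # importance scored 1 to 5
SATISFACTION_LOW = (-3.5, 1, 5.5)    # satisfaction scored 1 to 10
SATISFACTION_HIGH = (5.5, 10, 14.5)
REALIZATION_LOW = (-5, 0, 5)         # assumed output scale 0 to 10
REALIZATION_HIGH = (5, 10, 15)


def realization(importance, satisfaction, steps=100):
    """Two illustrative rules, combined with min (AND) and max (aggregation):
    IF importance is high AND satisfaction is high THEN realization is high
    IF importance is high AND satisfaction is low  THEN realization is low
    """
    imp_high = triangular(importance, *IMPORTANCE_HIGH)
    fire_high = min(imp_high, triangular(satisfaction, *SATISFACTION_HIGH))
    fire_low = min(imp_high, triangular(satisfaction, *SATISFACTION_LOW))
    xs = [10 * i / steps for i in range(steps + 1)]
    aggregated = [
        max(
            min(fire_low, triangular(x, *REALIZATION_LOW)),
            min(fire_high, triangular(x, *REALIZATION_HIGH)),
        )
        for x in xs
    ]
    return centroid(xs, aggregated)
```

A complete rule base would cover every combination of levels, so that at least one rule fires for any pair of scores; with only the two rules above, some inputs leave the centroid undefined.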


Statistical Analysis of Marital Status of the Population of Vojvodina
Katarina Cobanovic (Faculty of Agriculture, University of Novi Sad, Novi Sad, Republic of Serbia; [email protected])
Valentina Sokolovska (Faculty of Philosophy, University of Novi Sad, Novi Sad, Republic of Serbia; [email protected])
Slobodan Nicin (Faculty of Agriculture, University of Novi Sad, Novi Sad, Republic of Serbia; [email protected])

Using parametric and nonparametric statistical methods, we analyze the marital status of the population of Vojvodina. The analysis is based on the population census results for 1991 and 2002. The period between the last two population censuses (1991-2002) is characterised by very dynamic social changes in Serbia (Central Serbia and Vojvodina). These changes have had a very important influence on the marital status of the population. The analysis focuses on the basic indicators of marital structure: the average age of the population when forming or dissolving unions, the divorce coefficient by sex and by age group, etc. Attention is also given to similarities and differences in the marital status of the population of Vojvodina according to settlement type: urban or rural. The marital status of the population of Vojvodina is also analyzed by professional occupation and by field of activity.

The Elliptical Model of Multicollinearity and the Petres Red Indicator
Peter Kovacs and Tibor Petres (Department of Statistics and Demography, University of Szeged, Szeged, Hungary; [email protected])

A possible method for modelling multicollinearity is to examine the orthogonality of explanatory variables, that is, the stretching of the space of explanatory variables. The question naturally arises whether multicollinearity can be modelled in a different way.
As a new approach, the elliptical model of multicollinearity can be formulated on the basis of the Petres Red indicator. As the mean covariance of the variables increases, the possible eigenvalues are situated on an m-dimensional sphere with a greater radius. The possible eigenvalues are situated on a segment of the m-dimensional sphere in such a way that with a fixed Red value they are located on an (m−1)-dimensional ellipsoid.
Unfortunately, the higher the dimension of the model, the more conditions have to be given for determining and studying the range of possible eigenvalues. Therefore the detailed examination of this range and of the elliptical curves was carried out only for three explanatory variables. We compared how the ellipses and the lines containing the identical-value quotients of the highest and lowest eigenvalues move along the range of the possible eigenvalues.
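The abstract does not restate how the Red indicator is computed. The NumPy sketch below assumes the definition usually given for it in the literature: the dispersion of the eigenvalues of the correlation matrix of the m explanatory variables around 1, scaled so that the value lies between 0 (orthogonal variables) and 1 (perfectly collinear variables). Treat both the formula and the function names as assumptions for illustration, not as the authors' code.

```python
import numpy as np


def red_indicator(corr):
    """Assumed definition: Red = sqrt(mean((lambda_j - 1)**2)) / sqrt(m - 1),
    where lambda_j are the eigenvalues of the m x m correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    eigenvalues = np.linalg.eigvalsh(corr)  # symmetric matrix, real eigenvalues
    return float(np.sqrt(np.mean((eigenvalues - 1.0) ** 2)) / np.sqrt(m - 1))


def rms_offdiagonal(corr):
    """Root mean square of the off-diagonal correlations."""
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    off_diagonal = corr[~np.eye(m, dtype=bool)]
    return float(np.sqrt(np.mean(off_diagonal ** 2)))


# Hypothetical correlation matrix of three explanatory variables.
example = [[1.0, 0.5, 0.2],
           [0.5, 1.0, 0.3],
           [0.2, 0.3, 1.0]]
red_value = red_indicator(example)
```

Under this definition the indicator coincides with the root mean square of the pairwise correlations, because the sum of squared eigenvalues of a symmetric matrix equals the sum of its squared entries. Fixing Red therefore fixes the sum of squared deviations of the eigenvalues from 1, while their sum is always m.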


Monday, September 22 Biostatistics and Bioinformatics II

Biostatistics and Bioinformatics II

Power of Resequencing Studies: A Group-Wise Association Test for Rare Disease Susceptibility Mutations
Bo Eskerod Madsen (Bioinformatics Research Center (BiRC), University of Aarhus, Aarhus, Denmark; [email protected])
Sharon Browning (Department of Statistics, The University of Auckland, Auckland, New Zealand)

Resequencing parts of the genome is an emerging possibility that enables the identification of rare disease-associated mutations. Because rare mutations are difficult to tag with SNP genotyping, resequencing can identify novel types of scenarios for genetic disease association. Human resequencing studies have shown that genetic heterogeneity, where multiple rare mutations together explain a large part of the affected individuals, is a probable scenario. Along this line, we propose a method to analyse a group of rare disease susceptibility mutations, e.g. from resequencing a gene. We compare the proposed method to alternative methods and show, on both simulated and real data, that it is powerful in identifying disease-associated genes. Using the proposed method, a resequencing study can identify a disease-associated gene with an overall PAR of 2% (odds ratio: 1.2), even when each individual mutation has a much lower PAR, using 1000 to 7000 affected and unaffected individuals, depending on the underlying genetic mode. This study thus supports the view that resequencing studies may be able to identify important genetic associations if specialised analysis methods are used.

Multinomial Classification Models in Biostatistics
Voicu Boscaiu (Institute of Mathematical Statistics and Applied Mathematics, Bucharest, Romania; [email protected])
Daniela Bratosin and Manuela Sidoroff (National Institute of Research and Development for Biological Sciences, Bucharest, Romania)

There are many situations in which common sense wrongly leads us to reason in terms of black/white (namely yes/no, living/dead, good/bad or 0/1). One explanation is that we are not able to answer the question "what could be between black and white?". It could be a mistake to think that there is nothing there. But sometimes it is also a mistake to believe that "between black and white there are different degrees of grey". To illustrate, we discuss two statistical models.

The first model is the death of cells under pollutant influence. There are two ways in which cells die: necrosis (they are killed by injurious agents, with inflammation of surrounding tissues) and apoptosis (a kind of induced suicide, without inflammation). Hence we refer to a three-class structure.


The second decision model is the so-called "surgical prognostic" of pre-surgical colorectal peritonitis. The "surgical prognostic" is based on general patient data (age, sex, simple laboratory tests, etc.), pre-surgical information, surgical diagnosis information and surgical protocol descriptors, excluding any post-surgical information. We identified a four-class structure with good discrimination (C1 = recovery without re-intervention, C2 = recovery after re-interventions, C3 = death after re-interventions and C4 = death without re-intervention). We focus mainly on the statistical manipulation of data. The conclusion sounds somewhat strange: an appropriate definition of a multinomial structure could improve the two-class discrimination.

Study developed under programs PNCDI2 91-052 / 2007.

Subgroup Discovery in Data Sets with Two Sets of Variables Using a Combination of Clustering and Classification Techniques
Lan Umek (Faculty of Computer and Information Sciences, University of Ljubljana, Ljubljana, Slovenia; [email protected])
Uros Petrovic (Faculty of Computer and Information Sciences, University of Ljubljana, Ljubljana, Slovenia; [email protected])
Blaz Zupan (Faculty of Computer and Information Sciences, University of Ljubljana, Ljubljana, Slovenia; [email protected])

The discovery of interesting subgroups has high practical relevance in many domains of science. In the talk we will present a method for finding interesting subgroups in data sets that consist of several predictor and several outcome variables. The recent availability of data sets of this kind, particularly in areas such as biomedicine, functional genomics and chemogenomics, and bioinformatics, and the emergence of problems that require their analysis motivate the development of methods and tools to address such problems. We propose an approach that tries to find subsets of instances such that, in each subset, the instances are similar in their values of the outcome variables and the set membership can be reliably described using the predictor variables. The approach starts with a user-defined similarity measure based on the outcome variables. To identify candidate instance subsets, we use hierarchical clustering and then consider all possible instance subsets by traversing the resulting hierarchical structure (dendrogram). Only subgroups of sufficient size are considered for subsequent analysis. The algorithm then assesses the degree of separability of each cluster from its complement using a selected supervised data mining approach. For each cluster, the quality of separation from its complement is assessed by means of the area under the ROC curve (AUC) obtained using a leave-one-out evaluation method. Groups with an AUC higher than some predefined threshold are reported to the user. Additionally, we assess the significance of obtaining such an AUC for a given instance set size using a permutation test. The method has been tested on several domains from functional and chemical genomics.
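The evaluation step described in this abstract (AUC of a cluster against its complement, followed by a permutation test) can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes every instance already carries a membership score, for example from a leave-one-out run of some classifier, and it leaves out the clustering and the classifier themselves. The names `auc` and `permutation_p_value` and the example scores are hypothetical.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve: the probability that a random member
    (label 1) scores higher than a random non-member (label 0);
    ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(scores, labels, n_perm=1000, seed=0):
    """Share of label shufflings whose AUC is at least the observed AUC.

    Shuffling keeps the subgroup size fixed, so the p-value refers to
    subgroups of the same size as the observed one."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical scores for ten instances; the first five form the candidate subgroup.
scores = [0.9, 0.8, 0.75, 0.7, 0.4, 0.6, 0.3, 0.2, 0.15, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(auc(scores, labels), permutation_p_value(scores, labels))
```

A subgroup would then be reported only if its AUC exceeds the chosen threshold and the permutation p-value is small.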


Monday, September 22 Modeling and Simulation

Modeling and Simulation

Introduction to GNSS-SIM Algorithms
Jana Heckenbergerova (Department of Information Technologies, Faculty of Electrical Engineering and Informatics, University Pardubice, Pardubice, Czech Republic; [email protected])
Hana Bohacova (Institute of Mathematics, Faculty of Economics and Administration, University Pardubice, Pardubice, Czech Republic)

Currently there exist several methods for determining train position, verifying the safety of a train position and viewing the trace driven over a given time interval. Research in the area of Global Navigation Satellite Systems (GNSS) shows that it is advantageous to use satellite navigation for these purposes. Some locomotives owned by Czech Railway a.s. are equipped with a GNSS receiver. All of these locomotives send their GNSS positions every 5 s to a central computer, and all GNSS positions are saved to a GNSS database. With the help of the database it could be possible to describe (analytically or discretely) all tracks in use, to view the driven trace of a given train and to verify the safety of the current position of a given locomotive.

The aim of future research is the analysis, cleaning and filtering of data from the GNSS database. This leads to a discrete and analytical description of train tracks. Knowledge of the train track description is necessary for safety verification of train position. GNSS-PIM (GNSS Position Integrity Monitoring) algorithms for a given train track description are used for this verification and were presented in (Dvorakova et al., 2005; Heckenbergerova, 2007; Heckenbergerova, 2008).

The aim of the presented paper is the description of GNSS-SIM (GNSS simplification) algorithms, which simplify the GNSS database. The main target of the algorithms is to find data that lie on the same linear train track and to give an analytic (or parametric) description of this track. The solution of this problem is based on a linear statistical model with constraints and on testing linear hypotheses by tests of fit.

References
Dvorakova, J., Mocek, H., Maixner, V. (2005) Statistical Approach to the Train Position Integrity Monitoring. 2nd International Conference "Reliability, safety and diagnostics of transport structures and means 2005", Pardubice, ISBN 80-7194-769-5.
Heckenbergerova, J. (2007) Parametrical algorithms for GNSS train position verification. International Conference Infotrans 2007, Pardubice, ISBN: 978 80 7194 989 3 (in Czech).
Heckenbergerova, J. (2008) GNSS train position verification by the help of parametrical PIM algorithms. 2nd International Conference Aplimat 2008, Bratislava, ISBN: 978 8089313 03


Using Partial Standard Deviations in Partial Least Squares Regression
Aylin Alin (Department of Statistics, Faculty of Arts and Sciences, Dokuz Eylul University, Izmir, Turkey; [email protected])

Partial least squares regression (PLSR) is a latent-variable-based multivariate technique that combines partial least squares (PLS) analysis and multiple linear regression (MLR). When the explanatory variables are measured on different scales, calculating standardized regression coefficients is the most commonly used solution in MLR and PLSR. However, there are some objections to these coefficients, concerning their interpretation and calculation in MLR. Bring (1994) suggested new standardized regression coefficients calculated on variables standardized using partial standard deviations. In the present study, these new regression coefficients are used for PLSR instead of the ordinary standardized coefficients to examine whether the following hold: 1) the standard errors of these coefficients are smaller than the standard errors of the ordinary standardized coefficients, 2) the PLSR model built with these coefficients has a smaller mean square error and better predictive ability, 3) the latent variables obtained from explanatory variables standardized using partial standard deviations explain more variance in both the explanatory and the response variables.

References
Bring, J. (1994). How to Standardize Regression Coefficients. The American Statistician, 48, 209-213.

A Comparative Study on the Efficiency of Closed Multiple Testing Applied to Ordinal Data
Kamon Budsaba, Penkhae Siriwan and Tipaval Phatthanangkul (Department of Mathematics and Statistics, Thammasat University, Rangsit Center Pathumthani, Thailand; [email protected])

In this study we compare the efficiency of five closed multiple testing methods, i.e. Hotelling's T², Bonferroni-Holm, Westfall-Young bootstrap, exact permutations and Hommel's method based on Simes' test, using a step-wise procedure for testing the difference between two groups of ranked data. We consider their capacity to control the type I error rate and their power when the populations are multinomial with values 1 to 5 or 1 to 9 and have a symmetric or left-skewed distribution, with 3, 5 and 7 dependent variables, equal sample sizes of 10 and 30, and a .05 level of significance. Monte Carlo simulation was performed and repeated 500 times for each scenario. The results of this study are as follows. The possible values of the multinomial variable and the shape of the distribution affect the capacity to control the type I error rate and the power. In most situations, Hommel's method based on Simes' test tends to be the most efficient method. Almost all methods except Hotelling's T² tend to be equally efficient under a left-skewed distribution, a large sample size and a large number of dependent variables.
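Of the five methods compared, the Bonferroni-Holm procedure is the simplest to write down, since it is the step-down shortcut of closed testing with Bonferroni tests on each intersection hypothesis. The sketch below shows only that one procedure, applied to a hypothetical list of p-values (one per dependent variable); the function name `holm_reject` is an assumption, and the study's own simulations and the other four methods are not reproduced.

```python
def holm_reject(p_values, alpha=0.05):
    """Bonferroni-Holm step-down procedure.

    Returns a list of booleans, in the original order, marking which
    hypotheses are rejected with the familywise error rate held at alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        # The k-th smallest p-value (k = step + 1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection
    return reject

# Hypothetical p-values for four dependent variables.
print(holm_reject([0.010, 0.040, 0.030, 0.005]))
# → [True, False, False, True]
```

Here 0.005 and 0.010 pass their thresholds (0.0125 and about 0.0167), while 0.030 exceeds 0.025, so testing stops and the remaining hypotheses are retained.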

References
Westfall, P.H., Tobias, R.D., Rom, D., Wolfinger, R.D., and Hochberg, Y. (1999), Multiple Comparisons and Multiple Tests using the SAS System, SAS Institute Inc., Cary, NC.
Westfall, P.H. and Wolfinger, R.D. (2000), "Closed Multiple Testing Procedures and PROC MULTTEST", Observations, SAS Institute Inc.

Application of Response Surface Methodology in the Optimization of Dust Product Packaging
Elizabeth Díaz-Castellanos, C. Díaz-Ramos, L. C. Flores-Avila, S. Heyser-Fregoso and M. Arrioja-Rodríguez, Instituto Tecnologico de Orizaba, Mexico; lyszy [email protected], [email protected], [email protected], [email protected], [email protected]

In dust product packaging, the nominal weight is one of the client's most important requirements. Many factors are known to have a great impact on the nominal weight of the bags of dust product, among them the speed of operation, the density of the product, the filling of the hoppers, and the calibration and position of the product dispenser glasses.

This work presents an alternative for selecting the levels of the packaging factors that optimize the amount of dust product per bag, obtained by applying experimental design and response surface methodology. The results show compliance with the specifications of the Mexican norms for packaging this kind of product; for the case under study, a reduction of 90% in the losses of finished product was obtained. A comparison of the software packages MINITAB 15 and DESIGN EXPERT 7.1.5 for solving this case is also presented.


Invited Lecture Tuesday, September 23

Invited Lecture

Visualization Databases for Lossless Analysis of Complex Data Sets
William S. Cleveland (Departments of Statistics and Computer Science, Purdue, U.S.A.; [email protected])

Large, complex data sets are ubiquitous, the standard now rather than the exception. They present challenging problems of analysis because of their size and the complexity of their data structures and patterns. One approach is to compute summary statistics at the outset to reduce the complexity, but this expedient risks losing important information in the data. The goal should be lossless analysis: analyze the data at a level of detail and comprehensiveness that does not sacrifice information.

Achieving lossless analysis of complex data today is immensely challenging. New fundamental approaches and methods are needed for each of the different areas that come into play in the analysis of the data: databases, data processing, data structures, statistical models and methods, machine learning algorithms, data visualization, computational algorithms, software environments, and hardware environments. In fact, it has never been harder to achieve lossless analysis because complexity has increased faster than our innovations in these areas.

Nothing serves lossless analysis better than data visualization, the only practical way to absorb large amounts of information in detail. But for today's complex sets we must visualize far larger amounts than in the past. We must be ready to accept large displays each covering tens or even hundreds of screensful (pages). For a single data set it is reasonable to have hundreds of such displays. These displays become a new database produced from the data that is queried and studied. For a display of 500 pages, we might query and study all or just a few of the pages depending on the task.

Producing, querying, and studying a visualization database needs new ideas. There are different modes of viewing the many pages and panels per page of a large display, from slow focused study to very rapid scans. We need creative interfaces to facilitate the different modes. We cannot afford to visually edit very large displays, interacting with the micro-elements to get them right, because there is too much; instead there should be smart automation algorithms that get the large display right the first time. We must consider the physical screen space, its size and resolution, to make it work most effectively for the visual study. We need methods of display that result in pre-attentive visual formation of gestalts that show instantaneously the relevant patterns in the data. This necessitates, strangely, more displays, starting with broad brush looks to derivative displays whose redesigns show specific aspects of the broad brush more effectively. It also requires the study of visual perception.


Tuesday, September 23 Mathematical Statistics

Mathematical Statistics

Long Term Behaviour of Imprecise Birth-Death Processes
Richard Crossman, Pauline Coolen-Schrijner, Frank Coolen (Durham University, Durham, United Kingdom; [email protected], [email protected])
Damjan Skulj (University of Ljubljana, Ljubljana, Slovenia; [email protected])

In recent years several authors have studied discrete-time Markov chains for which the transition matrix is not entirely known. However, in the case of a birth-death process with a finite state space and one absorbing state, eventual absorption remains certain irrespective of imprecision. In the precise case, we can consider the long-term behaviour of such a Markov chain conditioned upon non-absorption, which leads us to the concept of the quasi-stationary distribution. In this talk we consider how such conditioning can be applied in the imprecise case, and thus how we might generalise the idea of the quasi-stationary distribution to birth-death processes with imprecision.
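For the precise case mentioned in this abstract, the quasi-stationary distribution can be approximated numerically. The sketch below is an illustration under stated assumptions, not the authors' method: it takes the transition matrix Q restricted to the non-absorbing states, assumes Q is irreducible and aperiodic there, and iterates q ← qQ / (total mass of qQ) until it stabilises. The function name and the three-state example chain are hypothetical, and the imprecise generalisation discussed in the talk is not covered.

```python
def quasi_stationary(Q, tol=1e-12, max_iter=100000):
    """Power iteration for the quasi-stationary distribution.

    Q is the sub-stochastic transition matrix on the transient states
    (list of rows). Returns (q, rho), where q sums to 1 and approximately
    satisfies q Q = rho q; rho is the one-step survival probability under q.
    """
    n = len(Q)
    q = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(q[i] * Q[i][j] for i in range(n)) for j in range(n)]
        mass = sum(new)  # probability of not being absorbed in one step
        new = [x / mass for x in new]
        if max(abs(a - b) for a, b in zip(new, q)) < tol:
            return new, mass
        q = new
    raise RuntimeError("power iteration did not converge")

# Hypothetical birth-death chain on states 1, 2, 3 with state 0 absorbing:
# birth probability 0.3, death probability 0.4, otherwise stay put.
Q = [
    [0.3, 0.3, 0.0],  # from state 1 the chain is absorbed with probability 0.4
    [0.4, 0.3, 0.3],
    [0.0, 0.4, 0.6],  # no birth is possible from the top state
]
q, rho = quasi_stationary(Q)
print([round(x, 4) for x in q], round(rho, 4))
```

The self-loops in the example keep the restricted chain aperiodic; without them the plain power iteration may oscillate rather than converge.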

Generalized Chinese Restaurant Construction of Exchangeable Gibbs Partitions and Related Results
Annalisa Cerquetti (Bocconi University, Milano, Italy; [email protected])

In two recent papers (Lijoi et al. 2007, 2008) a Bayesian prior-to-posterior analysis for the subclass of exchangeable partitions in Gibbs form of type α ∈ (0, 1), first introduced in Pitman (2003) and studied at length in Gnedin and Pitman (2006), has been proposed for a nonparametric treatment of the typical species sampling inferential problem, in which a random sample is drawn from a hypothetical infinite population of individuals of various species, whose total number is unknown, and interest lies in conditional inference about future partitions induced by future samples.

Here we present a generalized group sequential construction of exchangeable partitions in Gibbs form of type α to place this theory in its natural probabilistic framework and to provide new insights on the derivation of relevant results. Our construction, which relies on known results of Pitman (2003, 2006) and Gnedin and Pitman (2006), has potential applications for investigating additional distributional results for quantities of statistical interest when exploiting the theory of exchangeable partitions in a Bayesian nonparametric perspective.
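As background for readers unfamiliar with Chinese restaurant constructions, the sketch below simulates the classical one-customer-at-a-time, two-parameter (α, θ) Chinese restaurant process, a well-known member of the Gibbs-type family. It is not the generalized group sequential construction presented in the talk; the function name, parameter values and seed are illustrative assumptions.

```python
import random

def two_parameter_crp(n, alpha, theta, seed=0):
    """Seat n customers one at a time; return the list of table (block) sizes.

    With m customers seated at k tables, the next customer joins a table
    of size n_j with probability (n_j - alpha) / (theta + m) and opens a
    new table with probability (theta + k * alpha) / (theta + m).
    """
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    sizes = []
    for seated in range(n):
        if not sizes:
            sizes.append(1)  # the first customer always opens a table
            continue
        new_table_weight = theta + len(sizes) * alpha
        u = rng.random() * (theta + seated)
        if u < new_table_weight:
            sizes.append(1)
            continue
        u -= new_table_weight
        for j, size in enumerate(sizes):
            u -= size - alpha
            if u < 0:
                sizes[j] += 1
                break
        else:
            sizes[-1] += 1  # guard against floating-point round-off
    return sizes

sizes = two_parameter_crp(200, alpha=0.5, theta=1.0)
print(len(sizes), sorted(sizes, reverse=True)[:5])
```

In species sampling terms, each table plays the role of a species and the number of tables after n customers is the number of distinct species observed in a sample of size n.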

References
Cerquetti, A. (2008) Generalized Chinese restaurant construction of exchangeable Gibbs partitions and related results. arXiv:0805.3853v1 [math.PR]
Gnedin, A. and Pitman, J. (2006) Exchangeable Gibbs partitions and Stirling triangles. Journal of Mathematical Sciences, 138, 3, 5674–5685.
Lijoi, A., Mena, R. and Pruenster, I. (2007) Bayesian nonparametric estimation of the probability of discovering new species. Biometrika, 94, 769–786.
Lijoi, A., Pruenster, I. and Walker, S.G. (2008) Bayesian nonparametric estimator derived from conditional Gibbs structures. Annals of Applied Probability, (To appear)
Pitman, J. (2003) Poisson-Kingman partitions. In D.R. Goldstein, editor, Science and Statistics: A Festschrift for Terry Speed, volume 40 of Lecture Notes-Monograph Series, pages 1–34. Institute of Mathematical Statistics, Hayward, California.
Pitman, J. (2006) Combinatorial Stochastic Processes. Ecole d'Ete de Probabilite de Saint-Flour XXXII - 2002. Lecture Notes in Mathematics N. 1875, Springer.

The Probability Functions for the Neyman Type Processes and Thomas Process: an Application on Traffic Accidents
Gamze Ozel (Department of Statistics, Hacettepe University, Ankara, Turkey; [email protected])

The Neyman type processes and the Thomas process have been used for describing clustered data, since the Poisson process is insufficient for the clustering of events. These processes are widely used in accident theory, ecology, neurophysiology, radiobiology, quality control, telecommunications, etc. Although many studies have already been done using the Neyman type processes and the Thomas process, the exact probabilities for these processes have not been obtained. In this study, explicit probability functions are derived for the Neyman type processes and the Thomas process. In addition, the probability density functions of the first exit times with positive jumps and time-independent boundaries are obtained for these processes. An application based on accident data and fatalities in The Netherlands is given.

References
Bijleveld, F.D., (2005), The Covariance Between the Number of Accidents and the Number of Victims in Multivariate Analysis of Accidents Related Outcomes, Accident Analysis and Prevention, 37, 591-600.
Meintanis, S.G., (2007), A New Goodness-of-fit Test for Certain Bivariate Distributions Applicable to Traffic Accidents, Statistical Methodology, 4, 22-34.
Ozel, G., Inal, C., (2008), The Probability Function of the Compound Poisson Process and an Application to Aftershock Sequence in Turkey, Environmetrics, 19, 1, 79-85.
Ozel, G., Inal, C., (2008), The Probability Function of the Compound Poisson Distribution Using Integer Partitions and Ferrer's Graph, Bulletin of Statistics and Economics, 2, 8, 70-81.
Zacks, S. (2007), First Exit Times for Ordinary and Compound Poisson Processes with Non-Linear Boundaries, Methodology and Computing in Applied Probability, 9, 359-375.


Application of Katz Family of Distributions for Detecting and Testing Overdispersion in Poisson Models
Mohammad Ali Baradaran Ghahfarokhi (Ministry of Labor and Social Affairs – Statistics Centre Iran, Iran; [email protected])
Hosseiyn Iravani (Esfahan Steel Mill Company, Iran; [email protected])

The Poisson regression model is often used as a first model for count data with covariates. However, the model requires equidispersion, which might not be valid for the data set under consideration. In many distributions the variance has a specific functional form, called the nominal variance. When the sample variance in a random sample is greater than the nominal variance, this is known as overdispersion. If we fit a regression model to such data, the overdispersion affects the model. More generally, count data may exhibit overdispersion or underdispersion, resulting in lack of fit of the Poisson model. Score tests have commonly been used to detect overdispersion in the data. In this paper, we provide a test for overdispersion in the Poisson model using the Katz family of distributions. Our setup has two extensions. First, the Katz family of distributions is employed as an extension of the Poisson distribution. Second, the mean-variance structure of the Poisson model is given by $\sigma^2 = \mu + c\mu^{\gamma}$ for arbitrary but fixed $\gamma$. We derive a local score test for testing $H_0: c = 0$. Finally, the effects of overdispersion are examined using road accident data from Iran, which has a high road death rate compared with other countries worldwide.
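To make the link between the Katz family and overdispersion concrete, the sketch below builds the probabilities from the recursion usually given for this family, p(x + 1) / p(x) = (α + βx) / (x + 1), and checks the mean and variance numerically. The recursion and the parametrisation in α and β are assumptions based on the standard textbook form and are not quoted from the abstract; the sketch covers only 0 ≤ β < 1 (β = 0 is the Poisson case) and truncates the support at a hypothetical `x_max`. The score test itself is not shown.

```python
def katz_pmf(alpha, beta, x_max=500):
    """Probabilities p(0), ..., p(x_max) of a Katz-family distribution,
    built from p(x + 1) / p(x) = (alpha + beta * x) / (x + 1) and
    normalised over the truncated support."""
    if alpha <= 0 or not (0 <= beta < 1):
        raise ValueError("this sketch covers alpha > 0 and 0 <= beta < 1 only")
    weights = [1.0]
    for x in range(x_max):
        weights.append(weights[-1] * (alpha + beta * x) / (x + 1))
    total = sum(weights)
    return [w / total for w in weights]

def mean_and_variance(pmf):
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    return mean, var

# beta = 0: Poisson with mean alpha, so variance / mean = 1.
print(mean_and_variance(katz_pmf(2.0, 0.0)))
# beta = 0.5: overdispersed, variance / mean = 1 / (1 - beta) = 2.
print(mean_and_variance(katz_pmf(1.0, 0.5)))
```

In this parametrisation the mean is α / (1 − β) and the variance is α / (1 − β)², so any β > 0 gives a variance larger than the mean.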


Design of Experiments Tuesday, September 23

Design of Experiments

Some Notes About Efficiency Balanced Block Designs With Repeated Blocks
Bronisław Ceranka and Małgorzata Graczyk (Department of Mathematical and Statistical Methods, University of Life Sciences in Poznan, Poznan, Poland; [email protected], [email protected])

We consider a class of block designs in which v treatments are arranged in b blocks according to the incidence matrix $N = (n_{ij})$, where $n_{ij}$ denotes the number of experimental units in the jth block receiving the ith treatment, i = 1, 2, ..., v, j = 1, 2, ..., b. The ith treatment is replicated $r_i$ times and the size of the jth block is $k_j$. A block design is called efficiency balanced if all treatment contrasts are estimated with the same efficiency. For several reasons, in particular from the practical point of view, it is desirable to have repeated blocks, because some treatment combinations may be preferable to others and the cost of implementing the design may differ depending on whether the design structure admits repeated blocks. We give new construction methods for efficiency balanced block designs based on the incidence matrices of balanced incomplete block designs.

Some Remarks About Optimum Chemical Balance Weighing Design for p = v + 1 Objects
Bronisław Ceranka and Małgorzata Graczyk (Department of Mathematical and Statistical Methods, University of Life Sciences in Poznan, Poznan, Poland; [email protected], [email protected])

The name chemical balance weighing design is connected with the statistical theory of experiments and concerns the determination of unknown measurements of objects in the Gauss-Markoff model. In addition to the standard Gauss-Markoff assumptions, further assumptions are made about the errors: we assume that there are no systematic errors and that the errors are uncorrelated and have different variances. In practice, the model corresponds to the situation in which, for example, we determine unknown measurements of a given number of objects using a set of measuring installations with different precisions.

Several optimality criteria for chemical balance weighing designs are considered in the literature. In this paper we consider the so-called optimum chemical balance weighing design, in which the variance of each estimator is the same and attains the lower bound.

We construct the optimum chemical balance weighing design for p = v and for p = v + 1 objects and give the lower bound of the variance for these designs. We give the existence conditions determining the optimum designs and present some construction methods for optimum chemical balance weighing designs for p = v and p = v + 1 objects based on the incidence matrices of balanced incomplete block designs and ternary balanced block designs for p = v objects. All results are new and have not been presented in the literature.

A Comparison of Type I Error and Power of the Pairwise Comparisons Test under Unequal Small Difference Variance.
Krongkaew Wangniwetkul (Department of Applied statistical, Faculty of Science, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand; [email protected])

This study compared the probability of Type I error and the power of four pairwise comparison tests: Tamhane's T2, Dunnett's T3, Games-Howell and Dunnett's C, using sample sizes of 10, 15, 20, 25, 30, 35, 40, 45 and 50, and differences in the means of the normal source populations of 0%, 10%, 30%, 50% and 100%, with small variance differences, 0 < Φ < 1.5, where Φ is the value of the noncentrality parameter (Game, Winkler and Robert (1972)). Monte Carlo methods were used to generate responses based on the sample size and the difference of means, with 1,000 samples for each combination. Hypothesis testing in each case was conducted at the 0.01 significance level. In all cases the difference in sample variances was confirmed by Levene's test. Using the binomial test, it was found that every test had a type I error rate below the required value. Regarding the estimated probability of type I error, small samples had slightly higher values than large ones, the Games-Howell test showed considerable fluctuation in type I error probability and Dunnett's C test had the lowest probability. The power of the tests was less than 0.9 in all cases and varied with sample size. In the cases with a 10% difference of means, Dunnett's C test had the lowest power. With larger mean differences the power of the tests increased. In all cases, the Games-Howell test had the highest power.

Application of Statistical Program R for Development of Sampling Schemes for Ensuring Coexistence Measures in Maize Fields
Katja Rostohar (Crop and Seed Science Department, Agricultural Institute of Slovenia, Ljubljana, Slovenia; [email protected])
Andrej Blejec (National Institute of Biology, Ljubljana, Slovenia; [email protected])
Vladimir Meglic and Jelka Sustar Vozlic (Crop and Seed Science Department, Agricultural Institute of Slovenia, Ljubljana, Slovenia; [email protected], [email protected])

Pollen-mediated gene flow is a natural biological process in which the transfer of genetic information among sexually compatible plants occurs by cross-pollination. In the case of coexistence between genetically modified (GM) plant varieties and non-GM varieties, it is important to trace GMOs from the field onwards in order to comply with the EU regulatory threshold for the adventitious presence of GMOs in food and feed. In the absence of guidelines for sampling GMOs in the field, it is necessary to develop a reliable sampling procedure for estimating the adventitious presence of GMO maize in the field before harvest in a situation of coexistence. The aim of the study was to design appropriate sampling schemes to obtain reliable estimates of the outcrossing rate in the field in a situation of coexistence.

The data on cross-pollination were obtained from a two-year field trial with yellow and white kernel maize varieties. In total, 3600 sampling points were used in the statistical analysis. The receptor field was divided into different zones with respect to the distance from the donor. Sampling in the different zones was performed using simple random sampling with the R statistical programme (R Development Core Team, 2007). A different number of samples was used for each zone, and sampling was repeated 500 times in each case. Sample means and their standard errors were analyzed. With increasing distance from the donor the variance becomes lower and more stable, so the number of samples needed to detect the outcrossing rate diminishes with the distance from the donor.
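The resampling step described in this abstract (simple random sampling within a zone, repeated 500 times, followed by inspection of the sample means and their spread) can be sketched as below. The study used R; Python is used here purely for illustration. The zone populations are randomly generated placeholders rather than trial data, and the names `repeated_srs`, `near_zone` and `far_zone` are hypothetical.

```python
import random
import statistics

def repeated_srs(zone_values, n, repeats=500, seed=0):
    """Draw `repeats` simple random samples (without replacement) of size n
    from one zone. Returns the mean of the sample means and their standard
    deviation, which serves as an empirical standard error of the mean."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(zone_values, n)) for _ in range(repeats)]
    return statistics.mean(means), statistics.stdev(means)

# Placeholder outcrossing rates (NOT the trial data): a zone close to the
# donor with high, variable rates and a distant zone with low, stable rates.
_gen = random.Random(1)
near_zone = [_gen.uniform(0.0, 0.40) for _ in range(600)]
far_zone = [_gen.uniform(0.0, 0.02) for _ in range(600)]

for name, zone in (("near", near_zone), ("far", far_zone)):
    for n in (10, 40):
        mean, se = repeated_srs(zone, n)
        print(name, n, round(mean, 4), round(se, 4))
```

Comparing the empirical standard errors across zones and sample sizes indicates how many sampling points each zone needs to reach a given precision.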

Simulation Study to Check the Performance of Various Unequal Probability Sampling Estimators
Nadeem Shafique Butt (College of Statistical and Actuarial Sciences, University of the Punjab, Lahore, Pakistan; [email protected])
Muhammad Qasier Shahbaz (Department of Mathematics, COMSAT Institute of Information Technology, Lahore, Pakistan; [email protected])

A large number of unequal probability sampling estimators are available in the literature. Theoretical comparison of these estimators is a hard task, so survey statisticians have performed empirical studies from time to time. These empirical comparisons have been based on populations available in the survey sampling literature. The aim of this study is to carry out a simulation study of various popular unequal probability sampling estimators. The simulation has been conducted by generating various populations with a given correlation structure. The study attempts to identify a minimum variance estimator in unequal probability sampling for populations with a specific correlation structure.
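
A simulation of this kind might be set up along the following lines. The abstract does not name the estimators compared, so the sketch assumes two standard ones for illustration: the Hansen-Hurwitz estimator under probability-proportional-to-size sampling with replacement, and the expansion estimator under simple random sampling. The population generator, parameter values and function names are hypothetical.

```python
import random
import statistics


def make_population(size, rho, seed=0):
    """Generate (x, y) pairs whose underlying normal scores have correlation rho."""
    rng = random.Random(seed)
    pop = []
    for _ in range(size):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = max(50 + 10 * z1, 1.0)  # size measure, kept positive
        y = 100 + 20 * (rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
        pop.append((x, y))
    return pop


def hansen_hurwitz(pop, n, rng):
    """Estimate the total of y from n PPS draws with replacement (p_i = x_i / X)."""
    total_x = sum(x for x, _ in pop)
    probs = [x / total_x for x, _ in pop]
    draws = rng.choices(range(len(pop)), weights=probs, k=n)
    return sum(pop[i][1] / probs[i] for i in draws) / n


def srs_expansion(pop, n, rng):
    """Estimate the total of y from a simple random sample without replacement."""
    sample = rng.sample(pop, n)
    return len(pop) * statistics.mean(y for _, y in sample)


def simulate(pop, n=20, runs=2000, seed=1):
    """Return the empirical variances of the two estimators over repeated samples."""
    rng = random.Random(seed)
    hh = [hansen_hurwitz(pop, n, rng) for _ in range(runs)]
    srs = [srs_expansion(pop, n, rng) for _ in range(runs)]
    return statistics.variance(hh), statistics.variance(srs)


for rho in (0.0, 0.9):
    var_hh, var_srs = simulate(make_population(200, rho))
    print(f"rho={rho}: var(HH)={var_hh:.0f}, var(SRS)={var_srs:.0f}")
```

Repeating the run over a grid of correlation values shows where each estimator has the smaller variance, which is the type of comparison the abstract describes.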

Tuesday, September 23 Econometrics

Insurance Consumption in Italy: a Sub-Regional Panel Data Analysis
Giovanni Millo (Research Department, Assicurazioni Generali SpA, Trieste, Italy and Department of Economics and Statistics, University of Trieste, Trieste, Italy; Giovanni [email protected])
Gaetano Carmeci (Department of Economics and Statistics, University of Trieste, Trieste, Italy; [email protected])

We analyze the demand for life and non-life insurance across 103 Italian provinces in 1992-2004. We assess the determinants of insurance consumption in the light of the empirical literature and the distinctive features of our country, trying to explain the underdevelopment of the South in the insurance sector.

Among the benefits of using sub-regional data on insurance expenditure, one seems to us particularly relevant. Since loadings on life insurance contracts tend to be uniform across regions of the same country, an important limitation of cross-country analyses, namely the difficulty of observing prices in this market, may be alleviated. On the other hand, a regional analysis raises issues of cross-sectional dependence, due either to common nationwide and/or regional factors or to spatial proximity.

We analyze the form of cross-sectional dependence in different ways: we employ the CD test for global cross-sectional dependence by Pesaran (2004) both as a test and informally as a descriptive statistic; we apply an adaptation to irregular lattices of the CD(p) test for local cross-sectional dependence; and we test for different orders of contiguity. We also employ panel versions of the standard diagnostics for spatial dependence (Anselin 1988) and recent joint and marginal tests for random effects and serial-spatial correlation (Baltagi, Song, Jung and Koh 2007).

We explore the possibility of characterizing cross-sectional dependence on the basis of geographic proximity through random effects panel models that include combinations of spatial lags, spatial errors and serial dependence (Case 1991, Elhorst 2003, Baltagi et al., cit.), which we estimate by maximum likelihood through new procedures written in the R language.
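
For orientation, the global CD statistic of Pesaran (2004) mentioned above is usually written as CD = sqrt(2T / (N(N-1))) times the sum of the pairwise residual correlations over all pairs i < j, and is compared with a standard normal distribution under the null of no cross-sectional dependence. The sketch below is a plain Python illustration for a balanced panel, not the authors' R procedures; the function names and the simulated residuals are hypothetical.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation of two equally long series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def pesaran_cd(residuals):
    """CD statistic for a balanced panel given as N residual series of length T."""
    n, t = len(residuals), len(residuals[0])
    total = sum(
        correlation(residuals[i], residuals[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return math.sqrt(2 * t / (n * (n - 1))) * total


# Hypothetical residuals: independent units versus units sharing a common factor.
rng = random.Random(0)
T, N = 50, 10
independent = [[rng.gauss(0, 1) for _ in range(T)] for _ in range(N)]
factor = [rng.gauss(0, 1) for _ in range(T)]
dependent = [[f + rng.gauss(0, 1) for f in factor] for _ in range(N)]

print("CD, independent units:", round(pesaran_cd(independent), 2))
print("CD, common factor:    ", round(pesaran_cd(dependent), 2))
```

The local CD(p) variant restricts the sum to pairs of neighbouring units up to order p, which would require a contiguity structure not sketched here.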

Time Series Model for Paired Comparisons
M.R. Sjolander and I.N. Litvine (Nelson Mandela Metropolitan University, South Africa; [email protected], [email protected])

This study introduces time series models for paired comparisons. Paired comparisons are used in psychology, economics, sports statistics and other applied fields (Litvine, 2004). However, such models have not yet been developed for time series analysis, although there are many practical applications in which they could be of significant importance.

In the models we propose, the characteristics of the objects compared are assumed to be functions of time (e.g. a linear function, a sinusoidal function, etc.). The basic models that we generalise to a time-dependent form are both traditional (such as the Bradley-Terry model) and recently developed (e.g. the Haines-Litvine model). Some results take the form of theorems with formal proofs, while other problems are solved using computer software (Mathematica). The performance of the new models is also verified through computer modelling.

This report also presents and discusses a number of examples of applications of the new models in econometrics (financial analysis), the filling of missing data in weather records, sports statistics, etc. We also support our findings with examples based on simulated data.

References
Litvine, I.N. (2004). Models and Methods of Paired Comparisons. Publishing House Nautilus, Lviv.
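
To make the idea of time-dependent characteristics in the paired-comparison abstract concrete, the sketch below uses the classical Bradley-Terry form, in which object i beats object j with probability pi_i / (pi_i + pi_j), and lets each strength depend on time. The log-linear trend pi(t) = exp(a + b t) is an assumption chosen for illustration; the abstract does not specify the authors' parametrisation, and the function names are hypothetical.

```python
import math


def strength(a, b, t):
    """Hypothetical time-dependent strength with a linear trend on the log scale."""
    return math.exp(a + b * t)


def win_probability(params_i, params_j, t):
    """Bradley-Terry probability that object i is preferred to object j at time t."""
    pi_i = strength(*params_i, t)
    pi_j = strength(*params_j, t)
    return pi_i / (pi_i + pi_j)


# Object i starts weaker but improves; object j is constant over time.
improving, constant = (0.0, 0.1), (1.0, 0.0)
for t in (0, 5, 10, 20):
    p = win_probability(improving, constant, t)
    print(f"t={t}: P(i beats j) = {p:.3f}")
```

A sinusoidal strength function could be substituted for `strength` without changing the rest of the sketch.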

Prediction of Turkish Bank Failures via Multivariate Statistical Analysis of Financial Structures
Yuksel Akay Unvan (Export-Import Bank of Turkey, Ankara, Turkey; [email protected])

The objective of this paper is to propose a methodological framework for constructing an integrated early warning system (IEWS) that can be used as a decision support tool in the bank examination and supervision process to detect banks that are experiencing serious problems. The sample consists of 85 Turkish banks (38 of which failed during the period 2000-2008) and contains their financial ratios, covering capital adequacy, asset quality, liquidity, profitability and income-expense structure. By applying principal component analysis to the financial data, the important financial factors were explored and the financial factor components were determined. Factor scores were estimated for each bank with respect to these factors, and the scores were used as independent variables in estimating discriminant and logistic regression models. Finally, these parametric models were combined to construct the IEWS, which has high predictive ability to differentiate sound banks from troubled ones.


Data Mining I Tuesday, September 23

Principal Ellipsoid Analysis
Thierry Dhorne (Lab-STICC, Université de Bretagne-Sud and Université Européenne de Bretagne, France; [email protected])

At the Applied Statistics 2007 conference a formal approach to redundancy among multivariate data was presented (Dhorne, 2007). This approach leads to information measures for multivariate tables that differ from those usually used, in particular from the trace criterion of Principal Component Analysis, but are rather consistent with some interesting proposals (Shimansky, 2000).

A new dimension reduction method consistent with this formal approach is presented in order to provide a coherent way to analyse multivariate data and to obtain some valuable representations. It is called "Principal Ellipsoid Analysis" because it is connected with the "minimum spanning (or enclosing) ellipsoid".

The comparison of this method with classical Principal Component Analysis is detailed, and a real data set is worked through to appreciate the differences from an applied point of view. The link between the method and a new kind of "robustness" is highlighted and some extensions are mentioned.

References
Dhorne, T. (2007). Redundancy Measures for Multivariate Data. Applied Statistics Conference, September 2007, Ribno (Bled).
Shimansky, Y. (2000). Continuous measure of significant linear dimensionality of a waveform set. Computational Statistics & Data Analysis, 35(1), 1-10.

Modelling Traffic Flow on Network of Slovenian Roads
Tine Porenta, Kurt Kalcher, Franc Svegl and Igor Grabec (Amanova doo, Technology Park Ljubljana, Ljubljana, Slovenia; [email protected])

Road traffic is a consequence of population activity that is driven by numerous agents. Consequently, a practically random character of traffic flow could be expected. Contrary to this expectation, records of traffic flow exhibit rather regular, nearly periodic properties. The regularity is a consequence of highly synchronized population activity. The synchronization stems mainly from the regularly changing illumination of the Earth, which is described by clock and calendar. In addition, synchronization is generated inherently in the population by a consensus about working days and holidays. Accordingly, we consider road traffic flow as a non-autonomous dynamic phenomenon that is synchronized by variables representing the hour and the character of the day. Its dynamics is modelled non-parametrically on the basis of recorded traffic flow time series. We further demonstrate how this model could be applied for rather accurate forecasting of the traffic flow rate on the road networks of various countries. The forecasting provides applicable data for planning and for finding optimal routes and periods of travel.

Addition of Documents' Representations in the Latent Semantic Space
Jasminka Dobsa (Faculty of Organization and Informatics, University of Zagreb, Zagreb, Croatia; [email protected])

The most widely used model for representing textual documents for the purpose of information retrieval is the vector space model, or bag-of-words representation. In this model documents are represented as the columns of the term-document matrix, while the terms used for indexing are represented as the rows of that matrix. Such a representation neglects relations between the terms and, as a result, the presence of synonymy and polysemy in the documents is an obstacle to efficient information retrieval. Methods for reducing the dimensionality of the original representation of documents in the vector space model tend to overcome one of these problems: the problem of synonyms. Latent semantic indexing (LSI) is the most popular method of dimensionality reduction. It projects the original representations of documents onto the left singular vectors of the singular value decomposition (SVD) of the term-document matrix in the least-squares sense. In fact, the LSI method is an application of a modified principal component analysis to the analysis of textual documents.

Collections of documents are very often dynamic, because new documents are constantly added to the collection. The vectors onto which the projection is done in the process of dimension reduction are constructed on the basis of the representations of all documents in the collection, so computing the new representations in the space of reduced dimension demands recomputation of the SVD. To overcome that problem, Berry and coworkers (1995) suggested an approximate representation of added documents by projections onto the existing left singular vectors. They also proposed a method for the approximate representation of added index terms.

It seems natural to extend the list of index terms used for indexing the collection when new documents are added to it. For that reason, a modification of the approximate representations of terms and documents that combines these methods is proposed. The proposed modification is tested on the MEDLINE collection of documents. The list of index terms is extended, and the representations of documents and the singular vectors are extended in the dimensions of the newly added terms. It is shown that representing documents by the extended list of index terms does not improve the performance of information retrieval significantly.
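
The approximate "folding-in" of new documents and terms referred to above is commonly written as follows. The notation is an assumption introduced here for illustration: A is the m x n term-document matrix, k the retained rank, d an m-vector of term weights for a new document, and t a 1 x n row of weights of a new term across the existing documents. Some presentations omit the scaling by the inverse singular values.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top}, \qquad
\hat{d} = \Sigma_k^{-1} U_k^{\top} d, \qquad
\hat{t} = t \, V_k \Sigma_k^{-1}
```

Because U_k, Sigma_k and V_k are left unchanged, the folded-in representations only approximate what a recomputed SVD would give, and the approximation can degrade as more material is added.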

Mining Association Rules from Transactional Databases and Apriori Multiple Algorithm
Predrag Stanisic and Savo Tomovic (University of Montenegro, Faculty of Mathematics and Science, Montenegro; predrag [email protected], [email protected])

Mining association rules is one of the most important data mining problems. The motivation for discovering association rules came from the need to analyze large amounts of supermarket basket data. Such association rules relate various items, with the semantics that the presence of some items in a supermarket transaction implies the presence of other items in the same transaction. In this paper we consider the discovery of association rules in large transaction databases. The problem is decomposed into two subproblems: finding large itemsets and generating association rules from large itemsets, as proposed in [1] and [2]. The second subproblem is the easier one, and the complexity of discovering association rules is determined by the complexity of discovering large itemsets. A large itemset is a set of items that appear together in a sufficient number of transactions. In this paper we propose improvements to one of the best-known algorithms for discovering association rules, the Apriori algorithm given in [2]. In the Apriori algorithm large itemsets are discovered iteratively among candidate itemsets. It is important to generate as few candidate itemsets as possible, because the support of each candidate itemset must be determined in each iteration. A further aim is to reduce the number of iterations, because the whole database is scanned in each iteration. We present an original procedure for candidate generation, which produces fewer candidate itemsets and is more efficient than the corresponding procedure of the Apriori algorithm from [2], because it requires fewer computer operations. In addition, we consider several ways to reduce the number of I/O operations. We also present experimental results comparing the Apriori algorithm from [2] with the modification proposed in the paper.
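
For readers unfamiliar with the baseline, the classical level-wise Apriori search for large itemsets can be sketched as below. This illustrates only the textbook join, prune and count steps, not the modified candidate generation proposed in the abstract; the function name, the use of an absolute support count and the toy transactions are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(large)

    k = 2
    while large:
        prev = set(large)
        # Join step: unite pairs of large (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Counting step: one pass over the database per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        large = {s: c for s, c in counts.items() if c >= min_support}
        result.update(large)
        k += 1
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, support in sorted(apriori(baskets, 2).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

Each pass of the `while` loop corresponds to one scan of the database, which is why both the number of candidates and the number of iterations matter for the running time.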

Tuesday, September 23 Statistical Applications - Economics II

Evaluating the Usefulness of Information from Forecasts
Eric S. Lin (Dept. of Economics, National Tsing Hua University, Taiwan; [email protected])
Ping-Hung Chou (Dept. of Finance, Ching Yun University, Taiwan)
Ta-Sheng Chou (Dept. of Economics, National Tsing Hua University, Taiwan)

The purpose of making forecasts is to support the decision-making process. For someone who relies on forecast data to make decisions, an important question is whether the announced forecast is worth using: is it accurate enough that using it yields a lower expected loss than ignoring the information it contains? To answer this question, we cannot rely directly on the usual criteria such as mean square error (MSE) or mean absolute error (MAE). Ashley (1983), however, proposed a criterion for judging the usefulness of forecasts that tells users whether the information in a forecast provides a useful guide: Ashley (1983) simply checks whether MSE(x̂_t)/var(x_t) is greater than one, where x̂_t denotes a forecast of x_t. However, a formal statistical test of whether MSE(x̂_t)/var(x_t) is significantly greater than one has been lacking.

In this paper, we propose a suitable method: a regression-based testing procedure with a basic model and an extended version under the Markov-switching framework. Technically, to allow flexibility in characterizing all possible information in a decision-making environment, we adopt the Bayesian approach. The proposed algorithm estimates the parameters of the evaluating equation via Gibbs sampling and uses the estimated posterior distribution to test the usefulness of forecasts. The proposed testing procedure is evaluated through a small-scale Monte Carlo simulation. The estimated posterior odds ratio in our simulation results accurately evaluates whether the forecast is useful.
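
The descriptive ratio attributed to Ashley (1983) above can be computed as in the following sketch. It covers only the point estimate of MSE divided by the variance of the realised series, not the Bayesian test proposed in the abstract; the use of the population variance, the function name and the example numbers are assumptions.

```python
import statistics


def usefulness_ratio(actual, forecast):
    """Ratio of forecast MSE to the variance of the realised series."""
    mse = statistics.mean((f - a) ** 2 for a, f in zip(actual, forecast))
    return mse / statistics.pvariance(actual)


actual = [1, 2, 3, 4, 5]
naive = [3, 3, 3, 3, 3]              # always forecasts the sample mean
informed = [1.2, 1.9, 3.3, 3.8, 5.1]  # hypothetical forecast tracking the series

print("naive forecast ratio:   ", usefulness_ratio(actual, naive))
print("informed forecast ratio:", round(usefulness_ratio(actual, informed), 3))
```

Under this reading, a forecast that is no better than the unconditional mean gives a ratio of about one, and values well below one point to a forecast that carries useful information.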

Application of Multiple Correspondence Analysis in Business Research
Christine Duller (Institute for Applied Statistics, Johannes Kepler University Linz, Linz, Austria; [email protected])

This presentation demonstrates an application of (multiple) correspondence analysis in business research. Correspondence analysis is an exploratory data analytic technique used to identify systematic relations between categorical variables. It is related to principal component analysis, and its results provide information on the structure of categorical variables similar to that given by a principal component analysis for continuous variables. Classical correspondence analysis is designed for two variables, whereas multiple correspondence analysis is an extension to more than two variables.

After an introductory overview of the idea and its implementation in standard software packages (SPSS, SAS, R), an example from a recent business research project is presented. Current fields in business research are entrepreneurship and the business success of newly created enterprises, as well as business administration in family-owned enterprises. Many papers dealing with entrepreneurship and/or family-owned enterprises are based on empirical studies. Most of these studies are based on harmonized questionnaires with mainly categorical variables. Dealing with categorical variables is therefore unavoidable in empirical business research.


Data Collection in Fast-Growing Companies: The Case of Slovenian Gazelles
Mojca Bavdaz and Mateja Drnovsek (Faculty of Economics, University of Ljubljana, Ljubljana, Slovenia; [email protected], [email protected])

Academic business surveys generally suffer from low response rates and potential non-response bias. This is not surprising considering that even mandatory governmental business surveys face constant complaints from businesses and increasing non-response. Achieving response is most problematic in small and medium-sized enterprises (SMEs), because their burden of responding to a survey is relatively higher than that of large enterprises. In addition, the contribution of these enterprises to aggregate figures is relatively small compared with that of large enterprises. As a result, SMEs are often excluded from the sampling frame and not surveyed at all, although fast-growing SMEs may quickly become big players. Experience with the effectiveness of data collection methods is thus even more valuable for this segment of the business population.

Our research will address an academic business survey of 500 Slovenian gazelles, i.e. companies characterised by fast growth in sales over a five-year period. The companies are mainly small and medium-sized. Given their fast growth, they generally face a severe shortage of human resources, so their perceived and actual response burden may be high. We expect high non-response to be the major problem.

The preparation of the survey design and questionnaire is well under way, and the survey will be implemented in the summer of 2008. The survey will offer a good opportunity to check the effect of contact strategies on response. One group of gazelles (n=60) will be interviewed by journalists before the survey implementation and then contacted intensively in line with Dillman's Tailored Design Method. This group is expected to be more motivated to participate in the survey because of its higher involvement. The remaining gazelles will be divided into two groups (n=220 each), only one of which will be contacted intensively in the same way as the interviewed group.

The research will thus address the challenges of collecting data from highly overloaded fast-growing companies and the effectiveness of high involvement and an intensive contact strategy for increasing response rates.

Incentives for Industry-Science Collaboration
Lavoslav Caklovic (PMF Matematiki odjel, Zagreb, Croatia; [email protected])
Sonja Radas (The Institute of Economics, Zagreb, Croatia)

The industry-science relationship is considered to be one of the crucial parts of an innovation system because of its positive impact on innovation and commercial performance. Extant studies show that a significant proportion of the products and processes that are currently sold and used could not have been developed without academic research.

In order to foster collaboration between companies and scientists, the right set of incentives is needed. The purpose of the incentives is to "make it easy" for the players to start and carry out efficiently a cooperation that ends in a commercially viable result. However, the degree to which any chosen incentive is accepted by industry will depend, among other things, on the intensity of the firm's existing cooperation, on the importance of innovation for the firm, and on the level of market support from investors and of demand from clients and customers.

In this paper we examine a potential set of incentives and explore the degree to which they would be accepted by industry. We use the Potential method (a clone of AHP) to elicit firms' preferences for the given incentives, and we seek to explain the differences in these preferences by the above firm and market factors. The paper is based on a survey of 190 Croatian enterprises performed in 2002.

Clustering Wednesday, September 24

Cluster Analysis of Phytoplankton with Similar Temporal and Spatial Patterns
Sangdao Wongsai and Kehui Luo (Department of Statistics, Macquarie University, Sydney, Australia; [email protected], [email protected])

Cluster analysis is a multivariate analysis method that is widely used in ecological studies for data reduction. In this study, we aim to assign an assemblage of phytoplankton to clusters with dissimilar temporal and spatial patterns, based on eight years of data collected at the major reservoirs (Lake Yarrunga, Fitzroy Falls reservoir and Wingecarribee reservoir) in New South Wales, Australia. Principal component analysis was used to estimate orthogonal components for the temporal and spatial variability of phytoplankton community structure. The first four components accounted for about 72% of the total variation. The interaction effects between phytoplankton and these four principal components were then estimated using a multiplicative regression model. Subsequently, a dissimilarity matrix was formed, containing the squared Euclidean distances between the estimated model parameters for every pair of phytoplankton studied. The significance of the dissimilarity between clusters was further evaluated using a chi-squared test. The results indicated that phytoplankton community composition and abundance were associated with the temporal and spatial variability. Cyanobacteria were the most dominant group of phytoplankton in all the reservoirs studied, and their cell concentrations were relatively abundant in the Fitzroy Falls reservoir and Wingecarribee reservoir. Cryptomonads and golden-brown algae were patchy in Lake Yarrunga, while green algae and diatoms were observed predominantly in the Fitzroy Falls reservoir and Wingecarribee reservoir.

Agglomerative Hierarchical Methods: Introduction and Problem Setting
Kristijan Breznik, Branka Golob and Mojca Cizek Sajko (Postgraduate Study Programme in Statistics, University of Ljubljana, Ljubljana, Slovenia; [email protected], [email protected], [email protected])

The general problem of how to group units into clusters so that units within a cluster are as similar to each other as possible and units in different clusters are as dissimilar as possible is a very practical one. Hierarchical methods are among the best known and most frequently used, because they are intuitively simple and the result can be represented graphically by a clustering tree, also called a dendrogram.

The algorithm of the agglomerative hierarchical methods produces a series of partitions of the units P_n, P_{n-1}, ..., P_1. The first, P_n, consists of n single-unit 'clusters'; the last, P_1, consists of a single group containing all n units. At each stage the method joins the two clusters that are closest together (most similar).

In the presentation several agglomerative methods will be described: single linkage clustering, complete linkage clustering, the Gower median method, centroid clustering, the Ward method and the average method. These methods are included in almost all serious statistical software. Differences between the methods arise from the different ways of defining (dis)similarity between clusters.

Different hierarchical methods may generate different results. Performing multiple experiments and comparing the results is recommended to support the veracity of the results. We will compare different methods and discuss some of their properties, such as 'greediness' and monotonicity. Some known facts about the effectiveness of the mentioned methods in finding different types of structures will also be presented, and a design for studying some other structures will be given.
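
The agglomerative scheme described in this abstract, which starts from n single-unit clusters and repeatedly joins the two closest clusters, can be sketched as follows. This is a naive illustration limited to single and complete linkage with Euclidean distance; the function name and the toy points are hypothetical, and statistical software uses far more efficient update formulas.

```python
import math


def agglomerate(points, linkage="single"):
    """Return the partitions P_n, ..., P_1 as lists of clusters of point indices."""
    pick = min if linkage == "single" else max  # single vs complete linkage
    clusters = [[i] for i in range(len(points))]
    partitions = [[list(c) for c in clusters]]

    def cluster_distance(a, b):
        return pick(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        # Join the two clusters that are closest under the chosen linkage.
        i, j = min(pairs, key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        partitions.append([list(c) for c in clusters])
    return partitions


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for partition in agglomerate(points, "single"):
    print(partition)
```

Replacing `pick` with another rule for the distance between two clusters yields the other methods listed above, which is where their differences originate.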


Simulated Data Structures
Rok Blagus, Nusa Erman and Emil Polajnar (Postgraduate Study Programme in Statistics, University of Ljubljana, Ljubljana, Slovenia; [email protected], [email protected], [email protected])

To study the performance of selected hierarchical agglomerative methods, several data structures were simulated in R. Five types of three-dimensional data structure were selected: spherical, ellipsoid, core-and-crust, intertwined and chain-like. In addition, several degrees of variability of the units within clusters were applied. The presentation will describe how the data were simulated.
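
As a rough illustration of what such simulated structures might look like, the sketch below generates a spherical and a chain-like three-dimensional cluster with adjustable variability. It is written in Python rather than R, and the generators, parameter values and function names are assumptions; the abstract does not describe how the authors constructed their structures.

```python
import random


def spherical_cluster(center, sd, n, rng):
    """n points scattered isotropically around a three-dimensional center."""
    return [tuple(rng.gauss(c, sd) for c in center) for _ in range(n)]


def chain_cluster(start, end, sd, n, rng):
    """n points scattered around the line segment from start to end."""
    points = []
    for _ in range(n):
        u = rng.random()  # position along the segment
        points.append(tuple(s + u * (e - s) + rng.gauss(0, sd) for s, e in zip(start, end)))
    return points


rng = random.Random(2008)
for sd in (0.5, 1.5):  # two hypothetical degrees of within-cluster variability
    ball = spherical_cluster((0.0, 0.0, 0.0), sd, 100, rng)
    chain = chain_cluster((5.0, 0.0, 0.0), (15.0, 0.0, 0.0), sd / 5, 100, rng)
    print(f"sd={sd}: {len(ball)} spherical points, {len(chain)} chain points")
```

Raising `sd` makes the clusters overlap more, which gives a simple way to vary how difficult the structure is for a clustering method.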

The Performance of Selected Hierarchical Agglomerative Methods
Ales Korosec, Sanja Filipic (Postgraduate Study Programme in Statistics, University of Ljubljana, Ljubljana, Slovenia; [email protected], [email protected])
Tina Ostrez (Statistical Office of the Republic of Slovenia, Ljubljana, Slovenia; [email protected])
Jana Suklan (Postgraduate Study Programme in Statistics, University of Ljubljana, Ljubljana, Slovenia; [email protected])

The broad variety of agglomerative hierarchical clustering methods raises the problem of how to choose the most appropriate method for the given data. It is well known that some methods outperform others if the analysed data have a specific shape. In the presented study we observed the behaviour of the centroid method, the median method (Gower median method) and the average method (unweighted pair-group method with arithmetic mean, UPGMA; average linkage between groups) and compared them with the minimum (single linkage clustering), the maximum (complete linkage clustering), the Ward and the McQuitty (weighted pair-group method using arithmetic averages, WPGMA) methods of hierarchical clustering. We compared these methods on spherical, ellipsoid, core-and-crust, intertwined and chain-like three-dimensional data structures. R statistical software was used for data generation and throughout the analysis. The results show that all seven methods are successful in finding compact, ball-shaped or ellipsoid structures when these are sufficiently separated. Conversely, all methods except the minimum method perform poorly on non-homogeneous, irregular and elongated structures. Especially challenging was a circular double helix structure, which was correctly separated into two strands only by the minimum method. Our results also confirm published results from older simulation studies, which usually favour the average method (besides the Ward method) if the data are assumed to be fairly compact and well separated.

Wednesday, September 24 Teaching Statistics

Improvements in Teaching Statistics in Slovenian Secondary Schools
Andreja Drobnic Vidic (Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia; [email protected])
Simona Pustavrh (Solski center Novo mesto, Srednja elektro sola in tehniska gimnazija, Novo mesto, Slovenia; [email protected])

Mathematics teachers who also teach statistics as part of the compulsory mathematics subject in Slovenian secondary schools face problems similar to those mentioned at ICOTS 2006, namely (1) a lack of time for learning statistical content at school, (2) incomplete statistics education of mathematics teachers, and (3) a lack of curriculum guidelines. Moreover, our mathematics curriculum and the assessment system do not explicitly point out the need for statistical reasoning and thinking, often seen as the most important elements of statistics education at all levels of statistics teaching (Kader and Perry, 2006; Chance, 2002). On the other hand, the aims of the 2008 mathematics curriculum are very ambitious in terms of statistics: they involve meeting interdisciplinary challenges and require teachers' cooperation across curriculum subjects. Therefore, teachers need good guidelines and concrete examples of how to teach statistics efficiently in order to meet curriculum requirements and to give students sound statistical knowledge.
We provide guidelines for secondary school teachers with concrete, interdisciplinary, real statistical problems designed for secondary school students. These statistical problems connect statistics with other curriculum subjects, such as sociology, ecology, and ICT. In the guidelines we suggest a time frame, programs and instructions for ICT teachers, the form of cooperation and a possible way of carrying out the course. We also emphasize the statistical thinking and reasoning that students need to adopt to solve these problems; these are not included in our current curriculum and assessment, but are included in the goals of the NCTM standards. We hope that statistics in Slovenian secondary schools can now meet the challenges of tertiary education and help young people become active, reflective and critical citizens.

References
Drobnic Vidic, A. (2006). A model for teaching basic engineering statistics in Slovenia. Metodoloski zvezki, 3(1): 163-183.
Garfield, J. (2002). The challenge of developing statistical reasoning. Journal of Statistics Education, 10(3). (http://www.amstat.org/publications/jse/v10n3/garfield.html)
Heaton, R. and Mickelson, W. (2002). The learning and teaching of statistical investigation in teaching and teacher education. Journal of Mathematics Teacher Education, 5: 35-59.
Huerta, J.A., Cavazos, J.E. and Lopez Esquivel, J.A. (2006). Learning-based on real context problems and notions of probability distributions and expected value. In A. Rossman and B. Chance (Eds.). Proceedings of the ICOTS-7, Salvador, Bahia, Brazil. (http://www.stat.auckland.ac.nz/~iase/publications/17/2C3 ALBE.pdf)
Kader, G. and Perry, M. (2006). A framework for teaching statistics within the K-12 curriculum. In A. Rossman and B. Chance (Eds.). Proceedings of the ICOTS-7, Salvador, Bahia, Brazil. (http://www.stat.auckland.ac.nz/~iase/publications/17/2B3 KADE.pdf)
Leavy, A. and O'Loughlin, N. (2006). Preservice teachers' understanding of the mean: moving beyond the arithmetic average. Journal of Mathematics Teacher Education, 9: 53-90.
Rossman, A., Medina, E. and Chance, B. (2006). A post-calculus introduction to statistics for future secondary teachers. In A. Rossman and B. Chance (Eds.). Proceedings of the ICOTS-7, Salvador, Bahia, Brazil. (http://www.ugr.es/~icmi/iase study/Sample%20paper1.pdf)
Watson, J. and Moritz, J. (1999). The beginning of statistical inference: Comparing two data sets. Educational Studies in Mathematics, 37: 145-168.

Statistics101: an Extended Implementation of the Resampling Stats Language
John Grosberg (www.statistics101.net, Scottsdale, Arizona, U.S.A.; [email protected])
Gaj Vidmar (Institute for Rehabilitation, Republic of Slovenia, Ljubljana, Slovenia; [email protected])

Statistics101 is a giftware computer program written by John Grosberg that interprets and executes the simple but powerful Resampling Stats programming language. The original Resampling Stats language and computer program were developed by Julian Simon and Peter Bruce as a new way to teach statistics. The history, description, and application of the resampling method to a vast range of statistical problems are described fully in Simon's book Resampling: The New Statistics, which is freely available in electronic format on the internet.
Anyone wanting to learn statistics will find that the resampling approach helps in understanding statistical concepts from the simplest to the most difficult. In addition, professionals who want to use resampling, bootstrapping, or Monte Carlo simulations will find Statistics101 of use.
Using Statistics101, students can learn probability and statistics the easy way, i.e., by simulation, gain a deeper understanding of traditional statistics concepts and methods, increase their awareness of the role of variability in probability and statistics, and learn and apply simple to very sophisticated statistical techniques without tables or complicated formulas. In this way, Statistics101 complements traditional statistics classes.
The Statistics101 program is also suitable for higher levels of statistical sophistication. It has been used by professionals and researchers in many fields, including anthropology, ecology, evolutionary and marine biology, epidemiology, psychology, toxicology and veterinary pathology.
Statistics101 is written in Java and will run on Windows, Mac, Unix, Linux, or any other platform that supports Java 1.4. The program, tutorials and manual can be downloaded from www.statistics101.net.
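The idea of learning statistics by simulation can be sketched with a bootstrap interval for a mean. The example below is written in Python, not in the Resampling Stats language (whose syntax is not reproduced here); the function name `bootstrap_mean_interval`, the sample values and the number of resamples are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_mean_interval(sample, n_resamples=5000, level=0.95, seed=1):
    """Percentile bootstrap interval for the mean of a sample."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    means = sorted(
        # Resample with replacement, same size as the original sample.
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical measurements.
sample = [12.1, 9.8, 11.4, 10.9, 13.2, 8.7, 10.1, 11.8, 9.5, 12.6]
low, high = bootstrap_mean_interval(sample)
print(f"sample mean: {statistics.fmean(sample):.2f}")
print(f"95% bootstrap interval: ({low:.2f}, {high:.2f})")
```

No tables or formulas for the standard error are needed: the spread of the resampled means stands in for the sampling distribution.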

How to Research the Effectiveness of Constructivist Statistics Education? An Approach based on Reproducible Computing.
Patrick Wessa (Lessius Dept. of Business Studies, Integrated Faculty of Business and Economics, KULeuven Association, Belgium; [email protected])

The main purpose of this paper is to explain how educators and scientists can use a newly developed computing platform¹ that is made available to the academic community free of charge². It is shown how educators and students can use the system to create an electronic document (the so-called Compendium) that allows any reader to reproduce and reuse the underlying statistical computations. In addition, it is illustrated how the Compendium allows anyone to easily verify and challenge the results that are presented.
The underlying technology offers new ways of building true "constructivist learning environments" where students are empowered to interact, experiment, communicate, and collaborate, even if the student population is very large. More importantly, this platform allows educators and researchers to accurately measure important aspects of the actual learning process which are otherwise unobservable.
With this new information it is possible to investigate the effectiveness of e-based statistics learning, the impact of (statistical) software usability, and the importance of knowledge construction through various feedback and communication mechanisms. For example, it is possible to accurately measure how statistical computations are used by students to: complete assignments; verify results from other students or from the educator; provide feedback to other students; ask for help or report a problem; perform empirical research (term paper, thesis, ...); etc.
The empirical evidence obtained so far clearly suggests that constructivism (based on Compendia of reproducible research) is strongly related to exam scores, where the exam questions objectively assess the student's understanding of statistical concepts.
Finally, it is shown that the computing platform provides valuable information about how the quality of statistics education can be improved in terms of learning effectiveness.

¹ Based on two websites: http://www.freestatistics.org and http://www.wessa.net
² This research is funded by the OOF: project 2007/13.


Generating Tests Using R and LaTeX
Lara Lusa (Institute of Biomedical Informatics, University of Ljubljana, Ljubljana, Slovenia; [email protected])

Cheating on tests is a common practice. Students who can easily cheat on tests not only get unearned grades, but are also less likely to be motivated in class. In our experience with the introductory statistics class at the Medical School, students can use their notes and books during the test; this eliminates the widespread problem of cheat sheets, but it is still very common to catch students trying to copy from each other.
To try to prevent copying on tests, we developed an R program, genertest, for randomly generating a unique test for each student. The questions included in the tests are drawn from a database in which the topic, the number of points and the difficulty level are specified for each question. Using genertest, the user can specify, among other things: how many different tests should be generated, which topics to include (together with the number of points per topic and the difficulty level), and the minimum number of consecutive tests that should not include the same questions. The tests can be customized by providing the name of the course and the date of the exam. Other options are available, such as permuting the answers of multiple-choice questions or generating the solutions of the tests. The output of the program is a LaTeX file containing the code for the text of the exam.
An interesting feature for instructors of statistics is the possibility of generating tests in which the numbers included in the problems are generated randomly. This is easily achieved using Sweave notation in the questions. Figures and tables can also be included in the tests.
To make the program available to users not familiar with R or LaTeX, we plan to offer it through a web interface.
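The selection step described above can be sketched as follows. This is an illustrative Python sketch, not the genertest program or its interface: the question bank, the field names and the functions `generate_tests` and `to_latex` are all hypothetical, one question is drawn per topic, and only directly neighbouring tests are prevented from sharing questions.

```python
import random

# Hypothetical question bank: topic, points and text for each question.
BANK = [
    {"id": "d1", "topic": "descriptive", "points": 2, "text": "Compute the median of 3, 7, 8, 12, 20."},
    {"id": "d2", "topic": "descriptive", "points": 2, "text": "Compute the range of 4, 9, 15, 21."},
    {"id": "d3", "topic": "descriptive", "points": 2, "text": "Compute the mean of 2, 4, 6, 8."},
    {"id": "p1", "topic": "probability", "points": 3, "text": "A fair die is rolled twice. Find P(sum = 7)."},
    {"id": "p2", "topic": "probability", "points": 3, "text": "A fair coin is tossed 3 times. Find P(no heads)."},
    {"id": "p3", "topic": "probability", "points": 3, "text": "Two cards are drawn without replacement. Find P(both aces)."},
]

def generate_tests(bank, n_tests, topics, seed=0):
    """Draw one question per topic for each test; neighbouring tests share no question."""
    rng = random.Random(seed)
    tests, previous = [], set()
    for _ in range(n_tests):
        chosen = []
        for topic in topics:
            pool = [q for q in bank
                    if q["topic"] == topic and q["id"] not in previous]
            chosen.append(rng.choice(pool))   # needs >= 2 questions per topic
        previous = {q["id"] for q in chosen}
        tests.append(chosen)
    return tests

def to_latex(test, course, date):
    """Render one test as a LaTeX fragment."""
    lines = [r"\section*{%s, %s}" % (course, date), r"\begin{enumerate}"]
    for q in test:
        lines.append(r"\item (%d points) %s" % (q["points"], q["text"]))
    lines.append(r"\end{enumerate}")
    return "\n".join(lines)

tests = generate_tests(BANK, n_tests=5, topics=["descriptive", "probability"], seed=42)
print(to_latex(tests[0], "Introductory Statistics", "September 24, 2008"))
```

Fixing the seed makes a set of tests reproducible, which is convenient when solutions have to be regenerated later.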


Wednesday, September 24 Measurement

Measurement

Inclusion of Capital Services to the Productivity Measurement
Jaroslav Sixta and Jakub Fischer (Faculty of Informatics and Statistics, University of Economics, Prague, Czech Republic; [email protected], [email protected])

National accounts are one of the most popular data sources for economic and statistical analyses. National accounts figures are also used for productivity measurement, in particular capital stocks and hours worked or the number of employees. In the revised System of National Accounts (SNA), the concept of capital services will be introduced (as a supplementary and voluntary item), which makes a new approach to productivity measurement possible. Similar areas have been the subject of many recent research projects. From our point of view, the most important change in productivity measurement is the shift from capital stocks to capital services: capital stocks are a stock variable, whereas capital services are a flow variable. For this reason, capital services should be a more appropriate tool for productivity measurement when used alongside other flow variables such as output and hours worked. Both capital stocks and capital services are based on statistical models. The key assumption is that the service life is a random variable with an assumed distribution. Our paper focuses on the empirical analysis of this area.

Quality of the Measurement of Media Use on Political Issues in the ESS
Lluís Coromina and Willem E. Saris (Survey Research Centre, ESADE Business School, Universitat Ramon Llull, Barcelona, Spain; [email protected], [email protected])

Given the importance of the media in all societies, the Central Coordinating Team of the European Social Survey (ESS) developed and proposed a module on media use, which is included in the core questionnaire of the ESS. This module allows measurement of the total time spent on the traditional media (television, radio and newspapers), the total time spent on political issues and current affairs in the media, and the total time spent on the media for other purposes.
The questions addressed are: How are these concepts operationalized? Can these measures be compared across countries? How should we compute the total time? How good are these measures, and do we need these aggregated variables or should we rely on the separate measures of the use of the different media?
The quality of these three measures is evaluated for the participating countries in three ESS waves. Structural equation modelling has been used to estimate the quality of these measures. Next, the media use variables are related to political variables such as political discussion, political participation and political knowledge in order to test their external validity.
The results obtained for the media use model using structural equation modelling show that the total time spent on political issues in the media and the total time spent on the media for other purposes can, despite some additional effects, be interpreted and help explain the behaviour of the political variables. Using the total time spent on the traditional media, which is the sum of the other two, makes no sense, however: it cannot be used to determine the effects on political issues because it is a mix of different interests or purposes.
Therefore, for media research we suggest using the different types of media variables instead of a global measure of total time spent on the media.

Achieving Cross-National Equivalence in Survey Measurement of Tourist Satisfaction: Methodological Challenges and an Empirical Investigation
Vesna Zabkar, Irena Ograjensek, Tanja Dmitrovic and Maja Makovec Brencic (Faculty of Economics, University of Ljubljana, Ljubljana, Slovenia; [email protected], [email protected], [email protected], [email protected])

Tourist destinations can be defined as bundles of tourism products and services consumed under the same brand name, thus offering the consumer an integrated experience. Traditionally, destinations are described as well-defined geographical areas such as countries, islands, or towns, which their visitors perceive as a unique entity. One of the challenges faced by destination managers is the branding and global positioning of destinations. There is widespread agreement that understanding customer satisfaction at the destination level could be the key to achieving a destination's competitive advantage. However, an overview of the literature shows that, as a theoretical construct, customer satisfaction is not easy to define and operationalize. Furthermore, the survey instrument used to measure customer satisfaction in the tourism setting has to exhibit adequate cross-national equivalence. Establishing measurement invariance across groups of tourists from different countries is thus a necessary prerequisite for conducting meaningful substantive comparisons and drawing conclusions which are both relevant to destination managers and practically applicable.
In this paper we explore the cross-national equivalence of a survey used to measure customer satisfaction and two of its most closely related theoretical constructs (perceived service quality and customer loyalty) on a sample of adult respondents (foreign and domestic tourists) at selected tourist destinations in Slovenia. Issues pertaining to survey design and testing in a multinational setting are dealt with systematically, along with a practical demonstration of the strengths and limitations of the multi-group CFA approach to testing for measurement invariance.

Measuring the Dynamics of the Digital Divide: an Integrative Approach
Vesna Dolnicar and Vasja Vehovar (Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia; [email protected], [email protected])


The basic digital divide refers to socio-demographic differences related to information and communication technologies (ICTs). These differences are most often observed by comparing simple percentage figures on the penetration of technology among different countries and socio-demographic segments. Since these comparisons are usually made over time, observing the benchmarks can be extremely problematic, as straightforward comparisons of percentages may not suffice in a rapidly changing environment.
Since existing digital divide studies are mostly limited to analyses of simple differences, which results in oversimplified or even disputable conclusions, the paper presents the application of a novel method to digital divide problems. The time distance method or S-time-distance (e.g. Sicherl, 1973, 2005) is an increasingly used approach in the social sciences because it offers an additional view of complex processes in time. S-time-distance measures the distance between the two points in time at which the units under comparison reach a given level of a certain indicator. Thus, in S-time-distance terminology it would be said that group A lags behind group B by, for example, 3 years, and with this approach the lag in penetration (e.g. 35% vs. 40%) is set in a different perspective.
The paper will illustrate that, when analysing the dynamics of the digital divide, the answer to the seemingly simple question "Is the digital divide increasing, decreasing, or is it constant?" is not straightforward. An integral methodological tool that comprehensively addresses this question will be introduced. This methodological approach is based on the assumption that none of the existing statistical measures truly communicates the essence of a certain digital divide phenomenon (absolute measures, relative measures and S-time-distance are considered). In addition, even the simultaneous reporting of all three measures is insufficient.
To monitor and interpret the dynamics of the digital divide it is therefore very important to explicitly take into account future scenarios of ICT diffusion among the observed subjects (e.g. population segments, countries). We have developed these scenarios within the broad framework of diffusion theory (Rogers, 1962, 2003), but departing from two of its implicit assumptions related to the deterministic conceptualisation of the diffusion process: the form of the diffusion function and the anticipated level of the final penetration rate. It is argued that a proper measure can only be provided if we anticipate and take into account the full distribution functions of the compared subjects or population segments and the location of the subject at a certain point in time.
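The S-time-distance idea can be made concrete with a minimal sketch. The Python code below assumes rising penetration series observed at a few time points and uses linear interpolation between them; the function names and the two series are hypothetical and are chosen only to mirror the "35% vs. 40%, 3 years" example in the text.

```python
def time_to_reach(series, level):
    """Time at which a rising (time, value) series first reaches `level`,
    using linear interpolation between observations; None if never reached."""
    for (t0, y0), (t1, y1) in zip(series, series[1:]):
        if y0 <= level <= y1:
            if y1 == y0:
                return float(t0)
            return t0 + (level - y0) / (y1 - y0) * (t1 - t0)
    return None

def s_time_distance(lagging, leading, level):
    """Years by which `lagging` trails `leading` in reaching `level`."""
    t_lag = time_to_reach(lagging, level)
    t_lead = time_to_reach(leading, level)
    if t_lag is None or t_lead is None:
        return None
    return t_lag - t_lead

# Hypothetical penetration rates (year, percent) for two groups.
group_a = [(2002, 17.0), (2005, 26.0), (2008, 35.0)]
group_b = [(2002, 30.0), (2005, 35.0), (2008, 40.0)]

static_gap = group_b[-1][1] - group_a[-1][1]        # difference in percentage points in 2008
lag_years = s_time_distance(group_a, group_b, 35.0)  # time distance at the 35% level
print(f"gap in 2008: {static_gap} percentage points; time distance at 35%: {lag_years} years")
```

The same data thus yield two different statements about the divide: a gap of a few percentage points at one point in time, and a lag of several years at one level of the indicator.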


Network Analysis II and Data Mining II Wednesday, September 24

Network Analysis II and Data Mining II

Stability of Typologies on the Basis of Repeated Measurement with the Role Relation(ship) and the Name Generator Approach
Tina Kogovsek (Faculty of Arts, University of Ljubljana, Ljubljana, Slovenia; [email protected])
Valentina Hlebec (Faculty of Social Sciences, University of Ljubljana, Ljubljana, Slovenia; [email protected])

In measuring ego-centered social networks, two general approaches can be distinguished. A very simple way to evaluate membership in a social network is to ask an ordinary survey question where the response categories are types of relationships (e.g. partner, parents, children, friends, etc.). This approach (usually called the role relation(ship) approach) is very appealing as it saves time and money. However, the information obtained by this approach is very limited.
Most often, when evaluating ego-centered networks, the name generator approach is used. The list of egos (respondents) is obtained in the first step. In the second step, existing ties are identified, that is, all alters with whom the focal ego has some sort of relationship. When all ties have been identified, the contents and the characteristics of the ties are assessed. In most cases the characteristics of the alters are also measured. The name generator approach yields more data, and the data are of higher quality. However, it is very time- and money-consuming, and it requires either considerable effort from respondents, when it is applied in self-administered mode, or complex coordination between interviewer and respondent, when it is applied in personal interviews (e.g. Kogovsek et al., 2002).
In a series of studies (e.g., Hlebec and Kogovsek, 2005; Kogovsek and Hlebec, 2005; Kogovsek and Hlebec, 2008), network composition was estimated using both approaches. Test-retest and split-ballot experiments on convenience samples of respondents were used to assess the stability of network composition. The findings show that, with some caution, the two approaches are comparable. In the present paper this line of research is taken a step further. Typologies of social support networks are produced by hierarchical clustering on the basis of network composition, estimated by both approaches. The overall stability of the typologies as well as the stability of the clustering of individual respondents are studied.

Cancer Classification Using Various Partitioning Clustering Techniques
Parvesh Kumar and Siri Krishan Wasan (Department of Mathematics, Jamia Millia Islamia, Delhi, India; parvesh [email protected], [email protected])

Data mining is a search for relationships and patterns that exist in large databases. According to Usama Fayyad, data mining is a step in the KDD (Knowledge Discovery in Databases) process that consists of applying data analysis and discovery algorithms that produce a particular enumeration of patterns (or models) over the data.
Clustering is an important data mining technique. A mathematical definition of clustering is the following: let $X = \{x_1, x_2, x_3, \ldots, x_{m-1}, x_m\}$ be a set of data items representing a set of $m$ points $x_i$ in $\mathbb{R}^n$, where $x_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{in}\}$. The goal is to partition $X$ into $k$ groups $C_j$, $j = 1, 2, \ldots, k$, such that data belonging to the same group are more alike than data in different groups. Each of the $k$ groups is called a cluster. The result of the algorithm is a many-to-one mapping of data items $x_i$ to groups $C_j$.
Because of the complexity and the high dimensionality of gene expression data, classification of disease samples remains a challenge. Hierarchical clustering and partitioning clustering are used to identify patterns of gene expression useful for the classification of samples. The k-means, PAM and global k-means algorithms are partitioning clustering algorithms. The k-means algorithm (MacQueen, 1967) is a squared-error-based clustering algorithm: its objective function attempts to minimize the distance of each point from the centre of the cluster to which the point belongs. In the case of PAM (Partitioning Around Medoids) (Kaufman, 1990), the objective is to determine a representative object (medoid) for each cluster, that is, to find the most centrally located objects within the clusters. The global k-means algorithm (Likas, 2002) is an incremental approach to clustering that dynamically adds one cluster centre at a time through a deterministic global search procedure consisting of N executions of the k-means algorithm (N being the size of the data set) from suitable initial positions. In this paper, we make a comparative study of the partitioning methods k-means, global k-means and PAM for classifying a cancer dataset. We also study the performance of PAM with the max-min algorithm as the initial selection method.
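The difference between a cluster centre in k-means and a medoid in PAM can be sketched on a toy data set. The Python code below is a minimal illustration under stated assumptions: the two-dimensional points and initial centres are hypothetical, `kmeans` is a plain Lloyd-style iteration, and `medoid` only picks the most central member of an already formed cluster; it is not the full PAM swap procedure, nor global k-means.

```python
import math

def kmeans(points, centers, n_iter=100):
    """Plain k-means: alternate nearest-centre assignment and centre update."""
    for _ in range(n_iter):
        # Assign each point to its nearest centre.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:   # centre = coordinate-wise mean of its members
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:         # keep an empty cluster's centre unchanged
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def medoid(members):
    """Member with the smallest total distance to all other members."""
    return min(members, key=lambda m: sum(math.dist(m, o) for o in members))

# Hypothetical, well-separated two-dimensional data.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (2.0, 1.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0), (9.0, 8.0)]
labels, centers = kmeans(points, centers=[points[0], points[4]])
medoids = [medoid([p for p, lab in zip(points, labels) if lab == c])
           for c in range(len(centers))]
print("labels: ", labels)
print("centres:", centers)
print("medoids:", medoids)
```

A k-means centre is an average and need not coincide with any observation, whereas a medoid is always one of the data items, which is why PAM is less sensitive to outlying values.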

Absolute Maximum Entropy Principle and Self-Organization of Memory Cells
Igor Grabec (Amanova doo, Technology Park, Ljubljana, Slovenia; [email protected])

The development of an automatic information processing system capable of statistical modeling of physical laws is treated. The system comprises an array of sensors and a network of memory cells called an artificial neural network (ANN). The joint probability density function (PDF) of the signals from the sensors is expressed by a kernel estimator based upon samples of data. Since the number of samples can increase without limit, while the number of memory cells is generally limited, an optimal representation of the sensory data and a proper way of storing them in the memory cells need to be developed. For this purpose a new principle of absolute maximum entropy is formulated and applied to express a representative PDF in terms of prototype data. The modeling of a specific physical law corresponds to a mapping of sensory signals to these prototype data. To obtain this mapping, a measure of the discrepancy between the PDF of the sensory data and the representative one is minimized. The corresponding adaptation process leads to self-organized, highly non-linear dynamics of memory cells in an abstract multi-dimensional space. The dynamics are non-autonomous since they are driven by sensory data. The adaptation algorithm includes initialization and variation of the prototype data. Its stochastic perturbation treatment leads to an adaptation process that resembles the cooperation of neurons in biological neural networks. The response of the complete network is described by the excitation of neurons, which causes self-organization and simultaneously represents the encoded driving variable. During adaptation each neuron is selectively sensitized to one prototype from the sample space of the driving variable. The corresponding self-organization of neurons has previously been described by other authors on the basis of biological observations. In the presentation the self-organized formation of prototype data in various ANNs is demonstrated.
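The kernel estimator of a PDF mentioned above can be sketched in one dimension. The Python code below assumes a Gaussian kernel of fixed width and a handful of hypothetical sensor readings; it shows only the estimator built from samples, not the absolute maximum entropy principle or the adaptation of prototype data.

```python
import math

def kernel_pdf(samples, width):
    """Return a function x -> Gaussian kernel estimate of the PDF,
    f(x) = 1/(N*width) * sum_i K((x - x_i)/width), K = standard normal density."""
    norm = 1.0 / (len(samples) * width * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / width) ** 2) for s in samples)

    return pdf

# Hypothetical one-dimensional sensor readings and kernel width.
samples = [-0.8, -0.1, 0.2, 0.4, 0.9, 1.3, 1.7, 2.6]
pdf = kernel_pdf(samples, width=0.5)

# Numerical check that the estimate integrates to about 1 on [-10, 10].
step = 0.01
grid = [-10.0 + step * i for i in range(2001)]
area = sum(pdf(x) for x in grid) * step
print(f"estimated density at 0.5: {pdf(0.5):.3f}")
print(f"area under the estimate: {area:.4f}")
```

Each sample contributes one kernel, so the cost of evaluating the estimate grows with the number of samples; replacing the samples by a limited set of prototypes is the kind of compression the abstract addresses.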


Index of Authors

Alin, A, 51
Arrioja-Rodríguez, M, 52
Ata, N, 37

Bacnali, S, 44
Batagelj, V, 28
Bavdaz, M, 67
Belic, I, 29
Berzelak, N, 22, 27
Billiet, J, 19
Blagus, R, 70
Blejec, A, 58
Bohacova, H, 50
Borgan, Ø, 41
Boscaiu, V, 48
Brabec, M, 23
Bratosin, D, 48
Bregar, L, 27
Bren, M, 21
Breznik, K, 69
Brizzi, M, 30
Browning, S, 48
Budsaba, K, 51
Butt, NS, 59

Campobasso, F, 46
Capo, A, 20
Carmeci, G, 60
Ceranka, B, 57
Cerquetti, A, 54
Chou, P-H, 66
Chou, T-S, 66
Cleveland, WS, 53
Coenders, G, 20
Coolen, F, 54
Coolen-Schrijner, P, 54
Coromina, L, 20, 76
Cortese, G, 44
Crossman, R, 54
Caklovic, L, 68
Cizek Sajko, M, 69
Cobanovic, K, 47

Demirhan, H, 44
Demirhan, YP, 44
Dhorne, T, 63
Díaz-Castellanos, E, 52
Díaz-Ramos, C, 52
Dmitrovic, T, 77
Dobsa, J, 64
Dolnicar, V, 78
Drnovsek, M, 67
Drobnic, A, 72
Duller, C, 66

Erman, N, 70
Eskerod Madsen, B, 48

Fanizzi, A, 46
Filipic, S, 71
Firuzan, E, 32
Fischer, J, 21, 76
Flores-Avila, LC, 52
Franco, M, 39

Geraci, M, 42
Ghahfarokhi, MAB, 56
Golob, B, 69
Grabec, I, 63, 80
Graczyk, M, 57
Grosberg, J, 73
Guia, J, 20
Gunay, S, 38

Heckenbergerova, J, 50
Heinze, G, 42
Heyser-Fregoso, S, 52
Hlebec, V, 36, 79
Horvat, T, 22

Iravani, H, 56


Jakulin, A, 34

Kalcher, K, 63
Karabey, U, 37
Kejzar, N, 28
Kosir, M, 34
Kocer, UU, 32
Kogovsek, T, 36, 79
Konar, O, 23
Korenjak-Cerne, S, 28
Korosec, A, 71
Kovacs, P, 47
Kronegger, L, 34
Kumar, P, 79

Lawson, AB, 42
Lin, ES, 66
Litvine, IN, 60
Lozar Manfreda, K, 27
Luo, K, 69
Lusa, L, 75

Makovec Brencic, M, 77
Malesic, K, 27
Maly, M, 23
Mazouch, P, 21
Meglic, V, 58
Millo, G, 60
Minovic, J, 25
Mitar, M, 29
Mondol, DK, 32
Mutavdzic, E, 24

Nicin, S, 47
Nikolic-Djoric, E, 24

Ograjensek, I, 77
Ostrez, T, 71
Ozel, G, 55
Ozkok, E, 37

Pelikan, E, 23
Petres, T, 47
Petrovic, U, 49
Phatthanangkul, T, 51
Polajnar, E, 70
Polat, E, 38
Porenta, T, 63
Posdarie, E, 24
Pustavrh, S, 72

Radas, S, 68
Rettaroli, R, 30
Rode, N, 35
Roli, G, 30
Rostohar, K, 58
Rovan, J, 27

Saris, WE, 76
Scheike, TH, 44
Schemper, M, 42
Shahbaz, MQ, 59
Sidoroff, M, 48
Siriwan, P, 51
Sixta, J, 76
Sjolander, MR, 60
Skulj, D, 54
Sokolovska, V, 47
Stanisic, P, 65
Suklan, J, 71
Skerjanc, J, 35
Sustar Vozlic, J, 58
Svegl, F, 63

Toktamis, O, 32
Tomovic, S, 65
Turkan, S, 32

Umek, L, 49
Unvan, YA, 61

Vehovar, V, 22, 27, 34, 78
Vidmar, G, 73
Vivo, J-M, 39

Wakounig, S, 42
Wangniwetkul, K, 58
Wasan, K, 79
Wessa, P, 74
Wongsai, S, 69

Yip, PSF, 32

Zupan, B, 49
Zupanc, D, 21
Zabkar, V, 77
Ziberna, A, 34, 35


http://www.arrs.gov.si/en

http://www.valicon.si

http://www.alarix.si

http://www.result.si

http://www.wiley.com

http://www.jvank.nl/ARMHome/

http://www.statistics.com/
