
Pacific Rim Objective Measurement Symposium 2017 5 - 9 August, Kota Kinabalu


Pre-Conference Workshop Schedule (Venue: Faculty of Psychology and Education, UMS)

Day 1 - August 5th

Time

8:00 Bus will pick up workshop participants from Klagan Regency

08:30 - 09:00 Registration/ Arrival of guests

09:00 - 10:30 Introduction to Rasch Measurement Model by Prof. Bond & Dr. Zali

10:30 - 10:45 Break

10:45 - 12:30 Workshop continues

12:30 - 1:30 Lunch

1:30 - 3:30 Workshop continues

3:30 - 3:45 Break

3:45 - 5:30 Workshop continues

5:30 Bus will take workshop participants back to Klagan Regency

Day 2 - August 6th

8:00 Bus will pick up workshop participants from Klagan Regency

08:30 - 09:00 Arrival of participants

09:00 - 10:30 Two parallel workshops: Introduction to Rasch Measurement Model - 2nd Day, by Prof. Bond & Dr. Zali; Introduction to SEM and Rasch Measures, by Dr. Juliet Ling Mei Teng & Nor Irvoni Mohd Ishar

10:30 - 10:45 Break

10:45 - 12:30 Both workshops continue

12:30 - 1:30 Lunch

1:30 - 3:30 Both workshops continue

3:30 - 3:45 Break

3:45 - 5:30 Both workshops continue

5:30 Bus will take workshop participants back to Klagan Regency


Conference Schedule

Day 1 - August 7th

Time

8:00 Bus will pick up participants from Klagan Regency

08:30 - 09:30 Registration/ Arrival of guests

Opening show

09:30 - 11:00 Opening Ceremony: Welcome Speech by Host Committee Chairman; Welcome Speech by PROMS Chairman

11:00 - 11:30 / 11:30 - 12:00 / 12:00 - 12:30 Keynote Speech - Prof. Margaret Wu; Group Photo; Conference housekeeping; Break

12:30 - 1:30 Lunch

1:30 - 3:00 Parallel Session 1

KK_001 On the measure of attitude towards science: A scientific model - Liu Huang

KK_009 Examining the Measurement Properties of Students' Perceptions of Assessment Scale - Wilham M. Hailaya

KK_002 Internet Banking Service Quality Measurement: A Scale Development for Malaysian Banks - Mahgoub Elradi Ahmed Siddig

KK_004 Mathematics Item Quality: An Illustrative Example using Rasch Measurement Model - Ling Mei-Teng

KK_013 Rasch-derived Measure for Assessing Student Competency in University Introductory Computer Programming (CS1) - Leela Waheed

KK_003 Face Validity Test on Validity and Reliability of ICT Procurement Officer Competency Measurement Instrument by Using Rasch Model - Azran Ahmad

KK_008 Multidimensional computerized adaptive testing for toddlers: a developmental screening tool - Ying-Hsien Chien

KK_019 Rasch Analysis of the Malaysian Teachers’ Responses to the Organizational Commitment Scale - Ahmad Zamri bin Khairani

KK_018 Reliability Testing Instruments for Computer Programming Learning: Applying the Rasch model - Azliza Yacob

KK_010 Development and validation of the students’ ability questionnaire on science process skills - Ellyza Karim

KK_022 The use of MCQ as Formative Assessment to Reflect Attainment of Desired Learning Outcome - Ximei Zhou

KK_006 Application of Multi-dimensional Computerized Adaptive Test on Clinical Dementia Rating Scale using Computer-aided Technique - Ting-En Hui

3:00 - 3:30 Break

3:30 - 5:30 Parallel Session 2

KK_014 Disciplinary Biases of a Student Evaluation of Teaching Survey in Higher Education - Billy Wai Kei Chan

KK_029 Using Rasch Model for the Development of Intention to Stay Scale (ITSS) among Medical Academics at Public Universities - Wan Ismahanini Ismail

KK_011 We have equal intervals; now we need invariance: The next important step in Rasch measurement - Prof. Trevor G. Bond

KK_033 A Rasch Analysis of the Reading, Grammar, and Essay Sections of a Japanese University Entrance Examination - Kristy King Takagi

KK_030 Development of a Model of Positive L2 Self using the Rasch Model - J. Lake

KK_017 Psychometric Features of Psychosocial Safety Climate (PSC-12) - Rosnah Ismail

KK_038 Improving Teaching and Student Learning through Evaluation of one TOEIC Preparation Textbook - YihYeh Pan

KK_031 Developing a Vocabulary Specification Equation for Second Language Learners - J. Lake

KK_024 Validation of Medical Statistics Exam Paper: Conventional Method versus Rasch - Azmi Mohd Tamil

KK_066 Develop, deploy, determine: Surveying assessment for learning in the Singapore secondary school context - Christopher C. Deneen

KK_032 Measuring the Validity and Reliability of Arabic Vocabulary Knowledge Test Using Rasch Model Approach - Zunita Mohamad Maskor

6:00 - 10:00 PROMS Dinner @ Kampung Nelayan

Track 1 (Education), Chairperson: Prof. Stenner; Track 2 (Scale Development), Chairperson: Dr. Haniza Yon; Track 3 (others), Chairperson: Prof. Vincent Pang

Track 1 (Education), Chairperson: Prof. Stenner; Track 2 (Scale Development) and Track 3 (others), Chairpersons: Prof. Yanzi; Prof. Rob Cavanagh


Day 2 - August 8th

Time

8:00 Bus will pick up participants from Klagan Regency

08:30 - 09:00 Arrival of guests

9:00 - 9:40 Keynote Speech - Prof. Cavanagh

9:40 - 10:15 Break

10:15 - 12:15 Parallel Session 3

KK_015 Gender DIF in Mathematics Items among Secondary Students in a Coeducational Learning Environment - S. Kanageswari Suppiah Shanmugam

KK_035 Assessing Pedagogical Content Knowledge of The Particle Theory of Matter and Phase Change in Pre-service Science Teachers - Maryati

KK_059 The Unreasonable Effectiveness of Theory Based Instrument Calibration in the Natural Sciences: What Can the Behavioral Sciences Learn? - Jackson Stenner

KK_016 Rasch Analysis Properties of Structural Items and Essay in Chemistry Test - Adeline Leong Suk Yee

KK_036 Preliminary Report on the Development and Calibration of a Rasch Scale to Measure Chinese Reading Comprehension Ability in Singaporean 2nd Language Primary School Students, Part II - Chung Tze Min

KK_012 Application of Rasch Model in Islamic Moral Value Scale for Islamic Education Teachers (INSPI) in Malaysia - Salbiah bt Mohamed Salleh @ Salleh

KK_020 Development and Validation of The Plagiarism Tendency among Malaysian Post-Graduate Students - Anis Jauharah Abd. Kadir

KK_039 Development of Instruments for Measuring Mathematical Logical Thinking Ability of College Students in Kapita Selekta - Novaliyosi

KK_025 The Effects of Chronic Daily Fears on Students’ Concept of Self: Towards Identifying Students being Bullied - Rense Lange

KK_021 Predictors of self-assessment intention and practice among primary and secondary students in Hong Kong - Zi Yan

KK_040 Assessing competency level among SIPartners+ using Rasch Model approach - Hishamuddin Hashim

12:15 - 1:30 Lunch (PROMS Board Meeting)

1:30 - 3:00 Parallel Session 4

KK_026 Development of KKM-PPM Performance Instruments to Support Comprehension, Social Skills, and Discipline Students of Sultan Ageng Tirtayasa University - Nurul Anriani

KK_041 Validating Value Domain of the Facilitator Competency Profile Instrument SIPartners+-2 (FCPI-SIPartners+-2) Using Rasch Model Analysis - Raja Hamizah Raja Harun

KK_034 Applying Rasch Model To Identify A Contribution of Marital Status in Perceived Social Support of Mount Merapi Volcanic Eruption Survivors - Chandra C. A. Putri

KK_027 Development of a Diagnostic English Grammar Test for Malaysian Lower Secondary School Students - Kho Chung Wei

KK_047 Global mindset: Assessing construct dimensionality - Jeffrey Durand

KK_057 Analysing The Effect of Smart Partnership using Rasch – a case of women entrepreneurs in Tanjung Karang - Rohani Mohd

KK_037 Multidimensional Rasch Analysis of Teaching Role-Specific Esteem - Yu-Shu Chen

KK_049 Psychometric Properties of the Tuckman Procrastination Scale in an Indonesian sample - Ngadiman Djaja

KK_065 Validating the Usability Evaluation’s Instrument of Community Learning Centre Model (UEICLC) for Aboriginal in Tasik Chini, Pahang - Mazzlida Mat Deli

KK_064 Misconceptions in electricity via Rasch Analysis - Nazlinda Abdullah

KK_005 Rasch person fit statistics associated with the weighted degree indicators of Social Network Analysis - Tsair-Wei Chien

3:00 - 3:30 Break

3:30 - 5:30 Parallel Session 5

KK_042 Measuring Scientific Literacy: Using the Rasch Model Analysis to Determine Student Competency Using Data from PISA 2015 - Nor Azizi bt Abdullah

KK_054 Development and validation of a diagnostic pronunciation rating scale: A rating scale and common-item equating analysis - Yuanyue Hao

KK_062 Modernizing vs Ecologizing Approaches in Measurement - William P. Fisher, Jr.

KK_043 Validating Knowledge Domain of Facilitator Competency Profile Instrument – SISC+1 (FCPI-SISC+1) Using Rasch Model - Zulkifili Salleh

KK_055 Development of instrument in measuring cottage industry accounting practices - Susana Narawi

KK_007 Using social network analysis to report Rasch papers’ keyword development and association across years - Wei-Ru Jyun

KK_044 Measuring The Status Of Fasilinus Current Professional Profile Using Rasch Model - Ruzita Ahmad

KK_058 Development And Validation Of Malaysian Secondary School - Ma Chi Nan

KK_023 Predicting Item Difficulty of a Knowing Numbers Test Using the Inverse Partial Credit Model - Ong Yoke Mooi

KK_045 Using Rasch Model to Assess the Foreign Language Speaking Anxiety Scale (FLSAS) among University Students in Salatiga - Rizki Parahita Anandi

KK_060 Facilitator Training Needs in Malaysia Schools - Mohd Kashfi Mohd Jailani

KK_050 Live Grading of Essay Questions Contributing to Computer Adaptive Testing - Dr. Haniza Yon

5:30 Bus will take participants back to Klagan Regency

Track 1 (Education), Chairperson: Prof. Margaret Wu; Track 2 (Scale Development), Chairperson: Dr. Bambang Sumintomo; Track 3 (others), Chairperson: Dr. Haniza Yon

Track 1 (Education), Chairperson: Dr. Haniza Yon; Track 2 (Scale Development), Chairperson: Prof. Nazlinda; Track 3 (others), Chairperson: Prof. Vincent Pang

Track 1 (Education), Chairperson: Prof. Margaret Wu; Track 2 (Scale Development), Chairperson: Dr. Jeff Durand; Track 3 (others), Chairperson: Prof. Trevor Bond


Time

8:00

08:30 - 09:00

9:00 - 9:40

9:40 - 10:15

KK_048 The Effectiveness Of Teacher Training Lessons - Burhanuddin Tola

KK_052 Rasch Model Application on Developing a Self-regulation Study Instrument for Mathematics Education Students - Wardani Rahayu

KK_056 Comparison of holistic and analytic rating methods of a writing task from the perspective of validity, reliability and practicality - Keita Nakamura

KK_051 Development of Indonesia Science Literacy Test (ISLT) Instruments to Improve Criteria Validity of National Exam - Rosita Uli Sihombing

KK_053 Measuring Second Language Receptive Knowledge of Collocation Among Graduate Learners in Public Universities Malaysia Using Rasch Analysis - Lily Hanefarezan Asbulah

KK_063 Rasch-based Test Equating: An Application of Winsteps in China - Wu Jinyu

KK_061 Performance of Early Mathematics Achievement Test (UPAM) over time: Applying Rasch Measurement Racking - Dr. Connie Cassy Ompok

KK_067 Assessing Pedagogical Content Knowledge of the particle theory of matter and Phase Change in Pre-service Science Teachers - Maryati

KK_069 Modelling a Meaningful Hybrid eTraining for Diverse Learners using Rasch and SEM - Rosseni Din

KK_070 Exploration of the psychometric properties of Eternal Love Instrument (ELI) and validation of ELI Model: A Rasch Model Approach - Akbariah Mohd Mahdzir

12:00 - 1:00

1:00 - 2:00

2:00 - 2:30

4:30

Track 3 (others)

Chairperson: Prof. Rob Cavanagh Chairperson: Prof. Trevor Bond Chairperson: Dr. Juliet Ling

Track 1 (Education) Track 2 (Scale Development)

Day 3 - August 9th

Bus will pickup participants from Klagan Regency

Bus will send participants back to Klagan Regency

Keynote Speech - Dr. Zali

Arrival of guests

Closing speech - Host chairman

Closing speech - PROMS chairman

Next Year PROMS Committee

2:30 - 3:30

Lunch

Break

Panel Discussion - Auditorium

Closing Ceremony

10:15 - 12:00

Symposium on Publishing in Conference Proceedings (Prof. Cavanagh, Prof. Bond & Prof. Durand) - Auditorium

Parallel Session 5


Keynote Speakers

Professor Margaret Wu was an associate professor at the University of Melbourne. She has worked as a psychometrician at the Australian Council for Educational Research for more than ten years and is a co-author of the item analysis software ConQuest, which has been used extensively within Australia and internationally. Margaret's main interests are in the statistical modeling of assessment data and the development of online teaching and learning tools.

Title: Rater effects and IRT models

In this talk, Dr Wu will present several IRT-based analyses of rater effects, including the estimation of rater severity and rater discrimination. Rater severity refers to differences between raters in their tendency to award higher or lower scores. Rater discrimination refers to the extent to which raters use the score range to separate students on the ability scale. The rating scale model, the partial credit model, and the generalized partial credit model are used to analyse rater effects, and the interpretation of some measures of rater effect is discussed. It is noted that a rater who shows large discrepancies from other raters may in fact be the best rater.
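To make rater severity concrete, the sketch below shows one common way it enters a Rasch-family model: a many-facet rating scale formulation in which the log-odds of scoring in category k rather than k-1 is θ - δ - ρ - τ_k (ability minus item difficulty minus rater severity minus threshold). The function names, thresholds and abilities are hypothetical, and rater discrimination (which the generalized partial credit model would add as a slope) is not modelled; this is an illustration, not the analysis presented in the talk.

```python
import math


def category_probabilities(theta, delta, rho, taus):
    """Category probabilities under a many-facet rating scale model (a sketch).

    theta: person ability (logits)
    delta: item difficulty (logits)
    rho:   rater severity (logits); larger values mean a harsher rater
    taus:  thresholds tau_1..tau_m (tau_0 is fixed at 0)
    """
    # The log-numerator of category k is the cumulative sum of
    # (theta - delta - rho - tau_j) for j = 1..k; category 0 has log-numerator 0.
    log_nums = [0.0]
    running = 0.0
    for tau in taus:
        running += theta - delta - rho - tau
        log_nums.append(running)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_nums)
    nums = [math.exp(v - top) for v in log_nums]
    total = sum(nums)
    return [n / total for n in nums]


def expected_score(theta, delta, rho, taus):
    """Model-expected rating: sum of k * P(k) over categories."""
    probs = category_probabilities(theta, delta, rho, taus)
    return sum(k * p for k, p in enumerate(probs))


taus = [-1.0, 0.0, 1.0]  # hypothetical thresholds for a 0-3 rating scale
lenient = expected_score(0.5, 0.0, -0.5, taus)  # hypothetical lenient rater
severe = expected_score(0.5, 0.0, 0.5, taus)    # hypothetical severe rater
print(f"lenient rater: {lenient:.2f}, severe rater: {severe:.2f}")
```

The same student and item receive a lower expected rating from the more severe rater, which is the pattern a severity estimate summarises.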

Professor Robert Frederick Cavanagh received his PhD from Curtin University, Western Australia, in 1997. He is a member of numerous professional associations and is currently the Chair of the Board of Management of the Pacific Rim Objective Measurement Society (PROMS). He has been active as a reviewer for peer-refereed conferences and journals since 1999. He also writes book chapters and has published numerous articles in renowned journals, has supervised PhD candidates since 2000, and has been a PhD thesis examiner at several universities since 2004.

Title: Invariant measurement and metrological networks in amodern measurement

Measurement theories in the test-score research tradition (e.g. Classical Test Theory and True Score Theory) share common assumptions with a positivist philosophical orientation. This commonality renders test-score theories susceptible to critique similar to that levelled at positivism: the anti-positivist critique and post-modernism in general. An amodern theory of measurement needs to provide a constructive response to the anti-positivist critique in order to move beyond positivism and the test-score research tradition. The four defining characteristics of amodern measurement are: advocating measurement to enable societal and environmental renewal; the philosophical genre of hermeneutical phenomenology; application of scaling research tradition theories; and inclusion of constructs from related disciplines, including metrology and network theory. This presentation builds on previous work explicating the first two characteristics of amodern measurement by examining aspects of the last two, in particular the consonance between invariant measurement and amodern measurement theory, and the application of network theory and modeling in amodern measurement theory.

Dr Mohd Zali Mohd Nor is an I.T. Manager in a shipping services company. He received his B.Sc. in Mathematics from The University of Michigan, Ann Arbor, MI, USA, in 1988, a Master of Management in I.T. from Universiti Putra Malaysia in 2005, and a PhD in Management Information System in 2012. As Vice-President of myRasch, he is currently active in training and consultation on Rasch analysis and has assisted postgraduate students from various local universities.

Title: Rasch in Malaysia – A Brief History, Challenges, and a Peek into the Future

We looked at the progress of Rasch measurement in Malaysia in three periods: pre-2008, 2008-2015, and post-2015. Since 2015, progress has been limited. The majority of Rasch papers have focused on verifying item quality, and we do not use Rasch to its full benefit as a measurement model. We have competent trainers, but we do not have trainers who really understand the Rasch model and its technicalities well enough to teach advanced levels. A peek into the future: what do we need in order to progress like other countries such as Japan, Singapore, and Australia?


List of Papers

Paper No KK_001

Paper Title On the measure of attitude towards science: A scientific model

Email Address [email protected]

1st Author Liu Huang

Subsequent authors

Fan HUANG, Pey-Tee Emily OON

1. Aims/ Objectives of study:

The present study uses the Rasch model, which requires invariance and consistent response-category functioning, to re-examine the combination of the three most commonly used constructs in measuring students' attitude towards science (SAS).

2. Sample: At least one school from each district in Guangzhou was randomly selected and invited to participate in the study. A total of eight secondary schools from different districts, making up 10% of all schools in Guangzhou, agreed to participate. Two classes from each grade in each participating school completed the questionnaire, giving a total of 1,133 students in grades 7 to 11 who study science.

3. Method: The survey data were subjected to Rasch analysis using WINSTEPS (Version 3.81) to examine whether the data fit the Rasch model. A principal component analysis (PCA) of residuals was performed to explore the invariance of the data. Rasch criteria were used to verify the functioning of each category of the 5-point response scale. Residual loading plots were scrutinized to examine whether the three constructs were cohesive in measuring SAS.
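A minimal sketch of a principal component analysis of standardized residuals, assuming dichotomous items and person and item measures that have already been estimated (for example by WINSTEPS). The study itself used a 5-point rating scale, so this is a simplification; the function name and the simulated data are hypothetical. The eigenvalues come out in item units, the unit in which residual contrasts are usually reported.

```python
import numpy as np


def residual_pca_eigenvalues(responses, theta, delta):
    """Eigenvalues (largest first) of the item-by-item correlation matrix of
    standardized Rasch residuals, for dichotomous data (a simplification)."""
    # Expected score under the dichotomous Rasch model: P = 1 / (1 + exp(-(theta - delta)))
    expected = 1.0 / (1.0 + np.exp(-(theta[:, None] - delta[None, :])))
    variance = expected * (1.0 - expected)
    z = (responses - expected) / np.sqrt(variance)  # standardized residuals
    corr = np.corrcoef(z, rowvar=False)             # items are the variables
    return np.linalg.eigvalsh(corr)[::-1]           # eigvalsh is ascending; reverse it


# Hypothetical simulated data: 500 persons, 20 items that fit the model.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, 500)
delta = np.linspace(-2.0, 2.0, 20)
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - delta[None, :])))
responses = (rng.random(p.shape) < p).astype(float)

eig = residual_pca_eigenvalues(responses, theta, delta)
print("first three residual eigenvalues:", np.round(eig[:3], 2))
```

With simulated data that fit the model, no single residual component should stand out; a first eigenvalue far above the rest is the kind of signal that points to a secondary dimension.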

4. Results: Although all items stayed within acceptable fit, the variance explained by the Rasch measures was only 30.0%, and the first three unexplained variances in the PCA of residuals were 10.0, 2.0, and 1.6. This indicates the existence of a secondary dimension (noise) besides SAS. We then examined which items contributed to the noise by exploring the residual loading plot. Items A to N, all negatively worded, had loadings greater than .40, whereas Items a to k, all positively worded, had loadings below -.60. The dimensionality of the data improved after all negatively worded items were removed: the variance explained by the Rasch measures increased from 30.0% to 38.2%, and the first three unexplained variances decreased to 1.9, 1.7, and 1.5. The strongest contrast was between SC and IS. The disattenuated correlation between IS and SC is .66, revealing that these two components are not highly correlated in measuring SAS (Linacre, 2014). In contrast, SE and SC showed a high correlation of .98, strongly suggesting that SE and SC are consistent in measuring students’ attitude towards science.
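The disattenuated correlation corrects an observed correlation for measurement error. Assuming it follows Spearman's classical correction, r_true = r_xy / sqrt(r_xx * r_yy), where r_xx and r_yy are the reliabilities of the two measures, it can be computed as below. The input values are hypothetical, not taken from the study.

```python
import math


def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values: observed correlation 0.50, reliabilities 0.80 and 0.70.
print(f"{disattenuated_correlation(0.50, 0.80, 0.70):.3f}")
```

Because reliabilities below 1 shrink observed correlations, the corrected value is always at least as large in magnitude as the observed one.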

5. Conclusions: The PCA of residuals suggested that the negatively worded items in ASATSCS may not measure the same SAS construct as the positively worded items; the negatively and positively framed items clustered separately. After the negatively worded items were removed, the variance explained by the Rasch measures increased from 30.0% to 38.2% and the noise in the data decreased sharply. Briner and Smith (1999) stated that negatively worded items do not measure the same underlying construct as positively worded items, and that including both kinds of items in the same calibration often produces highly incompatible results. We recommend removing negatively worded items from measures of SAS. The present study also found that SC (confidence in science) and IS (importance of science-related activities) do not correlate well in the measure of SAS. Wang and Berlin (2010, p. 2418), quoting Dhindsa and Chung (2003), defined IS as the extent to which a student thinks their science class is important and worthwhile; it is related to their science class experiences. We argue that IS, to which teaching methods contribute, is an extrinsic factor that affects SAS. Dhindsa and Chung (2003), on the other hand, defined SC as the extent to which a student is confident and successful doing science (p. 911). Confidence is highly relevant to motivational beliefs, which are measured by students’ beliefs about their ability and behavior in science class (Bryan, Glynn, & Kittleson, 2011, p. 1050; Simpkins, Price, & Garcia, 2015, p. 1387); it is an intrinsic aspect of motivation. These two aspects, intrinsic and extrinsic, have different conceptions and effects, which might lead to inconsistencies in the measure of SAS, yet this has often gone unnoticed in previous SAS studies. The 2015 PISA study defined SE, enjoyment of science, as ‘A measure of how much students like learning about science’.
Students’ enjoyment of science has often been referred to as intrinsic motivation (Ryan & Deci, 2000), which is similar to science confidence (SC). This explains, theoretically, why SE and SC correlated highly with each other.


Paper No KK_002

Paper Title Internet Banking Service Quality Measurement: A Scale Development for Malaysian Banks

Email Address [email protected]

1st Author Mahgoub Elradi Ahmed Siddig

Affiliation UPM

Subsequent authors

Prof. Dr. Rusli Abdullah, Assoc. Prof. Marzanah A. Jabar, Dr. Yusmadi Yah Jusoh

1. Aims/ Objectives of study:

The study has two objectives. The first is to identify the service quality dimensions of Internet banking websites using both qualitative and quantitative research methods. The second is to develop a validated scale to measure these dimensions.

2. Sample: The pilot study sample comprised 42 respondents.

3. Method: (1) Systematic literature review of Internet banking service quality measurement; (2) proposed research model; (3) interviews with academics and bank professionals; (4) scale development (face and content validity); (5) scale validation through a pilot test using Rasch model software (item quality, person quality, and rating-scale category confirmation).

4. Results: The results are summarised as follows.

(1) Person statistics. Spread: the person spread of 5.5 – 0.2 = 5.3 logits was judged poor; the person distribution has a much wider spread than the item distribution. Reliability: person reliability of 0.95 is excellent, and Cronbach's alpha of 0.94 is very good. Distribution: the person distribution is approximately normal, somewhat positively skewed, and platykurtic (flat).

(2) Item statistics. Spread: the item spread of 1.6 – (-1.5) = 3.1 logits is fair. Reliability: item reliability of 0.85 is good. Distribution: the item distribution is approximately normal, slightly negatively skewed, and leptokurtic.

(3) Category functioning: looks good.

(4) Principal components analysis (PCA) of residuals: looks acceptable, even though the eigenvalue of the 1st contrast has a strength of about 6 items, which is larger than 3. There is no indication of a distinction between high-contrast items (> 0.6) and low-contrast items (< -0.6).

(5) Item measures: no difference in item measures between the original data and the data with persons deleted.
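Rasch reliability and separation are linked by the standard relation G = sqrt(R / (1 - R)), and the number of statistically distinct strata is often estimated as H = (4G + 1) / 3. The sketch applies these relations to the person (0.95) and item (0.85) reliabilities reported above; it is an illustration of the arithmetic, not output from the authors' software run, and the function names are hypothetical.

```python
import math


def separation_from_reliability(reliability):
    """Rasch separation index G = sqrt(R / (1 - R)), for 0 <= R < 1."""
    return math.sqrt(reliability / (1.0 - reliability))


def strata(separation):
    """Estimated number of statistically distinct levels, H = (4G + 1) / 3."""
    return (4.0 * separation + 1.0) / 3.0


for label, r in [("person", 0.95), ("item", 0.85)]:
    g = separation_from_reliability(r)
    print(f"{label}: reliability {r:.2f} -> separation {g:.2f}, strata {strata(g):.2f}")
```

Separation grows quickly as reliability approaches 1, which is why a person reliability of 0.95 corresponds to a much larger separation than an item reliability of 0.85.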

5. Conclusions:

Based on the results above, some items were deleted or reworded in the final instrument.


Paper No KK_003

Paper Title Face Validity Test on Validity and Reliability of ICT Procurement Officer Competency Measurement Instrument by Using Rasch Model

Email Address [email protected]

1st Author Azran Ahmad

Affiliation

Subsequent authors

Prof. Datin Dr. Noor Habibah Haji Arshad, Dr. Syaripah Ruzaini Syed Aris

1. Aims/ Objectives of study:

The main purpose of this paper is to determine the validity and reliability of the constructs identified for the development of a new instrument measuring the competency of ICT Procurement Officers (PO) who are appointed as members of the technical evaluation committee in Public Sector ICT projects.

2. Sample: The test involved a total of 45 experts drawn from Public Sector ICT personnel with at least 3 years of ICT experience as members of the technical evaluation committee for ICT projects.

3. Method: The data were analysed using the Rasch model to assess the validity and reliability of the items.

4. Results: The overall face-validity level of agreement was 95.56%. According to the Rasch summary statistics, item separation was 3.26 and Cronbach's alpha was 0.97. Item polarity, indicated by the point-measure correlation (PTMEA CORR), was positive for all 49 items, ranging between 0.30 and 0.70. Outfit mean-square (MNSQ) values, ranging between 0.51 and 2.20, were used to determine the validity and reliability of each construct.

5. Conclusions: Based on the accepted MNSQ range of 0.60 to 1.40 suggested by Bond and Fox (2015), 40 of the 49 items were retained, 3 items were suggested for purification, and the remaining 6 items were considered for removal in the development of the new instrument.
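The screening rule can be sketched as a simple filter on outfit MNSQ using the 0.60 to 1.40 range. The item labels and values are hypothetical, and the abstract does not say how out-of-range items were split between purification and removal, so the sketch only flags them for review.

```python
def classify_by_mnsq(outfit, lower=0.60, upper=1.40):
    """Split items into those inside and outside an accepted outfit MNSQ range.

    outfit: mapping of item label -> outfit mean-square value
    """
    keep = {item: v for item, v in outfit.items() if lower <= v <= upper}
    review = {item: v for item, v in outfit.items() if not lower <= v <= upper}
    return keep, review


# Hypothetical outfit MNSQ values spanning the reported 0.51-2.20 range.
outfit = {"Q1": 0.51, "Q2": 0.95, "Q3": 1.12, "Q4": 1.38, "Q5": 2.20}
keep, review = classify_by_mnsq(outfit)
print("within range:", sorted(keep))
print("outside range:", sorted(review))
```

Deciding whether a flagged item should be revised or dropped still requires substantive judgement about its content, which a numeric cut-off cannot supply.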

Paper No KK_004

Paper Title Mathematics Item Quality: An Illustrative Example using Rasch Measurement Model

Email Address

[email protected]

1st Author Ling Mei-Teng

Affiliation UMS

Subsequent authors

Lei-Mee Thien (USM), Mei-Yean, Ong (USM)

1. Aims/ Objectives of study:

This study aims to present a step-by-step data analysis procedure for validating a set of 20 released TIMSS 2007 and 2011 mathematics items.

2. Sample: A total of 113 grade eight students were selected from six secondary schools in Sabah.

3. Method:

4. Results: Findings revealed that 19 items showed acceptable Rasch properties, while Item 16 was found to misfit and needs to be revised. The item and person means differ by less than 0.5 logits, indicating that the test was on target. However, the ratio of low, medium, and high difficulty items was 1:2:1, which differs from the 3:4:3 test specification proposed by the researchers.

5. Conclusions:

This study contributes to the process of producing a set of validated mathematics items using the Rasch model, particularly for school teachers. Implications and limitations of the study are presented.


Paper No KK_005

Paper Title Rasch person fit statistics associated with the weighted degree indicators of Social Network Analysis

Email Address

[email protected]

1st Author Tsair-Wei Chien

Affiliation Chi-Mei Medical Center, Taiwan

1. Aims/ Objectives of study:

The purpose of latent trait and latent class analysis is to partition the sample of persons into a minimum number of homogeneous classes that are meaningful to the study. This study explores how Rasch residual data can be combined with social network analysis (SNA), using graphical representations, to detect persons who misfit the model.

2. Sample: A simple polytomous dataset (Linacre, 1997) with 26 persons and 20 items was used for illustration. The two-mode matrix of Rasch standardized residuals (persons in rows, items in columns) was transformed into a one-mode matrix (persons in both rows and columns) to show person latent classes.

3. Method: The Rasch model was applied to compute residual correlation coefficients for every pair of persons, forming a one-mode matrix. The SNA software Gephi was then used to draw a plot with person weighted degrees, showing two distinct latent classes of interest (i.e., fit and misfit groups). The correlation coefficients between weighted degree and the Rasch Outfit and Infit mean-square statistics, as well as the point-measure correlation (PT-MEASURE CORR. in Winsteps), were reported.

4. Results: The classes defined by the Rasch standardized residual patterns were easily and separately displayed using Gephi, consistent with the Rasch fit statistics. The correlations of weighted degree with the fit indices were 0.51 (Outfit), 0.48 (Infit), and -0.62 (PT-MEASURE CORR.).

5. Conclusions:

It is recommended that Rasch standardized residual scores produced by Winsteps or similar software be analysed with SNA to obtain homogeneous classes and to explain how persons in different classes responded differently to the items.
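A minimal sketch of the person-network step, assuming a persons-by-items matrix of standardized residuals is already available (for example exported from Winsteps). Edge weights are taken as absolute residual correlations with no threshold; both choices are assumptions, since the abstract does not state how the Gephi edges were defined. Person outfit MNSQ is computed as the mean squared standardized residual. The residuals are random placeholders, so the printed correlation does not reproduce the reported values, and the function names are hypothetical.

```python
import numpy as np


def person_weighted_degree(z):
    """Weighted degree of each person in a person-by-person residual network.

    z: persons x items matrix of Rasch standardized residuals.
    Edge weight = |correlation of two persons' residual vectors| (an assumption).
    """
    corr = np.corrcoef(z)         # rows are persons, so this is person-by-person
    np.fill_diagonal(corr, 0.0)   # drop self-loops
    return np.abs(corr).sum(axis=1)


def person_outfit_mnsq(z):
    """Person outfit mean-square: unweighted mean of squared standardized residuals."""
    return (z ** 2).mean(axis=1)


# Placeholder residuals with the same shape as the illustrative dataset (26 x 20).
rng = np.random.default_rng(1)
z = rng.normal(size=(26, 20))

degree = person_weighted_degree(z)
outfit = person_outfit_mnsq(z)
r = np.corrcoef(degree, outfit)[0, 1]
print(f"correlation between weighted degree and outfit MNSQ: {r:.2f}")
```

Because correlations ignore the scale of each person's residuals, weighted degree captures the shape of a response pattern while outfit captures its size, which is one reason the two indices are only moderately related.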

Paper No KK_006

Paper Title Application of Multi-dimensional Computerized Adaptive Test on Clinical Dementia Rating Scale using Computer-aided Technique

Email Address

[email protected]

1st Author Ting-En Hui

Affiliation Chi-Mei Medical Center, Tainan, Taiwan

Subsequent authors

Tsair-Wei Chien, Chi-Mei Medical Center, Tainan, Taiwan

1. Aims/ Objectives of study:

With the rapid growth of the elderly population, people aged 65 and older now make up more than 11.8% of the nation's citizens, and the nation is moving toward a super-aged society. Dementia is a leading condition affecting the elderly. However, how to examine and diagnose patients accurately with a specialized multidimensional computerized adaptive testing (MCAT) tool remains unknown. We therefore aim to develop a website that lets parents use their own computers, tablets, or smartphones for online screening and prediction of dementia based on responses to the Clinical Dementia Rating (CDR) scale.

2. Sample: The CDR scale was administered to 366 outpatients at a hospital in southern Taiwan.

3. Method: We (1) used a multidimensional computerized adaptive test (MCAT) with item parameters across six dimensions and (2) simulated responses to compare the efficiency and precision of MCAT and a non-adaptive test (NAT). The number of items saved and the cutoff points for the tool were then determined.

4. Results: MCAT yielded significantly more precise measurements and was significantly more efficient than NAT, giving a 20.19% (= (53 - 42.3)/53) saving in test length when measurement differences of less than 5% were allowed. Person-measure correlation coefficients were highly consistent among the five domains. The cutoff points for the overall measures were -0.7 and 0.7 logits, equivalent to percentile scores of 33 and 67. Significantly fewer items were answered on MCAT than on NAT without compromising MCAT’s precision.
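The item saving quoted above is simple arithmetic. A minimal sketch, assuming (as the formula in the text implies) that 53 is the fixed NAT length and 42.3 the mean number of items MCAT administered; the function name is hypothetical:

```python
def item_saving(nat_length, mean_cat_length):
    """Percentage reduction in test length of an adaptive test relative to
    a fixed-length non-adaptive test (NAT)."""
    if nat_length <= 0:
        raise ValueError("NAT length must be positive")
    return 100.0 * (nat_length - mean_cat_length) / nat_length


print(f"{item_saving(53, 42.3):.2f}%")  # 20.19%
```

The same calculation reproduces the 46.67% saving reported for the 75-item MuSiC bank in KK_008 below (`item_saving(75, 40)`).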

5. Conclusions:

Developing a website that lets parents use their own computers, tablets, or smartphones for online screening and prediction of dementia in elders is useful and not difficult.

Paper No KK_007

Paper Title Using social network analysis to report Rasch papers’ keyword development and association across years

Email Address

[email protected]

1st Author Wei-Ru Jyun

Affiliation Chi-Mei Medical Center, Taiwan

Subsequent authors

Tsair-Wei Chien, Chi-Mei Medical Center, Tainan, Taiwan

1. Aims/ Objectives of study:

To compare the keywords associated with Rasch-related papers and to analyse their development in recent and past years.

2. Sample:

3. Method: We selected 3,100 abstracts and their corresponding keywords, published between 1952 and April 2017, from the US National Library of Medicine, National Institutes of Health (i.e., at pubmed.com), to explore keyword development and association across years and, in addition, to identify the leading authors of health-related articles and their collaboration patterns. We used social network analysis (SNA) to explore the relations among keywords and among authors in journals.
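The keyword network behind such an analysis can be sketched as a co-occurrence count: two keywords are linked when they appear on the same paper, and the edge weight is the number of papers they share. This is an illustration under assumptions (keywords already exported as one list per paper; case and whitespace normalised; toy data), not the authors' pipeline. The same functions work unchanged on per-paper author lists to build a co-authorship network.

```python
from collections import Counter
from itertools import combinations


def normalise(terms):
    """Lower-case, strip, and de-duplicate the terms attached to one paper."""
    return sorted({t.strip().lower() for t in terms if t.strip()})


def term_frequencies(papers):
    """Number of papers carrying each term."""
    return Counter(t for terms in papers for t in normalise(terms))


def cooccurrence_edges(papers):
    """Undirected edge weights: number of papers in which two terms co-occur."""
    edges = Counter()
    for terms in papers:
        for a, b in combinations(normalise(terms), 2):
            edges[(a, b)] += 1
    return edges


papers = [
    ["Rasch model", "item response theory"],
    ["Rasch model", "Rasch analysis"],
    ["rasch model", "Item Response Theory"],
]
print(term_frequencies(papers).most_common(1))  # [('rasch model', 3)]
print(cooccurrence_edges(papers).most_common(1))  # [(('item response theory', 'rasch model'), 2)]
```

The strongest association is simply the heaviest edge; the weighted edge list can then be passed to an SNA tool for plotting.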

4. Results: The most frequent keywords were Rasch model, Rasch analysis, and item response theory. The strongest associations between pairs of authors and between pairs of keywords are reported in this study, along with visual representations of their development and change across years.

5. Conclusions

Rasch-related papers on health affairs are worth studying and reporting at the Rasch PROMS conference.

Paper No KK_008

Paper Title Multidimensional computerized adaptive testing for toddlers: a developmental screening tool

Email Address

[email protected]

1st Author Ying-Hsien Chien

Affiliation Jin San Chronic Care Hospital & Nursing Home, Tainan, Taiwan

Subsequent authors

Tsair-Wei Chien, Chi-Mei Medical Center, Taiwan

1. Aims/ Objectives of study:

To investigate the use of a multidimensional computerized adaptive testing (MCAT) tool combined with the Multidimensional Screening in Child Development (MuSiC) for toddlers' parents.

2. Sample: Parameters for 75 items were retrieved from the MuSiC literature at https://www.ncbi.nlm.nih.gov/pubmed/25127503

3. Method: Using the 75 item parameters from the MuSiC item bank for 1- to 3-year-olds, we simulated 1,000 person measures from a standard normal distribution to compare the efficiency and precision of MCAT and NAT (non-adaptive testing) in five domains: cognitive, language, gross motor, fine motor, and social-adaptive skills. The number of items saved and the cutoff points for the tool were then determined.
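The CAT-versus-NAT simulation can be illustrated with a much-simplified, unidimensional sketch. Everything here is an assumption rather than the authors' MCAT: dichotomous Rasch items with hypothetical difficulties (the real MuSiC parameters and the multidimensional model are not reproduced), maximum-information item selection, an EAP update under a standard normal prior, and a stopping rule of posterior SD ≤ 0.5 standing in for the "less than 5% difference" criterion.

```python
import math
import random


def p_correct(theta, b):
    """Rasch probability of a correct (1) response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_estimate(responses, grid=None):
    """EAP ability estimate and posterior SD under a N(0, 1) prior.

    responses -- list of (difficulty, score) pairs with score 0 or 1
    """
    if grid is None:
        grid = [i / 10.0 for i in range(-40, 41)]  # theta from -4.0 to 4.0
    weights = []
    for t in grid:
        w = math.exp(-t * t / 2.0)  # unnormalised standard normal prior
        for b, x in responses:
            p = p_correct(t, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def simulate_cat(true_theta, bank, se_stop=0.5, rng=None):
    """Administer items adaptively until the posterior SD drops to se_stop
    or the bank is exhausted. Returns (estimate, SE, items used)."""
    if rng is None:
        rng = random.Random(0)
    remaining = list(bank)
    responses = []
    theta, se = 0.0, float("inf")
    while remaining and se > se_stop:
        # Rasch item information p(1 - p) peaks where b == theta, so the most
        # informative item is the one whose difficulty is closest to theta.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        x = 1 if rng.random() < p_correct(true_theta, b) else 0
        responses.append((b, x))
        theta, se = eap_estimate(responses)
    return theta, se, len(responses)


# 75 hypothetical item difficulties spread evenly from -3 to +3 logits.
bank = [-3.0 + 6.0 * i / 74 for i in range(75)]
rng = random.Random(2017)
lengths = [simulate_cat(rng.gauss(0.0, 1.0), bank, rng=rng)[2] for _ in range(100)]
print("mean CAT length:", sum(lengths) / len(lengths), "vs NAT length:", len(bank))
```

A NAT simply administers the whole bank, so the saving is read off from the mean CAT length; a full replication would need the published item parameters and a multidimensional estimator.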

4. Results: MCAT yielded significantly more precise measurements and was significantly more efficient than NAT, giving a 46.67% (= (75 - 40)/75) saving in test length when measurement differences of less than 5% were allowed. Person-measure correlation coefficients were highly consistent among the five domains. Significantly fewer items were answered on MCAT than on NAT without compromising MCAT’s precision.

5. Conclusions:

Developing a website that lets parents use their own computers, tablets, or smartphones for online screening and prediction of developmental delays in toddlers is useful and not difficult.

Paper No KK_009

Paper Title Examining the Measurement Properties of Students' Perceptions of Assessment Scale

Email Address

[email protected]

1st Author Wilham M. Hailaya, Ph.D.

Affiliation College of Education, Mindanao State University at Tawi-Tawi Sanga-Sanga, Bongao, Tawi-Tawi, Philippines

Subsequent authors

1. Aims/ Objectives of study:

The aim of this study was to develop an instrument, the Students' Perceptions of Assessment Scale, to measure students' perceptions of assessment, particularly of tests and assignments as commonly used in the Tawi-Tawi context. Specifically, the study sought to establish the utility of the instrument by investigating its measurement properties at the macro and micro levels. Developing this instrument was deemed vital because it can provide important information about the subjective qualities of assessment tasks, which can in turn serve as a basis for tailoring assessment practices to students' interests and improving their learning.

2. Sample: The sample was purposively selected because some schools or locations were difficult to access. Specific grades were also targeted to ensure that participating students had experience with tests and assignments. In total, 2,077 students from Grade Six, Second Year, and Fourth Year high school classes participated in the study.

3. Method: The instrument was first examined at the macro level using confirmatory factor analysis, run in LISREL 8.80, to confirm the two predetermined constructs, namely perceptions of tests and perceptions of assignments. The instrument was then investigated at the micro level using the Rasch model (Rating Scale Model), run in ConQuest 2.0, to further establish the constructs and the characteristics of the items. The results of the two analyses were used to judge the acceptability of the instrument.
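For readers new to the Rating Scale Model, the category probabilities it assumes can be sketched as follows. This is the standard Andrich formulation, with thresholds shared by every item; the function names and parameter values are hypothetical illustrations, not estimates from this study.

```python
import math


def rsm_probabilities(theta, delta, taus):
    """Andrich Rating Scale Model: probabilities of categories 0..m.

    theta -- person measure (logits)
    delta -- item location (logits)
    taus  -- thresholds tau_1..tau_m, shared by every item in the scale
    """
    # Category k has log-numerator sum_{j<=k}(theta - delta - tau_j); category 0 has 0.
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + (theta - delta - tau))
    top = max(log_num)  # subtract the max before exponentiating, for stability
    num = [math.exp(v - top) for v in log_num]
    total = sum(num)
    return [v / total for v in num]


def expected_score(theta, delta, taus):
    """Model-expected rating for one person on one item."""
    return sum(k * p for k, p in enumerate(rsm_probabilities(theta, delta, taus)))


probs = rsm_probabilities(theta=0.0, delta=0.0, taus=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])  # [0.134, 0.366, 0.366, 0.134]
```

When the person sits exactly at the item location and the thresholds are symmetric, the probabilities are symmetric and the expected rating is the scale midpoint (1.5 here).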

4. Results: Both one-factor and two-factor models were appropriate for the instrument, though the one-factor model was preferred for its parsimony. Moreover, 11 items appeared to tap perceptions of tests and seven (7) items appeared to reflect perceptions of assignments. Furthermore, the response categories appeared to work well.

5. Conclusions:

The scale, although far from perfect, is useful for measuring students' perceptions of assessment in tests and assignments. Moreover, the scale can be a starting point for further study of the instrument and has implications for instrument development, educational assessment research, policy, and practice.

Paper No KK_010

Paper Title Development and validation of the students’ ability questionnaire on science process skills

Email Address [email protected]

1st Author Ellyza Karim (PhD candidate)

Affiliation

Subsequent authors

Dr Jamil Ahmad & Prof Kamisah Osman

1. Aims/ Objectives of study:

As the instrument was newly developed, a pilot study was conducted to obtain empirical evidence of the validity and reliability of the questionnaire.

2. Sample: 340 respondents who sat the 2016 Malaysian Primary School Evaluation Test (UPSR). These respondents, aged 13, were the first cohort of the recently implemented primary school science curriculum.

3. Method: This study assesses students' ability level in science process skills using a 5-point Likert-scale questionnaire ranging from unable to able. A total of 68 items were developed for the questionnaire from verified indicators identified in the literature. The item indicators were then validated by expert consensus using the Fuzzy Delphi Method. Respondents were given one hour to complete the instrument. Finally, Rasch analysis for the two-facet model (version 3.73) was used to analyse the data.

4. Results: Overall, Cronbach's alpha (person reliability) was 0.96 and item reliability was 0.99. Point-measure correlations (PTMEA Corr) were positive for all items, ranging from 0.33 to 0.71, which showed that all items measured what they were supposed to measure in the science process skills construct. All items were accepted, as their outfit mean-squares (MNSQ) ranged between 0.67 and 1.50, indicating good item fit to the latent variable. The item map showed that most students were unable to design scientific steps on their own within the experimenting skill. Meanwhile, within the skill of using space and time relationships, determining an object's position over time was the item students found easiest.
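The classical alpha coefficient quoted above can be computed directly from raw scores. A minimal sketch on a tiny hypothetical matrix, not the study's data; note that the Rasch person and item reliabilities reported alongside it are separate statistics, computed from measures and their standard errors.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items matrix of raw scores.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances throughout.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [variance(col) for col in zip(*scores)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point ratings: 3 persons x 2 items.
print(round(cronbach_alpha([[1, 2], [2, 3], [3, 3]]), 3))  # 0.857
```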

5. Conclusions: The findings provide more accurate insight into the construct validity and reliability of the questionnaire for measuring students’ ability in science process skills.

Paper No KK_011

Paper Title We have equal intervals; now we need invariance: The next important step in Rasch measurement

Email Address [email protected]

1st Author Prof Trevor G BOND

Affiliation

Subsequent authors

1. Aims/ Objectives of study:

The distinctive attribute of a measurement system is the requirement for an arbitrary unit of difference that can be iterated between successive measures. Instead of focusing on constructing measures of the human condition, psychologists and others in the human sciences have focused on applying sophisticated statistical procedures to their data. In the human sciences, invariance of item and person measures remains the exception rather than the rule. Results from many tests of common human abilities must be interpreted strictly in terms of the sample used to norm the test, and candidates' results depend on which test was actually used.

2. Sample: n/a

3. Method: n/a

4. Results: This presentation will demonstrate simple tests of invariance and show how invariance and DIF contribute to important understandings in the human sciences.
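The abstract does not detail which invariance tests will be shown, so the following is only an illustration of one common check: compare item difficulties calibrated on two independent samples and flag items whose standardized difference exceeds ±1.96, in the spirit of 95% control lines on a cross plot. Centring each calibration on its own mean, the criterion, and all estimates are assumptions.

```python
import math


def invariance_check(d1, se1, d2, se2, z_crit=1.96):
    """Compare two calibrations of the same items.

    d1, d2   -- item difficulties (logits) from sample 1 and sample 2
    se1, se2 -- their standard errors
    Returns (difference, z, flagged) for each item.
    """
    # Centre each calibration on its own mean so both share the same origin.
    m1, m2 = sum(d1) / len(d1), sum(d2) / len(d2)
    out = []
    for a, sa, b, sb in zip(d1, se1, d2, se2):
        diff = (a - m1) - (b - m2)
        z = diff / math.sqrt(sa ** 2 + sb ** 2)
        out.append((diff, z, abs(z) > z_crit))
    return out


# Hypothetical estimates for four items from two samples.
sample_a = [-1.2, -0.3, 0.4, 1.1]
sample_b = [-1.1, -0.2, 1.4, 1.2]
ses = [0.15, 0.15, 0.15, 0.15]
for i, (diff, z, flag) in enumerate(invariance_check(sample_a, ses, sample_b, ses), 1):
    print(f"item {i}: diff={diff:+.2f} z={z:+.2f} {'NOT invariant' if flag else 'ok'}")
```

Only the third item departs from invariance here; in practice such items are the starting point for a substantive DIF investigation rather than automatic deletion.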

5. Conclusions: An important goal of early research should be to establish item difficulty values for important testing/data collection devices such that those values are sufficiently invariant for their intended purposes.

Paper No KK_012

Paper Title Application of Rasch Model in Islamic Moral Value Scale for Islamic Education Teachers (INSPI) in Malaysia

Email Address [email protected]

1st Author Salbiah bt Mohamed Salleh @ Salleh

Affiliation

Subsequent authors

(1) Jamil bin Ahmad (2) Mohd Aderi bin Che Noh (3) Anis Jauharah bt Abdul Kadir

1. Aims/ Objectives of study:

This study aims to validate the Islamic Moral Value Scale for Islamic Education Teachers (INSPI) in Malaysia using the Rasch Measurement Model. Keywords: Rasch Measurement Model, Islamic Moral Value, Islamic Education Teacher

2. Sample: Two hundred Islamic Education Teachers in primary and secondary schools participated in this study.

3. Method: This study employed a quantitative approach to data collection and analysis. A survey was used to gather information on the Islamic moral values practised by Islamic Education Teachers in primary and secondary schools in Selangor, Malaysia. The data were analysed using Winsteps 3.80 to investigate the functioning of the rating scale categories, reliability and separation indices, unidimensionality, item polarity, goodness of fit, and item difficulty.

4. Results: First, the original five-point rating scale functions effectively. Second, item and person reliabilities are very high, and separation is good, exceeding two. Third, the Rasch analysis showed that INSPI is a unidimensional scale; three items were deleted because of misfit. Lastly, all the items in the scale were quite easy for the respondents, who performed well on almost all the items.
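Separation and reliability are two views of the same quantity: separation G = sqrt(R / (1 - R)), so the "greater than two" criterion corresponds to a reliability above 0.80. A minimal sketch; the strata formula H = (4G + 1)/3 is included as a commonly used companion statistic, and the function names are hypothetical.

```python
import math


def separation_from_reliability(r):
    """Rasch separation index G = sqrt(R / (1 - R)), for 0 <= R < 1."""
    if not 0 <= r < 1:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(r / (1 - r))


def reliability_from_separation(g):
    """Inverse relation: R = G^2 / (1 + G^2)."""
    return g * g / (1 + g * g)


def strata(g):
    """Number of statistically distinct levels, H = (4G + 1) / 3."""
    return (4 * g + 1) / 3


print(round(separation_from_reliability(0.80), 2))  # 2.0
```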

5. Conclusions: This study produced a new Rasch-based measure of moral development. It provides new insight into measurement in religious studies, especially in Islamic studies.

Paper No KK_013

Paper Title Rasch-derived Measure for Assessing Student Competency in University Introductory Computer Programming (CS1)

Email Address [email protected]

1st Author Leela Waheed

Affiliation

Subsequent authors

Rob Cavanagh

1. Aims/ Objectives of study:

The purpose of this paper is to report the results of a project to develop and test a linear measure of university student performance in the first course of computer programming (CS1).

2. Sample: The sample comprised 85 students: 25 from Maldives National University (MNU), 31 from Asia Pacific University of Malaysia (APU), and 29 from Villa College, Maldives. The students had completed their CS1 course and were in the second semester of their first year of university study.

3. Method: The validity of the CS1 measure was investigated within the theoretical framework expounded by Messick. The aspects of the framework are the content, substantive, structural, generalisability, external, and consequential aspects, plus the interpretability aspect added by Smith (2007). RUMM2030 was used to generate statistics and displays exemplifying the Messick aspects.

4. Results: The initial analysis of the data set with RUMM2030 demonstrated an excellent Person Separation Index (PSI), with no evidence of Differential Item Functioning (DIF) or of item or person misfit. However, there was some disordering of thresholds. Hence, Question 1D was rescored dichotomously, and the middle two categories of Questions 3D and 5D were collapsed. Principal Components Analysis of residuals showed no significant structure in the rotated component matrix, supporting the assumptions of local independence and unidimensionality.
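The threshold check and rescoring step can be sketched as follows. The threshold values are hypothetical, and the actual recoding used for Question 1D is not given in the abstract, so only the "collapse the middle two categories" case is shown; the function names are illustrative.

```python
def disordered_thresholds(thresholds):
    """Indices k (0-based) where threshold k+1 does not exceed threshold k."""
    return [k for k in range(len(thresholds) - 1) if thresholds[k + 1] <= thresholds[k]]


def rescore(responses, mapping):
    """Recode observed categories with an explicit old -> new mapping."""
    return [mapping[x] for x in responses]


# Hypothetical thresholds for a 0-3 item: the second and third are reversed.
print(disordered_thresholds([-1.2, 0.6, 0.3]))  # [1]

# Collapse the two middle categories of a 0-3 item (1 and 2 -> 1).
collapse_middle = {0: 0, 1: 1, 2: 1, 3: 2}
print(rescore([0, 1, 2, 3, 2], collapse_middle))  # [0, 1, 1, 2, 1]
```

After recoding, the item is re-estimated and the thresholds are checked again for order.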

5. Conclusions: There was evidence that the CS1 measure demonstrated reliability and validity in measuring student competence in fundamental CS1 concepts. This study also demonstrated the application of the Rasch model as a powerful approach both to instrument construction and to providing evidence for a validity argument.

Paper No KK_014

Paper Title Disciplinary Biases of a Student Evaluation of Teaching Survey in Higher Education

Email Address [email protected]

1st Author Billy Wai Kei Chan

Affiliation University of Macau

Subsequent authors

Fan Huang and Emily Pey Tee Oon

1. Aims/ Objectives of study:

The present study aims to identify rating biases across academic disciplines that could plague the scores of the UMAC General Education Course Survey (GECS). In particular, we used the invariance assessment of the Rasch measurement model to examine whether the four disciplines, namely Language and Communication, Science and Information Technology, Society and Culture, and Self-Development, garnered uneven scores that signified biases in student ratings. Keywords: disciplinary biases, SET, higher education

2. Sample: GECS data from 45,361 students of all disciplines who enrolled in the GE programme from 2011 to 2015 (a full cycle of eight semesters) were included in the analysis.

3. Method: Instrument. The GECS consisted of five rating-scale items with six response categories (1: Strongly Disagree; 2: Disagree; 3: Slightly Disagree; 4: Slightly Agree; 5: Agree; 6: Strongly Agree). The items are: 1. This course helped you participate actively in classroom activities. 2. This course helped manage your own independent learning in this subject area in the future. 3. Did this course help you understand its application in everyday life situations? 4. This course helped you think critically about the course topics. 5. Did this course help you develop your communication skills (e.g. reading, writing, speaking and other forms of communication)? Data analyses. Raw GECS scores were imported into the Rasch analysis software Winsteps version 3.81.0, developed by Mike Linacre (Linacre, 2014), to construct interval-scale data. Data quality was then examined against the expectations of the Rasch model, that is, the fit of the data to the model. Fit was examined with the infit and outfit mean-square (MnSq) statistics; acceptable values range between .60 and 1.40 (Wright & Linacre, 1996; Bond & Fox, 2015). Data that fit the Rasch model are unidimensional (Bond & Fox, 2015). Scores from data that misfit the Rasch expectation are not interpretable, as they may reflect more than one latent trait, which distorts measurement of the intended trait (Bond & Fox, 2015). The latent trait in the present study is the teaching and learning quality of the GE courses. Next, Differential Item Functioning (DIF) analysis was conducted to compare the pattern of items along the latent trait, on a common scale of difficulty/agreeability, across the four disciplines. A different ordering of items along the scale across the four disciplines signified potential bias: if the difficulty/agreeability estimate of an item does not remain the same (lack of invariance) across the four disciplines, students from different disciplines may have interpreted the item differently, and some items could have been rated ‘favorably’ by certain groups of students for reasons unrelated to teaching and learning quality. Lack of invariance is signaled by a DIF contrast greater than .50 logit (Bond & Fox, 2015; Linacre, 2014).
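The .50-logit DIF-contrast rule can be sketched for one item across the four disciplines as pairwise differences between group-specific measures. This is an illustration under assumptions: the group abbreviations (LC, SIT, SC, SD) and values are hypothetical, and the way Winsteps estimates the group-specific measures is not reproduced here.

```python
from itertools import combinations


def dif_contrasts(measures_by_group, threshold=0.5):
    """Pairwise DIF contrasts for one item.

    measures_by_group -- {group name: item difficulty/agreeability in logits}
    Returns (group1, group2, contrast, flagged) for every pair of groups,
    flagging |contrast| > threshold.
    """
    rows = []
    for (g1, d1), (g2, d2) in combinations(sorted(measures_by_group.items()), 2):
        contrast = d1 - d2
        rows.append((g1, g2, contrast, abs(contrast) > threshold))
    return rows


# Hypothetical measures for one item in the four disciplines.
item5 = {"LC": -0.40, "SIT": 0.35, "SC": 0.30, "SD": 0.20}
for g1, g2, c, flag in dif_contrasts(item5):
    print(f"{g1} vs {g2}: {c:+.2f}{'  <-- DIF' if flag else ''}")
```

In this invented example only the contrasts involving LC exceed the cutoff, which is the pattern one would expect if a single discipline read the item differently.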

4. Results: All items showed acceptable MnSq infit (.76–1.18) and outfit (.71–1.13) values. The results indicated that GECS scores met the Rasch model's expectation of unidimensionality (measuring only the quality of teaching and learning in the GE courses) and hence are interpretable (Bond & Fox, 2015). Students’ overall responses on the five items. The extent of agreement is indicated by ‘Measure’ (the Rasch estimate): a smaller positive value indicates greater agreement, whereas a larger positive value indicates less agreement. Item 5 (Did this course help you develop your communication skills (e.g., reading, writing, speaking and other forms of communication)?) was the most difficult item to agree with (47.52 logit), whereas Item 3 (Did this course help you understand its application in everyday life situations?) was the easiest (47.36 logit). That is, relatively few students agreed that GE courses enhanced the development of their communication skills, whereas many agreed that the content of the courses is relevant because the knowledge they learn from the GE courses can be applied in daily situations. In addition, Item 1 (This course helped you participate actively in classroom activities, 47.48 logit) was more difficult to agree with than Items 2 (This course helped manage your own independent learning in this subject area in the future, 47.46 logit) and 3 (Did this course help you understand its application in everyday life situations, 47.36 logit). The results seem to suggest that, overall, students agreed more with items probing the value of GE courses for daily relevance, interaction enhancement and communication development.

Disciplinary differences in ratings. The ratings of students from each discipline varied across items: some items were more difficult for one group of students to agree with but easier for another. Students from the Science and Information Technology, Society and Culture, and Self-Development disciplines found Item 5 the most difficult to agree with, and found Items 3, 4, and 1, respectively, the easiest. In contrast, students from the Language and Communication discipline found Item 5 the easiest to agree with and Item 4 the most difficult. As discussed above, Item 5 (Did this course help you develop your communication skills (e.g., reading, writing, speaking and other forms of communication)?) was the least agreed-with item for the entire sample. Indeed, this item was very difficult for students from Science and Information Technology, Society and Culture, and Self-Development to agree with (p < .001), whereas students from Language and Communication agreed with it clearly, and the difference was statistically significant (p < .001). Item 1 (This course helped you participate actively in classroom activities), which concerns class participation, was the second most difficult item to agree with for the entire sample. However, it was the easiest item for students from the Self-Development discipline, who found it significantly easier to agree with than all their counterparts did (p < .001).
Item 3 (Did this course help you understand its application in everyday life situations?), which concerns the application of content knowledge, was the easiest item to agree with for the entire sample. It was, however, statistically the easiest item for students from Science and Information Technology and more difficult for students from Language and Communication, a statistically significant difference (p < .001).

5. Conclusions: The present study concurs with many others reporting that student ratings vary across academic disciplines, a variation often called ‘bias in ratings’ (e.g., Benton & Cashin, 2012; Kember & Leung, 2011; Royal & Stockdale, 2015; Sixbury & Cashin, 1995; Zahn & Schramm, 1992). However, the variation in ratings seemed reasonable. Students from soft disciplines rated items relating to class participation and communication more favorably, whereas students from hard sciences rated items on content application more favorably. These results prompted us to suggest that the variations in ratings are a consequence of pedagogical differences: soft disciplines yielded better ratings because their courses tend to be more student-centered instructionally, whereas instruction in the hard sciences tends to be teacher-centered and rigid. Note that this result is at variance with some other studies (e.g., Kember & McNaught, 2007; Kember et al., 2006; Murray & Renaud, 1995), which concluded that students’ perception of good teaching is independent of academic discipline. The contradictory research results suggest that the variation in ratings is a consequence of pedagogy rather than of academic discipline (Kember & Leung, 2011; Murray & Renaud, 1995). As the differences in ratings were found to be reasonable, institutions are not recommended to compare the student ratings of mathematics, science, and information technology courses with the ratings of other courses (e.g., language and communication courses) for human resources decisions such as tenure, salary, and promotion, because the evidence suggests that relatively low ratings may be determined by academic discipline rather than by teaching effectiveness. However, if the low ratings are due to low teaching effectiveness of teachers.

Paper No KK_015

Paper Title Gender DIF in Mathematics Items among Secondary Students in a Coeducational Learning Environment

Email Address [email protected]

1st Author S. Kanageswari Suppiah Shanmugam

1. Aims/ Objectives of study:

This study reports preliminary findings on gender Differential Item Functioning (DIF) in a school culture renowned for its ‘special’ teaching approaches and successful mathematics learning that produce impressive mathematics results. The main aim of this study is to identify mathematics items that function differently across gender groups in coeducational schools and to study the relationship between gender and the characteristics of mathematics items.

2. Sample: A total of 63 boys and 55 girls in Form Two were selected from one school for the preliminary study.

3. Method: WINSTEPS was used to conduct the DIF analysis. Based on the Rasch model, items were flagged for DIF using the Mantel-Haenszel chi-square method, with boys as the reference group and girls as the focal group. Twelve computation items and 12 word-problem items from the released Grade Eight TIMSS 1999 and TIMSS 2003 mathematics items were arranged from easy to difficult according to the mathematical hierarchy stipulated in the mathematics curriculum. Word-problem items were defined as items set in a real-world context.
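The Mantel-Haenszel statistics behind this kind of flagging can be sketched for one dichotomous item, assuming the persons have already been grouped into ability strata and a 2×2 table formed in each (how Winsteps forms its strata is not reproduced here). The counts are hypothetical. The common odds ratio is also expressed on the ETS delta scale, on which moderate and large DIF are usually classified.

```python
import math


def mantel_haenszel(strata):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    strata -- list of (A, B, C, D) counts per ability stratum, where
              A/B = reference group correct/incorrect and
              C/D = focal group correct/incorrect.
    Returns (alpha_MH, delta_MH, chi_square_MH). Raises ZeroDivisionError
    if the tables are too sparse to estimate the odds ratio.
    """
    num = den = 0.0
    sum_a = sum_e = sum_var = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        if t < 2:
            continue  # a stratum this small carries no information
        n_ref, n_foc = a + b, c + d
        m_right, m_wrong = a + c, b + d
        num += a * d / t
        den += b * c / t
        sum_a += a
        sum_e += n_ref * m_right / t
        sum_var += n_ref * n_foc * m_right * m_wrong / (t * t * (t - 1))
    alpha = num / den
    # ETS delta scale: negative values indicate DIF favouring the reference group.
    delta = -2.35 * math.log(alpha)
    # Continuity-corrected chi-square with 1 df (correction clamped at zero).
    chi2 = max(abs(sum_a - sum_e) - 0.5, 0.0) ** 2 / sum_var
    return alpha, delta, chi2


# Two hypothetical ability strata in which boys (reference) outperform girls.
tables = [(15, 5, 5, 15), (12, 8, 8, 12)]
alpha, delta, chi2 = mantel_haenszel(tables)
print(f"alpha_MH={alpha:.2f} delta_MH={delta:.2f} chi2_MH={chi2:.2f}")
```

A chi-square above 3.84 (1 df, p < .05) together with a sizeable |delta| is the usual basis for flagging an item.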

4. Results: Two items were flagged for DIF: one computation item and one word-problem item. The computation item exhibited moderate DIF and the word-problem item large DIF, but both items tended to favour boys. These DIF items assess lower-order thinking skills in the cognitive domain of Knowing and come from the topics of decimals and percentages in the content domain of Number.

5. Conclusions: This initial exploration suggests that items assessing lower-order thinking skills tend to favour boys, which challenges the gender stereotype that items assessing higher-order thinking skills favour boys. Since the items in this test were arranged from easy to difficult, one possible explanation suggested by these findings is that the serial position of an item in a test may be an item characteristic that needs to be considered, because some studies suggest that items appearing at the end of a test tend to be more difficult for girls.

Paper No KK_016

Paper Title Rasch Analysis Properties of Structural Items and Essay in Chemistry Test

Email Address [email protected]

1st Author Adeline Leong Suk Yee

Affiliation Universiti Malaysia Sabah

Subsequent authors

Mei-Teng Ling, Lay Yoon Fah; Universiti Malaysia Sabah

1. Aims/ Objectives of study:

The study aimed to ascertain the Rasch analysis properties of structural and essay items using the Rasch Partial Credit Model (PCM).

2. Sample: The test was administered to a group of 76 Form Four students taking chemistry.

3. Method: The structural and essay questions were analysed using the Rasch Partial Credit Model (PCM) to ascertain item fit and the unidimensionality of the items with respect to the construct. Structural items were analysed separately from the essay items because the number of students answering each essay item differed, depending on which essay item they chose. Fit analysis helps detect discrepancies between the Rasch model's expectations and the data collected. Differences between the two raters in student ability and item difficulty estimates were assessed with a cross plot. The instrument used as an example in this study was the Chemistry Achievement Test (Paper 2), which consists of Part A, with six structural items, and Parts B and C, each a self-selected essay question (one chosen from two in each part). It was developed by the researchers and a panel of one excellent chemistry teacher and two experienced secondary school chemistry teachers.

4. Results: Overall, the Rasch analysis properties of the chemistry test were acceptable, with two misfitting items (MnSq greater than 1.5) and two items that were ‘too good to be true’ (ZStd less than -2.0). An MnSq greater than 1.5 indicates that an item is unproductive for the construction of measurement and needs modification (Linacre & Wright, 2012), whereas a small number of ‘too good to be true’ items does not degrade measurement (Bond & Fox, 2015). Responses to three items (with negative PTMEA Corr.) contradict the direction of the latent variable.
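The three screening rules used in the Results can be written as a small filter. A sketch under assumptions: the field names are hypothetical, a single MnSq/ZStd pair is used because the abstract does not say whether infit or outfit was applied, and the example values are invented.

```python
from dataclasses import dataclass


@dataclass
class ItemFit:
    name: str
    mnsq: float        # mean-square fit statistic
    zstd: float        # standardized fit statistic
    ptmea_corr: float  # point-measure correlation


def screen_items(items, mnsq_max=1.5, zstd_min=-2.0):
    """Return {item name: [reasons]} for items breaking any of the three rules."""
    flagged = {}
    for it in items:
        reasons = []
        if it.mnsq > mnsq_max:
            reasons.append("misfit (MnSq > %.1f)" % mnsq_max)
        if it.zstd < zstd_min:
            reasons.append("overfit (ZStd < %.1f)" % zstd_min)
        if it.ptmea_corr < 0:
            reasons.append("negative PTMEA Corr.")
        if reasons:
            flagged[it.name] = reasons
    return flagged


items = [
    ItemFit("A1", 0.95, -0.3, 0.52),
    ItemFit("A2", 1.72, 2.8, 0.10),
    ItemFit("A3", 0.61, -2.6, 0.48),
    ItemFit("B1", 1.05, 0.4, -0.12),
]
for name, reasons in screen_items(items).items():
    print(name, reasons)
```

Misfitting items are candidates for revision, overfitting ones are usually tolerated in small numbers, and negative point-measure correlations call for a polarity check.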

5. Conclusions: The content experts suggested administering the test to another sample and checking item polarity again, because the test in this study was taken by high-performing students. Person abilities based on Rater 1's scores differed substantially from those based on Rater 2's scores for Essay 2 and Essay 3. A third rater is suggested to resolve the disagreement between the first two raters.

Paper No KK_017

Paper Title Psychometric Features of Psychosocial Safety Climate (PSC-12)

Email Address [email protected]

1st Author Rosnah Ismail

Affiliation Faculty of Medicine, National University of Malaysia

Subsequent authors

Azmi Mohd Tamil, Noor Hassim Ismail

1. Aims/ Objectives of study:

The aim of this article is to examine the psychometric features of the Psychosocial Safety Climate scale (PSC-12). Psychosocial safety climate is defined as shared perceptions of organizational policies, practices, and procedures for the protection of worker psychological health and safety, which stem mainly from management practices at the unit or organizational level.

2. Sample:

3. Method: In this cross-sectional study, 509 male employees from multiple worksites completed self-administered questionnaires between September 2012 and May 2013. The study used the PSC-12, which consists of 12 items measuring four domains: senior management support and commitment for stress prevention; management priority for psychological health and safety versus productivity goals; organizational communication; and organizational participation and involvement. The Rasch model was used to examine the Cronbach's alpha value, person and item measures, item and person reliability, the standard error of the items, and item fit before the data were submitted for unidimensionality verification. The scale's unidimensionality was considered violated if 1) the raw variance explained by the measures was less than 40%; 2) the unexplained variance in the first contrast had an eigenvalue greater than 2 or exceeded 15%; or 3) the scale displayed Differential Item Functioning (DIF) of more than 0.50 logit for role at the workplace, i.e., leader versus non-leader role in the job scope. Finally, rating scale validity was examined against six criteria: 1) at least 10 responses per category; 2) a regular distribution of category frequencies; 3) average measures increasing monotonically across the rating scale; 4) an advance of at least 1.0 logit between structure calibrations for the five-category rating scale; 5) a distinct probability curve for each response category; and 6) outfit MNSQ less than 2.
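The numeric parts of these criteria can be checked mechanically once the category and dimensionality statistics have been read off the Rasch software output. The sketch below is a hypothetical helper, not part of the authors' workflow; the class, function and field names are assumptions, and criteria 2 and 5 (frequency distribution shape and distinct probability curves) are left to visual inspection.

```python
from dataclasses import dataclass


@dataclass
class RatingScaleSummary:
    """Hypothetical container for category statistics copied from Rasch output."""
    counts: list[int]               # observed responses per category, lowest to highest
    average_measures: list[float]   # mean person measure (logits) per category
    thresholds: list[float]         # structure calibrations, one per step between categories
    outfit_mnsq: list[float]        # category outfit mean-square


def check_rating_scale(s: RatingScaleSummary, min_advance: float = 1.0) -> dict[str, bool]:
    """Check criteria 1, 3, 4 and 6 as listed in the abstract."""
    return {
        "at_least_10_per_category": all(c >= 10 for c in s.counts),
        "average_measures_increase": all(
            a < b for a, b in zip(s.average_measures, s.average_measures[1:])
        ),
        "thresholds_advance_enough": all(
            b - a >= min_advance for a, b in zip(s.thresholds, s.thresholds[1:])
        ),
        "outfit_below_2": all(m < 2.0 for m in s.outfit_mnsq),
    }


def unidimensionality_violated(raw_variance_explained_pct: float,
                               first_contrast_eigenvalue: float,
                               first_contrast_pct: float,
                               dif_contrasts_logits: list[float]) -> bool:
    """True if any of the three violation rules stated in the abstract is met."""
    return (
        raw_variance_explained_pct < 40.0
        or first_contrast_eigenvalue > 2.0
        or first_contrast_pct > 15.0
        or any(abs(d) > 0.50 for d in dif_contrasts_logits)
    )
```

For example, with hypothetical statistics for a five-category scale:

```python
s = RatingScaleSummary(counts=[40, 80, 150, 160, 62],
                       average_measures=[-1.5, -0.6, 0.3, 1.2, 2.4],
                       thresholds=[-2.8, -1.1, 0.6, 3.3],
                       outfit_mnsq=[1.1, 0.9, 1.0, 0.95, 1.05])
print(check_rating_scale(s))
```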

4. Results: Data from 492 male employees whose responses fit the model were analyzed; they represented 33 companies in three major sectors (service, manufacturing and agriculture) in Malaysia. The median (IQR) age was 33 (27; 42) years. The median (IQR) length of service in the current organization was 7 (3; 17) years. The majority had tertiary education (51.1%) and had no leadership responsibilities in their job scope (61.4%). The PSC-12 had a Cronbach's alpha of 0.93, with sufficient item range (0.97) and enough spread of respondent ability across the sample to answer the items (0.89). The standard error of the items was 0.14 logit, which is acceptable for a five-point Likert response scale. In general, the items fit the model and were unidimensional. No item showed bias related to leader role at the workplace. The rating scale met all required criteria and resembled the prototypical Likert-scale probability curve. No measurement "noise" was observed.

5. Conclusions: The PSC-12 is a psychometrically sound scale among the sampled male employees in Malaysia. It is a valid scale for measuring employees' shared perceptions of psychological health and safety protection, regardless of their role at the workplace.

Paper No KK_018

Paper Title Reliability Testing Instruments for Computer Programming Learning: Applying the Rasch model

Email Address [email protected]

1st Author Azliza Yacob

Affiliation TATi University College

Subsequent authors

Noraida Haji Ali, University Malaysia Terengganu; Noor Suhana Sulaiman, TATi University College; Nur Sukinah Aziz, TATi University College

1. Aims/ Objectives of study:

Assessment and measurement are among the most important processes in teaching and learning computer programming. Because programming is known to be a difficult subject to learn, it often results in high dropout and failure rates. This study aims to highlight the process of reliability testing in developing a learning instrument to support student understanding.

2. Sample: A pilot study with 33 respondents was carried out to test the reliability of the instrument.

3. Method: The Rasch model was used to test the reliability of the measurement for each item by converting the test results into ratio-type data.
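As a rough illustration of how raw test scores are moved onto the logit scale that Rasch analysis works with, the log-odds transformation below is the usual first approximation. It is only a sketch: full Rasch estimation also accounts for item difficulties, and the function name is hypothetical.

```python
import math


def raw_score_to_logit(raw_score: int, max_score: int) -> float:
    """Log-odds of success ln(p / (1 - p)), with p = raw_score / max_score.

    Extreme scores (0 or max_score) have no finite logit, so they are rejected.
    """
    if not 0 < raw_score < max_score:
        raise ValueError("extreme scores have no finite logit estimate")
    p = raw_score / max_score
    return math.log(p / (1.0 - p))
```

A student with 10 of 20 marks sits at 0 logits, while 15 of 20 gives ln(3), about 1.10 logits; equal raw-score steps do not map to equal logit steps, which is the point of the transformation.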

4. Results: A gap was found between the most difficult items and the rest of the items. After some items were deleted, the results indicated that the instrument has a high degree of reliability and is suitable for real data collection. The most difficult item may require further investigation, since students may be unfamiliar with it, it may be confusing or misleading, or the question may simply be too hard.

5. Conclusions: To develop a learning instrument that can support students' understanding, the construction of the corresponding items should be emphasized. For these reasons, lecturers should judge the suitability of each item to provide a supportive and effective learning environment.

Paper No KK_019

Paper Title Rasch Analysis of the Malaysian Teachers’ Responses to the Organizational Commitment Scale

Email Address [email protected]

1st Author Ahmad Zamri bin Khairani

Affiliation School of Educational Studies, USM

Subsequent authors

Aziah binti Ismail, School of Educational Studies, USM

1. Aims/ Objectives of study:

The main objective of this study is to examine the psychometric characteristics of a translated version of the Organizational Commitment Scale among Malaysian teachers.

2. Sample: Data were collected from 1021 school teachers (male = 275, female = 746) from three states. Their mean age was 38.85 years (SD = 8.35 years).

3. Method: The present study employs a quantitative approach with a survey method. A 24-item Organizational Commitment Scale was used to gauge the teachers' responses, which were then used to provide evidence of the psychometric properties of the scale. WINSTEPS 3.63 was used to produce the statistics and other relevant information from the Rasch model analysis.

4. Results: Rating scale analysis showed that categories 2 and 3 of the ratings were not adequately distinguished. Six items did not fit the model's expectations and were therefore dropped from further analysis. The scale demonstrated high person reliability as well as a high separation index. No items showed gender DIF. The ordering of items on the measured scale was satisfactory, and no threat to construct validity was found.
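Person reliability and the separation index reported together here are two views of the same quantity: in Rasch analysis a reliability R corresponds to a separation G through R = G² / (1 + G²). The abstract gives no numeric values, so the sketch below (hypothetical function names) only illustrates that standard relationship.

```python
import math


def separation_from_reliability(reliability: float) -> float:
    """Separation index G implied by a Rasch reliability R, from R = G^2 / (1 + G^2)."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


def reliability_from_separation(separation: float) -> float:
    """Inverse relationship: R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)
```

A reliability of 0.80 corresponds to a separation of 2.0, and 0.90 to about 3.0, which is why "high reliability" and "high separation" usually appear together.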

5. Conclusions: Based on our analysis, the empirical evidence on the psychometric properties of the scale will provide important information for its future use, especially in relation to other constructs. This practical importance is essential because commitment is considered an important educational outcome in Malaysia. Even though scale validation is an ongoing process, it is perhaps not too far off the mark to say that the present research provides an important foundation for validation studies across different settings.

Paper No KK_020

Paper Title DEVELOPMENT AND VALIDATION OF THE PLAGIARISM TENDENCY AMONG MALAYSIAN POST-GRADUATE STUDENTS.

Email Address [email protected]

1st Author Anis Jauharah Abd. Kadir

Subsequent authors

Nur Riza Suradi, Mohd Salmi Md Noorani, Salbiah Md Salleh

1. Aims/ Objectives of study:

This exploratory study aimed to test the validity and reliability of an instrument developed to measure the tendency toward plagiarism among postgraduate students at a Malaysian university. Key words: Plagiarism; attitude; knowledge; act.

2. Sample: A simple random sample of 125 master's and doctoral postgraduate students (40 males and 85 females).

3. Method: This study used a quantitative survey methodology, and the data were analysed using the Rasch model. A single Malaysian university served as the case, reflecting a single institutional culture. The instrument comprises 41 items measuring three constructs of plagiarism tendency: Knowledge, Act and Attitude. The instrument was evaluated with Winsteps version 3.73 in terms of reliability, item polarity, goodness of fit and unidimensionality.

4. Results: In terms of item polarity, the findings indicated that the instrument was able to measure plagiarism tendency, with correlations ranging from 0.01 to 0.71. Only one item had a negative correlation, and it was eliminated from the instrument. Person and item reliability were 0.69 and 0.98, respectively. The misfit test indicated that no item needed to be eliminated, because the infit mean-square values ranged from 0.71 to 1.21 and the outfit mean-square values from 0.70 to 1.77. Although one item had an outfit mean-square above 1.50, it was retained because its infit mean-square was still within a good range. For dimensionality, the raw variance explained by the measures was 47.6%, the same as the modeled value, and the unexplained variance in the first contrast was 8.7%.
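The decision to keep an item with high outfit but acceptable infit follows from how the two statistics are built: outfit is an unweighted mean of squared standardized residuals, so a few very unexpected responses inflate it, while infit weights residuals by their model variance. The sketch below uses the standard formulas; for simplicity it assumes dichotomous (0/1) scoring, which the abstract does not state, and the function names are hypothetical. It is not the Winsteps implementation.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses: list[int], thetas: list[float], b: float) -> tuple[float, float]:
    """Infit and outfit mean-squares for one item of difficulty b.

    responses[n] is person n's 0/1 score on the item; thetas[n] is their measure.
    Outfit = mean squared standardized residual (outlier-sensitive).
    Infit  = sum of squared residuals / sum of model variances (information-weighted).
    """
    sq_resid = 0.0
    variance = 0.0
    z_sq = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        w = p * (1.0 - p)      # model variance of this response
        r = x - p              # raw residual
        sq_resid += r * r
        variance += w
        z_sq += r * r / w
    outfit = z_sq / len(responses)
    infit = sq_resid / variance
    return infit, outfit
```

For instance, if a high-ability person (3 logits) fails an item of difficulty 0, that single surprise dominates outfit (about 7.9 in a three-person example) far more than infit (about 3.4).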

5. Conclusions: The study showed that Rasch modelling can help researchers refine their instruments into high-quality instruments in a systematic way. It also indicated that the instrument has the potential to measure plagiarism tendency among postgraduate students, with minor adjustments to the sample size and items.

Paper No KK_021

Paper Title Predictors of self-assessment intention and practice among primary and secondary students in Hong Kong

Email Address [email protected]

1st Author Zi Yan

Affiliation

Subsequent authors

Gavin T. L. Brown, The University of Auckland, New Zealand; John C. K. Lee, The Education University of Hong Kong, Hong Kong

1. Aims/ Objectives of study:

This study aims to explore the predictors of students’ self-assessment intentions and practices in the Hong Kong context.

2. Sample: The target population of the study is Primary 4 to Secondary 3 students in Hong Kong. A survey was conducted on around 1,500 students.

3. Method: The Theory of Planned Behaviour (TPB) (Ajzen, 1991) was applied as the theoretical framework for understanding students' self-assessment intentions and practices and their predictors. The analytical methods included Rasch analysis (Rasch, 1960), used to examine the psychometric properties of the scales and calibrate student measures on each latent trait, and path analysis, used to investigate the relationships among the latent traits.

4. Results: The findings indicated that attitude, subjective norm, and self-efficacy were positive and significant predictors of self-assessment intention while psychological safety was a negative predictor. Attitude and self-efficacy were positive and significant predictors of self-assessment behavior, while psychological safety was a negative predictor of behavior.

5. Conclusions: These results indicated that, in general, TPB is an appropriate theoretical framework for explaining students' intentions and practices regarding self-assessment. Some non-TPB components (e.g., psychological safety) also played an important role in determining students' self-assessment intentions and practices.

Paper No KK_022

Paper Title The use of MCQ as Formative Assessment to Reflect Attainment of Desired Learning Outcome

Email Address [email protected]

1st Author Ximei Zhou

Affiliation

Subsequent authors

Pey-Tee Oon, William P. Fisher

1. Aims/ Objectives of study:

A vertically, horizontally and developmentally coherent assessment system (NRC, 2006), moving away from the current unconnected and decontextualized assessment framework, holds promise for projecting a more comprehensive picture of student learning. The present study illustrates how formative assessment can be coherently developed and presented to reflect students' scientific understanding over time, and to show which areas need reinforcement for learning to progress toward the desired learning outcomes. This paper sets out to illustrate a developmentally coherent formative assessment.

2. Sample: The TIMSS 2015 instrument assessing the physics knowledge of 8th-grade students was used for this purpose. A total of 4155 students from Hong Kong participated in the study, of whom 1974 (47.5%) were girls and 2181 (52.5%) were boys. Students' responses to the 26 restricted items were analysed; only results for the 13 MCQ items were retrieved for illustration.

3. Method: Data quality was examined first. Next, the scoring form for individual students was analysed and modified to show the attainment of learning outcomes, with items calibrated on a measurement scale.

4. Results: The results indicated a good fit of the data to the Rasch model. The infit and outfit MNSQ statistics of all items were 0.83-1.39, all within the acceptable range for the Rasch model (0.50-1.50) (Bond & Fox, 2007; Cheng & Oon, 2016; Oon & Subramaniam, 2011). A scoring form for each individual student (Linacre, 1997) reflects the attainment of desired learning outcomes. Items were arranged from easiest to most difficult: Item S042182 was the easiest, with a content difficulty of -2.36 logits, and Item S062044 the most difficult, at 1.20 logits, a span of 3.56 logits. All students were calibrated on the measurement scale based on their probability of answering the items correctly; each student's ability is indicated by the 'individual measure' relative to the overall class mean. Attainment of learning outcomes is reflected in the responses: incorrect responses indicate conceptual understanding 'yet to be attained', and correct responses indicate understanding 'already attained'. From the items answered correctly and incorrectly, teachers can see which areas of conceptual understanding need further reinforcement to help each student fully attain the desired learning outcomes. This sets an entry point for teachers on what to reinforce next (Fisher, 2013).
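The link between a student's measure and an item's difficulty on the scoring form is the dichotomous Rasch probability. The sketch below uses the two item difficulties reported above; the student measure of 0 logits and the helper names are hypothetical, and the 'likely attained' label is only the model's expectation (probability of at least 0.5), not the student's observed response that the scoring form actually records.

```python
import math


def p_correct(ability: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P(correct) = exp(B - D) / (1 + exp(B - D))."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


# Item difficulties (logits) reported in the abstract.
EASIEST = -2.36   # Item S042182
HARDEST = 1.20    # Item S062044


def attainment_profile(ability: float, difficulties: dict[str, float]) -> dict[str, str]:
    """Expected attainment: 'likely attained' when P(correct) >= 0.5, i.e. ability >= difficulty."""
    return {
        item: "likely attained" if p_correct(ability, d) >= 0.5 else "yet to be attained"
        for item, d in difficulties.items()
    }


# A hypothetical student at 0 logits: about 0.91 on the easiest item, about 0.23 on the hardest.
print(p_correct(0.0, EASIEST), p_correct(0.0, HARDEST))
print(attainment_profile(0.0, {"S042182": EASIEST, "S062044": HARDEST}))
```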

5. Conclusions: The purpose of formative assessment is to improve learning outcomes. The present study provides teachers with an example of how to use MCQs to trace students' learning progression and identify which learning goals have and have not been attained.

Paper No KK_023

Paper Title Predicting Item Difficulty of a Knowing Numbers Test Using the Inverse Partial Credit Model

Email Address [email protected]

1st Author Ong Yoke Mooi

Affiliation IPG Kampus Ipoh

Subsequent authors

Lee Leh Hong (IPG Kampus Ilmu Khas), Maria Pampaka (The University of Manchester)

1. Aims/ Objectives of study:

The aim was to examine the extent to which teacher trainers are able to predict the difficulty of items in a Knowing Numbers test.

2. Sample: A Knowing Numbers test was administered to 19 teacher trainers to record their perceptions of the difficulty of its 22 items on a four-point Likert scale (very easy, easy, difficult and very difficult). A second dataset was drawn from 83 trainees' scores on the 22 items in the Knowing Numbers course final examination at two Teacher Training Institutes in Malaysia.

3. Method: Rasch analysis was conducted with the Winsteps software to (1) construct a scale of teacher trainers' perceptions of item difficulty in the Knowing Numbers test using the Inverse Partial Credit Model; (2) construct a scale of trainees' actual item difficulty in the test using the Partial Credit Model (Masters, 1982); and (3) compare teacher trainers' perceptions of item difficulty with trainees' actual item difficulty. Hadjidemetriou and Williams (2004) used the Inverse Partial Credit Model to reveal contours of teachers' knowledge with respect to their students' graphical knowledge. To analyse the dataset with the Inverse Partial Credit Model, we transposed it, so that rows became columns and columns became rows, and ran the transposed data through the Partial Credit Model. In other words, the persons become the instrument for measuring item difficulty as perceived collectively by the teacher trainers.
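The transposition step is the only data manipulation the Inverse Partial Credit approach needs before a standard Partial Credit run. A minimal sketch, assuming the ratings are held as a list of rows (one row per teacher trainer, one column per item, coded 1 = very easy to 4 = very difficult); the layout, coding and function name are assumptions, and the subsequent Winsteps analysis is not reproduced here.

```python
def transpose(matrix: list[list[int]]) -> list[list[int]]:
    """Swap rows and columns: each item becomes a row and each rater a column."""
    if not matrix:
        return []
    width = len(matrix[0])
    if any(len(row) != width for row in matrix):
        raise ValueError("all rows must have the same length")
    return [list(col) for col in zip(*matrix)]


# Hypothetical ratings: 2 trainers x 3 items.
ratings = [
    [1, 3, 4],   # trainer A
    [2, 3, 3],   # trainer B
]
by_item = transpose(ratings)   # 3 rows (items) x 2 columns (trainers)
print(by_item)
```

With the full data, the 19 x 22 trainer-by-item matrix becomes a 22 x 19 matrix, so each item is measured as if it were a "person" responding to 19 trainer "items".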

4. Results: The Pearson correlation between teacher trainers' perceptions of item difficulty and trainees' actual item difficulty was 0.69 (n = 22, p = 0.0003), indicating a moderately strong correlation between perceived and actual item difficulty in the Knowing Numbers test. Further analysis showed that teacher trainers overestimated the difficulty of 5 items and underestimated the difficulty of 2 items.
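The correlation and its significance test can be sketched with the standard library, assuming two lists of 22 item measures in logits (one from the Inverse Partial Credit run, one from the Partial Credit run); the function names are hypothetical. The t statistic below has n - 2 degrees of freedom; converting it to a p-value needs the t distribution, which is not in the standard library (SciPy's `scipy.stats.pearsonr` returns r and a two-sided p-value directly).

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample Pearson product-moment correlation."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples with at least 3 values")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
```

Plugging in the reported r = 0.69 and n = 22 gives t of about 4.26 on 20 degrees of freedom, consistent with a p-value well below 0.001.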

5. Conclusions: Teacher trainers need to construct test items with varying difficulties to discriminate among trainees across the range of ability in a course. This study provides empirical evidence that teacher trainers do have the knowledge to estimate the difficulty of items by reading them. This skill is vital for teacher trainers in developing test items that match trainees' abilities.

References

Hadjidemetriou, C., & Williams, J. (2004). Using Rasch models to reveal contours of teachers' knowledge. Journal of Applied Measurement, 5(3), 243-257.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149-174.

Paper No KK_024

Paper Title Validation of Medical Statistics Exam Paper: Conventional method versus RASCH.

Email Address [email protected]

1st Author Azmi Mohd Tamil, Universiti Kebangsaan Malaysia

Affiliation

Subsequent authors

Mohd Zali Mohd Nor, MyRASCH

1. Aims/ Objectives of study:

Medical research always requires researchers to validate their measurement tools, yet the tools we use to measure our own students' knowledge are rarely validated. Conventional measures such as the difficulty index, discrimination index, reliability index and Standard Error of Measurement (SEM) are generated automatically, but examiners rarely refer to or understand them. The objective of this study is to compare the conventional methods against Rasch to see which method of validation is superior.

2. Sample: The sample consisted of all 22 postgraduate students who sat the medical statistics exam paper.

3. Method: The answers on the OMR sheets were scanned and converted into a flat-file text database. The text file was converted into Excel format and analysed using both the conventional method and Rasch. The indexes calculated by the conventional method were also compared against similar computer-generated indexes. Questions with poor discrimination and poor difficulty indexes were identified using both approaches.
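For readers who want to reproduce the conventional side of the comparison, the sketch below computes the usual classical indices from a 0/1 score matrix. The abstract does not say which formulas the examination software used, so this assumes common textbook definitions: difficulty as proportion correct, discrimination as the upper-minus-lower 27% group difference, KR-20 for reliability, and SEM = SD x sqrt(1 - KR-20). The function name and data layout are hypothetical.

```python
import math


def item_analysis(scores: list[list[int]], group_fraction: float = 0.27):
    """Classical item analysis for a 0/1 score matrix (rows = students, columns = items).

    Returns (difficulty, discrimination, kr20, sem):
      difficulty[j]     proportion of students answering item j correctly
      discrimination[j] upper-group minus lower-group proportion correct
      kr20              Kuder-Richardson 20 reliability of the total score
      sem               standard error of measurement, SD * sqrt(1 - KR-20)
    """
    n_students = len(scores)
    n_items = len(scores[0])
    if n_students < 2 or n_items < 2:
        raise ValueError("need at least 2 students and 2 items")
    totals = [sum(row) for row in scores]

    # Rank students by total score; ties are broken by row order.
    order = sorted(range(n_students), key=lambda i: totals[i])
    g = max(1, round(group_fraction * n_students))
    lower, upper = order[:g], order[-g:]

    difficulty, discrimination = [], []
    for j in range(n_items):
        difficulty.append(sum(row[j] for row in scores) / n_students)
        p_upper = sum(scores[i][j] for i in upper) / g
        p_lower = sum(scores[i][j] for i in lower) / g
        discrimination.append(p_upper - p_lower)

    mean_total = sum(totals) / n_students
    # Population variance, to match the p * q item variances used in KR-20.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    sum_pq = sum(p * (1 - p) for p in difficulty)
    kr20 = (n_items / (n_items - 1)) * (1 - sum_pq / var_total)
    sem = math.sqrt(var_total) * math.sqrt(max(0.0, 1 - kr20))
    return difficulty, discrimination, kr20, sem
```

With 22 students, the 27% upper and lower groups each contain 6 students. Unlike a Rasch analysis, none of these indices places items and students on a common scale, which is what lets Rasch reveal an item-person mismatch.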

4. Results: The conventional method and Rasch identified similar questions with poor discrimination and difficulty indexes. However, Rasch also showed that the questions were too easy for the students, a clear item-person mismatch. Rasch further indicated that the students could be separated into three groups, suggesting that only three grades should be given.
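The "three groups" result is the kind of figure a Rasch analysis derives from person separation. A minimal sketch, assuming the commonly used strata formula H = (4G + 1) / 3, where G is the person separation index obtained from person reliability R as G = sqrt(R / (1 - R)); the abstract reports neither value, so the numbers below are illustrative and the function names hypothetical.

```python
import math


def strata(separation: float) -> float:
    """Number of statistically distinct performance levels: H = (4G + 1) / 3."""
    return (4.0 * separation + 1.0) / 3.0


def distinct_groups(person_reliability: float) -> int:
    """Whole number of distinguishable ability groups implied by a person reliability."""
    if not 0.0 <= person_reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    g = math.sqrt(person_reliability / (1.0 - person_reliability))
    return math.floor(strata(g))
```

A separation of 2.0 gives exactly three strata, and a person reliability of about 0.85 likewise supports three groups, so a three-grade scheme would be the most the measures can justify.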

5. Conclusions: Rasch is clearly superior to the conventional method for validating the exam paper. If we can create a work culture in which lecturers always validate their exam questions, Rasch should be one of the tools used.

Paper No KK_025

Paper Title The Effects of Chronic Daily Fears on Students’ Concept of Self: Towards Identifying Students being Bullied

Email Address [email protected]

1st Author Rense Lange

Affiliation ISLA - Vila Nova de Gaia, Portugal

Subsequent authors

Cynthia Martínez-Garrido and Alexandre Ventura

1. Aims/ Objectives of study:

Students may experience considerable fear and stress in school settings, and based on Dweck’s (2006) notion of “mindset” we hypothesized that fear introduces qualitative changes in students’ self-concepts. Moreover, these changes were expected to lead to lower student performance on academic tests of reading and mathematics.

2. Sample: Hypotheses were tested on 3847 students from nine Iberoamerican countries (Bolivia, Chile, Colombia, Cuba, Ecuador, Panama, Peru, Spain, and Venezuela).

3. Method: The 3847 students completed Murillo's (2007) adaptation of Marsh's (1988) SDQ-I. No overall (average) raw-score differences were found. The questionnaire data were then analyzed with the Rasch rating scale model, using the items' model residuals as predictors of student fear levels. In addition, these students took two assessments each in reading and mathematics (a pre-test and a post-test).
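The abstract does not detail how the model residuals were computed, so the sketch below shows the standard standardized residual under the Andrich rating scale model, which is one plausible reading: for each rating, compare the observed category with its model expectation and divide by the model standard deviation. Function names are hypothetical; the per-item residuals for each student could then serve as predictors in a logistic regression, which is not shown.

```python
import math


def rsm_category_probs(theta: float, delta: float, taus: list[float]) -> list[float]:
    """Andrich rating scale model: probabilities of categories 0..m.

    P(x) is proportional to exp(sum_{k=1..x} (theta - delta - tau_k)),
    with the empty sum for x = 0 equal to 0.
    """
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    top = max(logits)                      # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def standardized_residual(x: int, theta: float, delta: float, taus: list[float]) -> float:
    """z = (x - E[x]) / sqrt(Var[x]) for one observed rating x."""
    probs = rsm_category_probs(theta, delta, taus)
    expected = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
    return (x - expected) / math.sqrt(variance)
```

When a student's measure equals the item's difficulty and the thresholds are symmetric, the expected rating is the middle category, so a middle rating has a residual of zero and the extreme ratings have residuals of equal size and opposite sign.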

4. Results: There are three classes of findings. Psychological Distress: As anticipated, Rasch scaling indicated that the information content of High-Fear students' ratings was more localized along the latent dimension than that of Low-Fear students, and their ratings also showed less cognitive variety. Predicting Fear: The resulting measurement distortions were captured via logistic regression over the ratings' residuals. Using training and validation samples (60% and 40% of all cases, respectively), the changes in self-image were sufficiently strong to predict students' fear levels and their gender from the distortions in their self-image.

Academic Performance: Consistent with the fear effects, fearful students attempted fewer items than students without such fears, and this held across all levels of proficiency. As a result, low-fear students performed much better on reading and mathematics than high-fear students, with an overall effect of about 0.75 logits.

5. Conclusions: Fear in school changes students' self-image to the point where these distortions can predict their levels of fear - thus suggesting that it is possible to design early warning systems to identify students in need of attention and protection. We see the present findings as a first step towards implementing an online warning and detection system for signs of bullying and related issues among students.

Paper No KK_026

Paper Title DEVELOPMENT OF KKN-PPM PERFORMANCE INSTRUMENTS TO SUPPORT THE COMPREHENSION, SOCIAL SKILLS, AND DISCIPLINE OF STUDENTS OF SULTAN AGENG TIRTAYASA UNIVERSITY

Email Address [email protected]

1st Author Nurul Anriani

Affiliation Universitas Sultan Ageng Tirtayasa

Subsequent authors

Ahsanul Khair Asdar (Universitas Negeri Jakarta)

1. Aims/ Objectives of study:

To develop performance, comprehension, social skills, and discipline instruments for the implementation of KKN-PPM among students of Sultan Ageng Tirtayasa University.

2. Sample: The sample comprised 200 students of Sultan Ageng Tirtayasa University taking part in the KKN-PPM program, selected using simple random sampling.

3. Method: This research used a developmental research model with the Confirmatory Factor Analysis (CFA) technique. The variables studied were comprehension, social skills, discipline, and performance, so the outcome of the research is a set of standard instruments for measuring students' understanding, social skills, discipline and performance in the implementation of KKN-PPM. The data are primary data consisting of the responses of the 200 participating students to the items on the comprehension, social skills, discipline, and performance instruments. The collected data were processed in two stages: (1) first-order CFA and (2) second-order CFA. All analyses were carried out with LISREL 8.80 (full version).

4. Results: After the full series of trials and two rounds of revision, standard instruments were obtained to measure the comprehension, social skills, discipline, and performance of students in the implementation of KKN-PPM.

Comprehension instrument: First-order CFA using maximum likelihood estimation gave the following loadings (λ). Construction dimension: highest 0.827, lowest 0.603. Process dimension: highest 0.738, lowest 0.580. Conclusion Drawing dimension: highest 0.775, lowest 0.550. The measurement model showed that all items had t-values > 1.96 at the 0.05 level, indicating that the items are valid and feasible to use, with a construct validity value of 0.969. Second-order CFA showed that the dimensions making up the construct are valid, with factor loadings > 0.5: construction 0.996, process 0.986, and conclusion drawing 0.997, with a construct reliability value of 0.844.

Social skills instrument: First-order CFA using maximum likelihood estimation gave the following loadings (λ). Peer Relationship dimension: highest 0.967, lowest 0.639. Self Management dimension: highest 0.955, lowest 0.826. Academic Success dimension: highest 0.951, lowest 0.739. Compliance dimension: highest 0.966, lowest 0.909. Assertiveness dimension: highest 0.979, lowest 0.799. All items had t-values > 1.96 at the 0.05 level, indicating that they are valid and feasible to use, with a construct reliability value of 0.996. Second-order CFA showed that the constituent dimensions are valid, with factor loadings > 0.5: peer relationship 0.950, self-management 0.999, academic success 0.977, compliance 0.999, and assertiveness 0.999, with a construct validity value of 0.962.

Discipline instrument: First-order CFA using maximum likelihood estimation gave the following loadings (λ). Responsibility dimension: highest 0.982, lowest 0.894. Self Development dimension: highest 0.978, lowest 0.862. Self-Control dimension: highest 0.986, lowest 0.864. All items had t-values > 1.96 at the 0.05 level, indicating that they are valid and feasible to use, with a construct validity value of 0.997. Second-order CFA showed that the dimensions making up the construct are valid, with factor loadings > 0.5: responsibility 0.989, self-development 0.999, and self-control 1.000, with a construct validity value of 0.966.

Performance instrument: First-order CFA using maximum likelihood estimation gave the following loadings (λ). Preparation dimension: highest 0.959, lowest 0.715. Implementation dimension: highest 0.928, lowest 0.603. Reporting dimension: highest 0.987, lowest 0.818. All items had t-values > 1.96 at the 0.05 level, indicating that they are valid and feasible to use, with a construct validity value of 0.990. Second-order CFA showed that the dimensions making up the construct are valid, with factor loadings > 0.5: preparation 0.999, implementation 0.978, and reporting 0.992, with a construct validity value of 0.913.
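Construct (composite) reliability figures like those above are usually computed from the standardized loadings of a measurement model. The abstract does not state the exact formula used, so the sketch below assumes the common composite-reliability definition CR = (Σλ)² / ((Σλ)² + Σ(1 - λ²)), together with the average variance extracted (AVE = mean of λ²). Only the highest and lowest λ per dimension are reported, so the published values cannot be reproduced from the abstract; the function names and example loadings are hypothetical.

```python
def construct_reliability(loadings: list[float]) -> float:
    """Composite reliability from standardized loadings:
    CR = (sum of lambda)^2 / ((sum of lambda)^2 + sum of (1 - lambda^2)).
    """
    s = sum(loadings)
    error = sum(1.0 - lam * lam for lam in loadings)   # standardized error variances
    return s * s / (s * s + error)


def average_variance_extracted(loadings: list[float]) -> float:
    """AVE = mean of the squared standardized loadings."""
    return sum(lam * lam for lam in loadings) / len(loadings)
```

For example, three items each loading 0.7 give CR of about 0.74 and AVE = 0.49.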

5. Conclusions: The analysis yielded standard instruments for comprehension, social skills, discipline, and performance. The comprehension instrument is composed of the construction, process, and conclusion dimensions. The social skills instrument consists of the peer relationship, self-management, academic success, compliance, and assertiveness dimensions. The discipline instrument consists of the responsibility, self-development, and self-control dimensions. The performance instrument consists of the preparation, implementation, and reporting dimensions.

Paper No KK_027

Paper Title Development of a Diagnostic English Grammar Test for Malaysian Lower Secondary School Students

Email Address [email protected]

1st Author Kho Chung Wei

Affiliation Faculty of Education, University of Malaya

Subsequent authors

1. Aims/ Objectives of study:

The study aimed to develop a diagnostic English grammar test for Malaysian lower secondary school students. Specifically, the study assessed the psychometric properties of the diagnostic test, determined whether any test items were biased against some examinees, and examined whether there was a significant association between examinees' scores on the diagnostic test and their school English exam scores.

2. Sample: The study was conducted on the whole Form 2 student population in a secondary school in Sarawak. Altogether, there were 202 Form 2 students spread across six classes. No sampling was done, as it was more practical and useful for the school to administer the test to the whole population. On the day of the test administration, however, 34 students were absent from school; thus, the final sample for the study was 168. This represented a response rate of 83.17%. Using existing data, it was found that the sample was not significantly different from the population in terms of gender proportion and past English exam scores, suggesting that the non-responses did not introduce any bias in the study.

3. Method: The study was a small-scale pilot study utilizing the cross-sectional survey design. It began with observation and initial assessment of potentially problematic areas of language knowledge. This provided the basis for the preparation of a diagnostic English grammar test. The test consists of 55 multiple-choice items. After the test was prepared, it was administered to students from the research site. Existing data such as name, gender, past English exam scores and whether students had undergone an additional year of secondary schooling were obtained from the school database with permission from the school administrator. The data collected were then analysed using Winsteps version 3.66.0, jMetrik version 4.0.3 and SPSS Statistics version 21.

4. Results: Overall, the examinees' responses to the diagnostic English grammar test fit the Rasch model, with 85.45% of items showing good fit. There was no strong evidence of a secondary dimension, suggesting that the assumption of psychometric unidimensionality was not violated. Test scores were significantly, strongly, and positively associated with school English exam scores. The test had a high reliability index and was sensitive enough to classify examinees into 3 ability levels. The difficulty of the test was suitable for the current sample, and item difficulty could be stratified into five levels. Of the 55 items, 81.82% were able to differentiate between examinees of different ability estimates; 92.73% did not exhibit gender DIF; and 89.09% did not indicate DIF in terms of an additional year of secondary schooling. These findings imply that a substantial number of good items can be retained. However, 5 items appeared to be problematic in more than one respect: Item 4 (underfit and low discrimination); Item 33 (overfit and too difficult); Item 42 (underfit and low discrimination); Item 21 (too easy, low discrimination and gender DIF); and Item 32 (underfit, low discrimination, DIF in terms of additional schooling year, and farthest from Rasch convergence). These items would therefore be eliminated from future versions of the test, leaving approximately 90.91% (50 items) in the item bank. Of these 50 items, 1 would be revised for underfit, 6 would be revised to enhance their discrimination, 3 would be reviewed for gender bias, and 5 would be reviewed for bias related to the additional year of secondary schooling. Additional items would also need to be drafted to fill gaps in item difficulty.
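The abstract reports gender DIF and DIF by additional schooling year but does not state the flagging rule. A common Rasch approach estimates an item's difficulty separately in each group and tests the difference. The sketch below assumes that approach; the function names are hypothetical and the cut-offs are illustrative assumptions (0.5 logit, echoing the 0.50-logit DIF criterion in the PSC-12 abstract, and |t| of at least 2).

```python
import math


def dif_contrast(b_group1: float, se_group1: float,
                 b_group2: float, se_group2: float) -> tuple[float, float]:
    """Difference in an item's difficulty between two groups (logits) and its t statistic."""
    contrast = b_group1 - b_group2
    t = contrast / math.sqrt(se_group1 ** 2 + se_group2 ** 2)
    return contrast, t


def flag_dif(b_group1: float, se_group1: float, b_group2: float, se_group2: float,
             min_contrast: float = 0.5, min_abs_t: float = 2.0) -> bool:
    """Flag an item when the contrast is both large and statistically notable.

    The 0.5-logit size and |t| >= 2 cut-offs are illustrative assumptions.
    """
    contrast, t = dif_contrast(b_group1, se_group1, b_group2, se_group2)
    return abs(contrast) >= min_contrast and abs(t) >= min_abs_t
```

Requiring both a size and a significance condition avoids flagging trivially small differences in large samples and large but imprecise differences in small subgroups.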

5. Conclusions: The diagnostic English grammar test had good psychometric properties, but some items needed to be reviewed and revised or eliminated. This shows that problematic items can be present in a test whose overall psychometric properties are sound, and the detection of such items justifies extensive pilot testing in the test development process. It also implies that the development of a diagnostic language test can be a cyclical, never-ending process. The findings are significant for test developers and researchers interested in developing a diagnostic English language test for Malaysian lower secondary school students, as well as for the examination board.


Paper No KK_029

Paper Title Using Rasch Model for the Development of Intention to Stay Scale (ITSS) among Medical Academics at Public Universities

Email Address [email protected]

1st Author Wan Ismahanini Ismail

Affiliation Faculty of Educational Studies, Universiti Putra Malaysia, Serdang, Malaysia

Subsequent authors

1 Roziah Mohd. Rasdi, 3 Rahinah Ibrahim, 4 Bahaman Abu Samah 1, 3, 4 Faculty of Educational Studies, 3 Faculty of Design and Architecture, Universiti Putra Malaysia, Serdang, Malaysia

1. Aims/ Objectives of study:

In this study, we discuss the initial stage of creating a new scale, the Intention to Stay Scale (ITSS), using the framework of Liu (2010), and we use the Rasch model in scale evaluation to ensure that a psychometrically sound measure is created and to collect evidence of validity.

2. Sample: The survey was administered on a convenience sample of 52 medical academics from two public universities.

3. Method: Several analyses were used on the 50-item scale in the early stage of data collection, including rating scale calibration, rating scale analysis, item fit, person misfit order, the variable map, person and item separation and reliability, principal component analysis (unidimensionality), and Differential Item Functioning (DIF).
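The rating scale calibration and rating scale analysis listed above typically check whether the response categories function as intended. The helper below is hypothetical and is not a Winsteps feature. It applies three commonly cited checks to category statistics copied from a Rasch output: each category is used at least 10 times, average person measures rise with the category, and the Rasch-Andrich thresholds advance. The 10-observation cut-off and the input layout are assumptions, because the abstract does not say which criteria the authors applied.

```python
def rating_scale_checks(category_counts, average_measures, thresholds, min_count=10):
    """Flag common rating-scale problems (hypothetical helper, assumed criteria).

    category_counts  -- observations per category, lowest category first
    average_measures -- mean person measure (logits) of those choosing each category
    thresholds       -- Rasch-Andrich thresholds for categories 1..m (logits)
    Returns a list of warnings; an empty list means nothing was flagged.
    """
    warnings = []
    # Sparse categories give unstable threshold estimates.
    for k, n in enumerate(category_counts):
        if n < min_count:
            warnings.append(f"category {k} has only {n} observations")
    # Persons choosing a higher category should, on average, measure higher.
    for k in range(1, len(average_measures)):
        if average_measures[k] <= average_measures[k - 1]:
            warnings.append(f"average measure does not advance at category {k}")
    # thresholds[0] is threshold 1, so index k compares threshold k+1 with threshold k.
    for k in range(1, len(thresholds)):
        if thresholds[k] <= thresholds[k - 1]:
            warnings.append(f"thresholds {k} and {k + 1} are disordered")
    return warnings


# Hypothetical 5-category item: category 1 is rarely used and its threshold is out of order.
print(rating_scale_checks(
    category_counts=[40, 6, 55, 120, 90],
    average_measures=[-1.8, -0.9, -0.2, 0.6, 1.7],
    thresholds=[-0.5, -1.2, 0.4, 1.3],
))
```

A flagged category is often collapsed with a neighbouring one and the scale is recalibrated. Whether that is appropriate depends on the construct and the wording of the response options.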

4. Results: In the first pilot test, which surveyed 52 medical academics from two public universities, a unidimensional ITSS construct was empirically established after misfitting items and persons were removed and the scale was modified.

5. Conclusions: The ITSS can be used in subsequent tests to measure medical academics’ intention to stay in service. The scale also shows construct validity, as displayed on the variable map, where the ordering of item difficulty does not differ greatly from the conceptual framework of the proposed research.

Paper No KK_030

Paper Title Development of a Model of Positive L2 Self using the Rasch Model

Email Address [email protected]

1st Author J. Lake

Affiliation Fukuoka Jo Gakuin University

Subsequent authors

Keita Kikuchi, Kanagawa University

1. Aims/ Objectives of study:

The field of positive psychology has grown rapidly in the past few years. Interest in applying positive psychology to education is a more recent development (e.g., Furlong, Gilman, & Huebner, 2014; White & Murray, 2015). A few researchers have applied it to second language (L2) learning in a variety of contexts and at a range of identity or self-levels, from the general, trait-like level to the specific, state-like level (e.g., Gabryś-Barker & Gałajda, 2016; Lake, 2013; MacIntyre, Gregersen, & Mercer, 2016). The presenters show the process of developing a model of positive L2 self that integrates constructs of positive psychology and motivation in the context of L2 learning.

2. Sample: This study was based on questionnaires given to over 3,500 Japanese college students and on case study interviews conducted with a small number of them.

3. Method: We have used Winsteps for the Rasch analysis of global positive self-constructs of flourishing, curiosity, and hope; positive L2 self-constructs of interest, passion, and mastery goal orientations; and L2 self-efficacy in speaking, listening, and reading. These measures are then used with AMOS to construct a structural model of a Positive L2 Self.

4. Results: The measures fit the Rasch model and an acceptable fit was found for a structural model of a positive L2 self that incorporated constructs from positive psychology and second language motivation.

5. Conclusions: Using both quantitative and qualitative data, presenters discuss how these constructs can be applied by researchers and educators in developing positive identities of language learners.


Paper No KK_031

Paper Title Developing a Vocabulary Specification Equation for Second Language Learners

Email Address [email protected]

1st Author J. Lake

Affiliation Fukuoka Jo Gakuin University

Subsequent authors

1. Aims/ Objectives of study:

This presentation describes the development of a vocabulary specification equation for second language (L2) learners for diagnostic purposes. I will first briefly review and summarize the literature on corpus studies, wordlists, vocabulary tests, and test specifications. Next I will explain how a vocabulary test was developed based on the research reviewed and following the procedures described in the test and item specifications for Japanese university students learning English as a second language.

2. Sample: Over 350 Japanese university students were sampled.

3. Method: Winsteps was used for a Rasch analysis of the test.

4. Results: The results showed that a test with frequency-based items could be produced using the specifications as a guideline, and that the resulting item difficulties had a strong relationship with word frequency, which was captured by the vocabulary specification equation.

5. Conclusions: These results suggest that teachers and learners can use the specification equation systematically to guide vocabulary study in the development of students’ L2 proficiency.

Paper No KK_032

Paper Title Measuring the Validity and Reliability of Arabic Vocabulary Knowledge Test Using Rasch Model Approach

Email Address [email protected]

1st Author Zunita Mohamad Maskor

Affiliation Faculty of Education, Universiti Kebangsaan Malaysia, Malaysia.

Subsequent authors

Harun Baharudin & Maimun Aqsha Lubis, Faculty of Education, Universiti Kebangsaan Malaysia, Malaysia.

1. Aims/ Objectives of study:

To investigate the validity and reliability of the Arabic Receptive Vocabulary Test (ARVT) using the Rasch Measurement Model

2. Sample: Data were collected with a vocabulary test, the Arabic Receptive Vocabulary Test (ARVT), which was answered by 106 Form Four students at an Islamic Religious Secondary School in Perak.

3. Method: The Arabic Receptive Vocabulary Test (ARVT) was developed to measure receptive vocabulary knowledge in Arabic; its purpose was to find out how many words the students knew. Items were selected by simple random sampling from A Frequency Dictionary of Arabic (Buckwalter & Parkinson, 2011) at a ratio of 1:100 based on a 2,000-word frequency list, and the test consists of 25 dichotomous items with a Yes/No response pattern. It was answered by 106 Form Four students and administered within 15 minutes. Rasch analysis was conducted using WINSTEPS version 3.72.3 to investigate whether the test was unidimensional and fit the model.

4. Results: The test was unidimensional and fit the Rasch model’s expectation.

5. Conclusions: The findings demonstrated that item reliability and item separation met the model’s expectations. The Arabic Receptive Vocabulary Test (ARVT) is suitable for measuring Arabic vocabulary acquisition among secondary students.


Paper No KK_033

Paper Title A Rasch Analysis of the Reading, Grammar, and Essay Sections of a Japanese University Entrance Examination

Email Address [email protected]

1st Author Kristy King Takagi

Affiliation Chuo University, Tokyo Japan

Subsequent authors

1. Aims/ Objectives of study:

University entrance examinations in Japan have been widely criticized, especially since the landmark 1995 studies of Brown and Yamashita. However, few universities have responded to such criticisms to the extent of significantly altering their exams. One format that remains standard for many English entrance exams at Japanese universities consists of three parts: a reading passage with comprehension questions; grammar exercises, such as fill-in-the-blank items and arranging English sentence parts in the correct order based on a Japanese translation; and an essay prompt related to the reading passage. The target essay length is usually short (sometimes less than 100 words), so the final piece of writing often resembles a paragraph more than an essay. The purpose of this study is to examine whether each section of the entrance examination contributed to the assessment of applicants, and to determine the relationships among the examination sections. Key Words: university entrance examinations in Japan, Japanese higher education

2. Sample: The data used for this study come from an actual entrance examination which was administered to 54 students of high school age, by a large private university in eastern Japan in the fall of 2016. Although student names and background information were not available, the students could generally be described as high school seniors from a variety of prefectures throughout Japan.

3. Method: The fit, difficulty, and reliability of the entrance examination components will be assessed using the Rasch model. Specifically, FACETS software will be used to examine the essay ratings of two examiners, together with dichotomous scores on the reading passage comprehension questions and two grammar sections of the test.

4. Results: The analysis of the entrance examination data is currently in progress. Preliminary results show that although the dichotomous test items demonstrated a reliability of .90 and the essay ratings .94, there were a number of problems, particularly with the dichotomous items. First, the difficulty level of the test sections was inconsistent: the reading passage comprehension questions were considerably easier than the grammar questions. In addition, the test as a whole was generally too easy for the applicants. In other words, the distribution of test items did not match the distribution of applicant ability well. A number of the highest-ability applicants had no test items at their level of ability, and nearly 10 of the 30 dichotomous items were too easy for all applicants. On the other hand, the essay ratings, which were the lowest of all the test scores, demonstrated good fit. The essay ratings had little in common with the reading passage comprehension questions but showed medium to large correlations with the grammar test item totals.

5. Conclusions: Data from actual university entrance examinations in Japan can be nearly impossible to obtain, primarily because of the purported need to protect the privacy of students. The upshot of such policies is that these tests cannot be evaluated in an objective manner and then revised accordingly. It is hoped that the results of this small study can be useful in terms of providing ideas for test revision for the university that generously gave consent for use of these data, and that at least a small ripple effect in higher education in Japan might result as well.


Paper No KK_034

Paper Title Applying the Rasch Model to Identify the Contribution of Marital Status to Perceived Social Support among Mount Merapi Volcanic Eruption Survivors

Email Address [email protected]

1st Author Chandra C. A. Putri

Affiliation Indonesia University of Education

Subsequent authors

Ifa Hanifah Misbach (Indonesia University of Education)

1. Aims/ Objectives of study:

The aim of this study is to identify differences in perceived sources of social support based on marital status among survivors of the Mount Merapi volcanic eruption.

2. Sample: The sample was selected using convenience sampling and consisted of 82 survivors of the Mount Merapi volcanic eruption (20-40 years old) who lived in Cangkringan, the sub-district with the highest number of survivors.

3. Method: This research used Rasch modelling with differential item functioning (DIF) analysis to identify differences in response patterns for perceived sources of social support based on marital status.

4. Results: Response patterns differed between respondents who were married (46.34%) and those who were not (53.66%). The DIF curves showed that married respondents found the items easier to answer than unmarried respondents, especially the items linked to support from significant others.
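The DIF curves described above can be complemented by a numeric test for each item. A common approach, similar to the DIF tables reported by Winsteps, estimates an item's difficulty separately in each group and compares the two estimates with a t statistic built from their standard errors. The sketch below assumes those group-specific estimates are already available. The 0.5-logit and |t| >= 2 cut-offs are a widely used rule of thumb, not values taken from this study, and the function names are hypothetical.

```python
import math


def dif_contrast(difficulty_group1, se_group1, difficulty_group2, se_group2):
    """Return (contrast, t) for one item estimated separately in two groups.

    Difficulties and standard errors are in logits. A positive contrast means
    the item was harder (less readily endorsed) for group 1.
    """
    contrast = difficulty_group1 - difficulty_group2
    joint_se = math.sqrt(se_group1 ** 2 + se_group2 ** 2)
    return contrast, contrast / joint_se


def flag_dif(contrast, t, min_contrast=0.5, min_abs_t=2.0):
    """Flag DIF only when the contrast is both large and statistically notable
    (assumed rule-of-thumb cut-offs)."""
    return abs(contrast) >= min_contrast and abs(t) >= min_abs_t


# Hypothetical item: 0.9 logits for unmarried and 0.1 logits for married respondents.
contrast, t = dif_contrast(0.9, 0.25, 0.1, 0.25)
print(round(contrast, 2), round(t, 2), flag_dif(contrast, t))
```

Requiring both a size criterion and a significance criterion avoids flagging trivial differences that reach significance only because the sample is large.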

5. Conclusions: Marital status differentiates the perceived social support responses of survivors of the Mount Merapi volcanic eruption.

Paper No KK_035

Paper Title Assessing Pedagogical Content Knowledge of the Particle Theory of Matter and Phase Change in Pre-service Science Teachers

Email Address [email protected]

1st Author Maryati

Affiliation Yogyakarta State University

Subsequent authors

Zuhdan Kun Prasetyo, Yogyakarta State University; Insih Wilujeng, Yogyakarta State University; Bambang Sumintono, Malaya University

1. Aims/ Objectives of study:

This research aims to assess the quality of PCK in pre-service secondary science teachers on a specific topic: the particle theory of matter and phase change.

2. Sample: The sample consisted of 16 pre-service secondary science teachers enrolled in a professional teacher training programme, with 32 lesson plans and videotaped instructional sessions.

3. Method: This quantitative study measured teachers’ PCK with a PCK rubric developed on the basis of Magnuson et al.’s PCK component model. Measurement involved multiple raters, and the ratings were analysed using many-facet Rasch measurement.
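In many-facet Rasch measurement, each rating is modelled as the outcome of several facets that combine additively on the logit scale. The sketch below assumes three facets: the teacher being rated, the PCK rubric component, and the rater. It also assumes a single rating scale shared by all components. The abstract reports neither the facet structure nor the number of rubric categories, so these, the function names, and the example numbers are assumptions. The example shows how a more severe rater lowers the expected rating for the same teacher.

```python
import math


def mfrm_category_probabilities(ability, item_difficulty, rater_severity, thresholds):
    """Category probabilities under a many-facet rating scale model.

    Assumed model: log(P_k / P_{k-1}) = B - D - C - F_k, where B is the
    teacher's ability, D the rubric component's difficulty, C the rater's
    severity and F_k the k-th Rasch-Andrich threshold (all in logits).
    Categories run 0..len(thresholds).
    """
    eta = ability - item_difficulty - rater_severity
    # Log-numerator for category k is sum_{h<=k} (eta - F_h); category 0 is 0.
    log_numerators = [0.0]
    for f in thresholds:
        log_numerators.append(log_numerators[-1] + eta - f)
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(log_numerators)
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(probabilities):
    """Model-expected score on the 0..m rating scale."""
    return sum(k * p for k, p in enumerate(probabilities))


# Hypothetical 4-category rubric (0-3): same teacher and component, two raters.
thresholds = [-1.0, 0.0, 1.0]
lenient = mfrm_category_probabilities(0.5, 0.0, -0.5, thresholds)
severe = mfrm_category_probabilities(0.5, 0.0, 0.5, thresholds)
print(round(expected_rating(lenient), 2), round(expected_rating(severe), 2))
```

Because rater severity is a separate term, the model can adjust teachers' measures for the particular raters they happened to receive. This is the main reason for using a many-facet analysis when several raters are involved.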

4. Results: The results indicate that the PCK of Indonesian pre-service secondary science teachers is still low, especially in knowledge of science curricula, knowledge of students’ understanding of science, and knowledge of instructional strategies.

5. Conclusions: Science teachers’ PCK in Indonesia, as a criterion of professional teaching, still needs to be improved, and the science teacher education curriculum must be reformed.


Paper No KK_036

Paper Title Preliminary Report on the Development and Calibration of a Rasch Scale to Measure Chinese Reading Comprehension Ability in Singaporean 2nd Language Primary School Students, Part II

Email Address [email protected]

1st Author Chung Tze Min

Affiliation Commontown Pte Ltd

Subsequent authors

Mohd Nor, M. Z., Newstar Agencies; Yan, R. J. J., Commontown Pte Ltd; Loo, J. P. L., Commontown Pte Ltd

1. Aims/ Objectives of study:

To help teachers place students at appropriate Chinese reading comprehension levels, a measurement scale of reading comprehension was created and calibrated in 2016. Because misfitting items were found in the 2016 Rasch analysis, about half of the items have since been refined and trialled. The current study evaluated and validated the measurement functioning of the reading comprehension scale.

2. Sample: To date, a total of 19,418 students from 42 schools have participated in the adaptive placement test. Their school grades ranged from Primary 1 to 6, and their ages from 7 to 12 years.

3. Method: Rasch analyses were conducted to validate the reading comprehension scale. In addition, the average item measure for each passage was calculated and correlated with the passage's Lexile measure as well as with teachers' levelling of the passage. Furthermore, average item measure for each question type was calculated to find out if items that required more cognitive effort to answer were indeed more difficult. We also compared the number of misfitting items for outfit ZSTD with last year’s number.

4. Results: The Rasch indices were within acceptable ranges for a low-stakes standardized test (person indices: Infit MNSQ = .99, Outfit MNSQ = .93, Separation = 2.58, Reliability = .87; item indices: Infit MNSQ = 1.03, Outfit MNSQ = .99, Separation = 6.66, Reliability = .98). The items were, however, more difficult for the students tested (item measure = 0, person measure = -.86). Average item measures and teachers’ levelling of the passages were moderately correlated with the passages' Lexile measures (r = .61 and r = .70, respectively), while average item measures and teachers’ levelling were strongly correlated with each other (r = .90). Dimensionality analysis showed that 30.3% of the variance was explained by the latent trait, which is close to the general guideline of 29.5% for computer adaptive tests (Linacre, 2014). This suggests that the data can be accounted for by a single dimension, the latent trait measured by the scale. The eigenvalue of the unexplained variance in the first contrast is 3, which is higher than the recommended value of 2, but the variance explained by the first contrast is only .2%. Items that require more cognitive effort to answer were indeed more difficult (average item measures: Pinyin, -2.27; vocabulary, -.82; low cognitive level processing, -.39; sentence structure, .88; high cognitive level processing, 1.76), and this ordering of question-type difficulty agreed with the findings reported by Meneghetti, Carretti, and De Beni (2006). The proportion of erratic items also fell from 12.5% (184 of 1,462 items analysed) in 2016 to 7.6% (91 of 1,194 items) in 2017.

5. Conclusions: The reading comprehension scale is valid in measuring children’s reading comprehension abilities. As item refinement over the past year has improved the quality of the scale, we will continue to refine the remaining items and trial them in 2018.


Paper No KK_037

Paper Title Multidimensional Rasch Analysis of Teaching Role-Specific Esteem

Email Address [email protected]

1st Author Yu-Shu Chen

Affiliation National Chung Cheng University, Taiwan

Subsequent authors

Yuan-Chi Lai, WuFeng University, Taiwan

1. Aims/ Objectives of study:

Self-esteem is a central construct in psychological theory. However, there is disagreement over how the construct should be conceived, and researchers have lamented this lack of consensus for years (Tafarodi & Swann, 2001). Based on a literature review, we adopt the position that teaching role-specific esteem consists of two distinct dimensions: role-competence and role-liking. That is, individuals take on value both by merit of what they can do and by what they are as teachers. The former is founded on teaching abilities and talents, the latter on attractiveness and other aspects of teaching role worth. The main objective of the present study was to validate the teaching role-specific esteem scale using multidimensional rating scale analysis. Key words: multidimensional Rasch model, rating scale model, role-competence, role-liking, teaching role-specific esteem.

2. Sample: Teachers from elementary, junior, and senior high schools in Taiwan comprised the sample for the current study. A total of 747 teachers were administered the 16-item teaching role-specific esteem scale, which consists of 2 dimensions. Participants were invited to take part in a survey and complete the scale.

3. Method: Because both role-competence and role-liking were designed to measure teaching role-specific esteem, the data were analysed with the multidimensional form of the rating scale model (Andrich, 1978), in which the items within a scale are judged on the same rating scale. To ascertain whether the scale items fit the model, two kinds of analysis were conducted: item fit analysis and analysis of differential item functioning (DIF; Holland & Wainer, 1993).

4. Results: In this study, we were interested in two kinds of DIF: gender (two groups) and employment style (two groups). Separate group analyses showed that two items had MNSQ values outside the critical range (0.70, 1.30). After the DIF items were deleted, the remaining items of each dimension in the teaching role-specific esteem scale constituted a single construct. The correlation between dimensions obtained from the multidimensional approach (.55) was much higher than that obtained from the unidimensional approach.

5. Conclusions: The teaching role-specific esteem scale shows between-item multidimensionality. The results demonstrate that the multidimensional rating scale model can be used to validate the scale, which is useful for researchers and practitioners interested in investigating teaching role-specific esteem. Limitations and directions for future research are discussed.


Paper No KK_038

Paper Title Improving Teaching and Student Learning through Evaluation of one TOEIC Preparation Textbook

Email Address [email protected]

1st Author YihYeh Pan

Affiliation Sanno University

Subsequent authors

Kristy Takagi, Chuo University

1. Aims/ Objectives of study:

There are hundreds of TOEIC preparation textbooks used at universities, private English schools, and cram schools in Japan, but few studies have examined the quality of the items in these books. The aim of this project is to evaluate the test items in one TOEIC preparation textbook commonly used in university English courses in Japan. To determine whether the book chapters progress from the easiest to the most difficult items, or remain stable in difficulty, as would be expected in a well-written test preparation book, the evaluation will cover two areas: 1) all 100 items from the five chapters will be assessed for fit, difficulty, and reliability; and 2) the difficulty level of the items in each chapter will be investigated.

2. Sample: The participants are all Japanese students currently studying at a university in eastern Japan, aged 19 to 21 years. Most have studied English for six to seven years. Based on their placement test scores derived from the entrance exam, these students were placed in the most advanced English language class at the university.

3. Method: In the first evaluation of test items, the fit, difficulty and reliability of items will be assessed using the Rasch Model. These results will also provide insight into the second evaluation, of the difficulty level of items used in each chapter. In addition, the progression of difficulty of the five chapters in the TOEIC preparation book will be considered in light of the placement of students in relation to test items on the variable maps of the five chapter tests.

4. Results: We are still in the process of collecting data. However, based on past experience with student performance on and reactions to test items, we predict that the test items in these TOEIC preparation book chapters will neither progress from the easiest to the most difficult nor even remain stable in difficulty level, as one would expect from a well-written test preparation book. In other words, we predict that the TOEIC preparation textbook will be flawed in ways related to difficulty level, and that revision is needed to improve both the teaching and learning experience.

5. Conclusions: This kind of analysis is needed for many reasons. Teachers need to understand the level of difficulty of test preparation book chapters more deeply in order to plan lessons and teach more effectively. As for publishers, they should pursue this kind of analysis so that they can revise and improve the texts they produce. These publishers and TOEIC test preparation textbook writers should both be more accountable for the progression of difficulty in TOEIC preparation textbooks because students’ skill development and confidence are both at stake.


Paper No KK_039

Paper Title Development of Instruments for Measuring the Mathematical Logical Thinking Ability of College Students in Kapita Selekta

Email Address [email protected]

1st Author Novaliyosi

Affiliation Universitas Sultan Ageng Tirtayasa, Banten, Indonesia

Subsequent authors

1. Aims/ Objectives of study:

This study aims to develop instruments to measure the mathematical logical thinking ability of college students in Kapita Selekta.

2. Sample:

3. Method: The development stages used in this study were: (1) defining the variables; (2) breaking the variables down into more detailed indicators/dimensions; (3) arranging the items; (4) conducting trials; and (5) analysing validity and reliability.

4. Results: The legibility trial showed that the instruments were easy to read and well understood by students, and the validity and reliability tests showed that the mathematical logical thinking ability instrument falls into the valid category and is fit for use as an instrument.

5. Conclusions: The instruments can be used to measure mathematical logical thinking ability.

Paper No KK_040

Paper Title Assessing competency level among SIPartners+ using Rasch Model approach

Email Address [email protected]

1st Author Hishamuddin Hashim

Affiliation

Subsequent authors

Roland@Rozaidi Abu Hajjan, Nordin Tahir, Nurul Badar Mohd Salleh, Mohd Jalani Hasan, Raja Hamizah Raja Harun, Nurulhidayah Sukiman, Siti Sarah Baharom, Ismail Mohamad, Mohd Kashfi Mohd Jailani

1. Aims/ Objectives of study:

With the aim of improving the management quality of school leaders in Malaysia, the Ministry of Education (MOE) introduced the initiative of appointing selected officers as School Improvement Partners (SIPartners+). They are responsible for improving the quality of leadership through coaching and mentoring, as well as implementing interventions. They also serve as subject matter experts in school leadership development. The aim of this research was to identify the competency profile of SIPartners+ attached to District Education Offices and State Education Offices throughout Malaysia.

2. Sample: The instrument was administered to 220 SIPartners+ in District Education Offices and State Education Offices throughout Malaysia. The respondents, selected using stratified random sampling, consisted of 160 males (73%) and 60 females (27%). Of these, 148 (67%) held postgraduate degrees, while 72 (33%) were degree holders.

3. Method: The research used a self-administered questionnaire, the Facilitator Competency Profile Instrument (FCPI 2). It contained 99 items across 5 competency elements: General, Coaching & Mentoring, Subject Matter Expert, Clients Advancement Plan, and Needs Analysis/Evaluation. However, only items related to skill competency level were used for the purpose of this research. The data were analysed using Rasch model analysis. Summary statistics were used to determine the reliability and separation of persons and items, and overall performance in the assessment was analysed using the person-item distribution.

4. Results: The summary statistics show a Cronbach’s alpha of 0.98, a person reliability of 0.96, and a person separation (G) of 4.98. The item reliability is 0.97 and the item separation is 6.04, showing that the items are spread well enough in difficulty to measure SIPartners+ ability. The Wright map for the FCPI 2 shows the distribution of all persons and items on the logit ruler. The majority of SIPartners+ have mastered Effective Communication, while the least mastered skill was Academic Writing.
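Rasch separation (G) and reliability (R) express the same information, related by R = G^2/(1 + G^2). The number of statistically distinct levels (strata) is often estimated as H = (4G + 1)/3. The helper names below are hypothetical. Applied to the separations reported above, 4.98 for persons and 6.04 for items, the formula reproduces the reported reliabilities of 0.96 and 0.97.

```python
import math


def reliability_from_separation(g):
    """Rasch reliability from separation: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)


def separation_from_reliability(r):
    """Separation from reliability: G = sqrt(R / (1 - R)), for 0 <= R < 1."""
    return math.sqrt(r / (1.0 - r))


def strata(g):
    """Statistically distinct levels of performance: H = (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0


# Person and item separations reported for the FCPI 2.
for label, g in [("person", 4.98), ("item", 6.04)]:
    print(label, round(reliability_from_separation(g), 2), round(strata(g), 1))
```

Because reliability approaches 1 slowly as separation grows, separation is often the more informative figure when reliabilities are already high, as they are here.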

5. Conclusions: The aim of this study was to identify the competency profile of SIPartners+. Based on the questionnaire data, it can be concluded that most SIPartners+ excel in the competency profile, indicating that they have mastered the skills needed as SIPartners+. However, they still lack Needs Analysis skills. Therefore, the relevant authorities need to take immediate action to ensure that all SIPartners+ acquire this skill in the future.

Paper No KK_041

Paper Title Validating the Value Domain of the Facilitator Competency Profile Instrument SIPartners+-2 (FCPI-SIPartners+-2) Using Rasch Model Analysis

Email Address [email protected]

1st Author Raja Hamizah Raja Harun

Affiliation

Subsequent authors

Nurulhidayah Sukiman, Siti Sarah Baharom, Ismail Mohamad, Mohd Kashfi Mohd Jailani

1. Aims/ Objectives of study:

The competency profile covers three main domains, namely knowledge, skills, and values. These serve as the basic requirements that School Improvement Partners+ (SIPartners+) need to possess to enhance their competency and potential. This research aims to validate the value domain of the FCPI-SIPartners+-2 for the SIPartners+ officers’ competency profile, so that the value domain of the competency profile for the SIPartners+ group can be enhanced based on the findings. The purpose of this paper was to examine the reliability and validity of the FCPI-SIPartners+-2.

2. Sample: The instrument was administered to 220 SIPartners+ officers throughout Malaysia, aged from 40 to over 50 years. The sample was selected using stratified random sampling.

3. Method: This study focused on the value domain of the FCPI-SIPartners+-2 (Likert scale, 1-5), which consists of 29 items. Winsteps version 3.68.2 was used to gauge the validity and reliability of the items and respondents. The Rasch model was used because it measures both person reliability and item reliability and is more robust than Cronbach’s alpha. It also allows item elimination based on t-values and differential measures. The analysis covered item and respondent validity, item and respondent reliability, item fit, item difficulty, and respondent ability.

4. Results: All items have PTMEA CORR values from 0.53 to 0.81. The Cronbach’s alpha (KR-20) value for the value domain indicates high reliability of the questionnaire at 0.98, with an item reliability index of 0.95 and a person reliability of 0.92. The infit/outfit values show three items with MNSQ above 1.4: Item 70 PR (1.42), Item 21 U (1.88) and Item 20 U (1.48). Based on the item map results, three items were identified as difficult.
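
The Cronbach’s alpha (KR-20) value above is a classical internal-consistency coefficient, separate from the Rasch person and item reliabilities. A minimal sketch of coefficient alpha in Python, assuming a complete persons-by-items score matrix with no missing responses; the function name and toy data are hypothetical. For 0/1 items the same formula gives KR-20.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Coefficient alpha for a complete persons x items matrix of scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores).
    For 0/1 items this equals KR-20. Population variances are used throughout;
    an n/(n-1) correction would cancel in the ratio anyway.
    """
    k = len(scores[0])                                      # number of items
    item_var = sum(pvariance(col) for col in zip(*scores))  # zip(*) gives item columns
    total_var = pvariance([sum(row) for row in scores])     # variance of raw totals
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical 1-5 ratings: 4 respondents x 3 items
data = [[5, 4, 5], [3, 3, 4], [2, 2, 1], [4, 5, 4]]
print(round(cronbach_alpha(data), 3))  # 0.918
```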

5. Conclusions: Through Rasch model analysis, the researchers obtained high validity and reliability values from the test conducted, indicating that the questionnaire is valid and reliable for validating the competency profile. The item reliability is high, which means the item measures are stable. All PTMEA CORR values are positive, showing that all items point in the same direction as the measure. In examining the fit statistics, an outfit and infit MNSQ range of 0.60 to 1.40 was used as the criterion; three items need to be improved. In terms of item difficulty and respondent ability, three items also need to be considered for improvement.
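
Infit and outfit mean-squares, and the 0.60 to 1.40 screening rule, can be illustrated with the sketch below. It assumes person measures and item difficulties have already been estimated (for example by Winsteps) and, for brevity, uses the dichotomous Rasch model; the FCPI items are 1-5 ratings, so a polytomous version would substitute the rating-scale expectation and variance. Function names are hypothetical.

```python
import math

def rasch_p(theta, delta):
    """Probability of a correct/endorsed response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def item_fit(responses, thetas, deltas):
    """(outfit, infit) mean-squares per item.

    responses[n][i] is 0/1 for person n on item i; thetas and deltas are in logits.
    """
    fits = []
    for i, delta in enumerate(deltas):
        z2_sum = sq_resid_sum = var_sum = 0.0
        for n, theta in enumerate(thetas):
            p = rasch_p(theta, delta)
            var = p * (1 - p)                # model variance of the response
            resid = responses[n][i] - p      # observed minus expected
            z2_sum += resid ** 2 / var       # squared standardized residual
            sq_resid_sum += resid ** 2
            var_sum += var
        outfit = z2_sum / len(thetas)        # unweighted mean-square
        infit = sq_resid_sum / var_sum       # information-weighted mean-square
        fits.append((outfit, infit))
    return fits

def flag_misfit(fits, low=0.60, high=1.40):
    """Indices of items whose outfit or infit MNSQ falls outside [low, high]."""
    return [i for i, (outfit, infit) in enumerate(fits)
            if not (low <= outfit <= high and low <= infit <= high)]

# Hypothetical usage: 3 persons x 2 items
fits = item_fit([[1, 0], [1, 1], [0, 0]], thetas=[0.5, 1.5, -1.0], deltas=[-0.5, 0.8])
print(flag_misfit(fits))
```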

Paper No KK_042

Paper Title Measuring Scientific Literacy: Using the Rasch Model Analysis to Determine Student Competency Using Data from PISA 2015

Email Address

[email protected]

1st Author Nor Azizi bt Abdullah

Subsequent authors

Dr. Wan Raisuha bt Wan Ali (PhD)

1. Aims/ Objectives of study:

To identify factors that influence students' competency in responding to the Scientific Literacy items in PISA 2015. The study addresses the following questions: (1) Are the items a good measure of Malaysian students’ proficiency in Scientific Literacy? (2) Which items do students find difficult or easy? (3) Which items show erratic responses from students? Hypothesis: the higher the cognitive demand of an item, the lower the likelihood that students of lower ability will respond to it correctly.

2. Sample: In each randomly selected school, 42 students were randomly selected using the KeyQuest software provided by the OECD; schools were selected based on specific strata such as school category, school type, school location and medium of instruction. The total number of students sampled for the PISA 2015 study was 9,622.

3. Method: There were 184 science-related items in PISA 2015, which were administered to students in various combinations (forms). Items were coded by the item’s numeric order followed by four letters: the first letter indicates the competency assessed; the second, the type of knowledge involved; the third, the knowledge system involved; and the fourth, the level of cognitive demand assigned by the test designers. The competencies assessed by the Scientific Literacy items of PISA 2015 were symbolized by P for explaining phenomena scientifically, E for evaluating and designing scientific inquiry, and D for interpreting data and evidence scientifically. The types of knowledge involved were symbolized by C for content knowledge, P for procedural knowledge and E for epistemic knowledge. The knowledge systems used in the items were symbolized by P for physical systems, L for living systems and E for earth and space systems. The levels of cognitive demand assigned by the test developers were symbolized by L for low, M for medium and H for high cognitive demand. The PISA data provided several codes for student responses, including the single-digit codes “1”, “0”, “9”, “6” and “7” and the double-digit codes “21”, “11”, “12”, “01”, “02”, “03” and “04”. Codes “1”, “11”, “12” and “21” were used for credited responses, with “11” and “12” signifying partial credit and “21” full credit. Codes “0”, “01”, “02”, “03” and “04” were given for non-credited responses. Codes “9” and “99” were non-responses, codes “6” and “96” were responses beyond the time given, and codes “7” and “97” were for items hidden from students. For the purpose of this study, credited responses, both single- and double-digit, were re-coded as “1”, while non-credited responses were re-coded as “0”. Non-response codes and the other four codes were treated as missing responses. After re-coding, the analysis was carried out using the Bond and Fox software, and its output was used as data for this study.
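
The re-coding rule described above amounts to a small lookup. A sketch in Python, assuming raw codes are kept as strings (so that “01” and “1” stay distinct); the names are illustrative and this is not the authors’ actual script:

```python
CREDIT = {"1", "11", "12", "21"}           # credited (partial or full) -> 1
NO_CREDIT = {"0", "01", "02", "03", "04"}  # non-credited -> 0
# Any other code ("9"/"99" no response, "6"/"96" beyond the time given,
# "7"/"97" item hidden from the student) is treated as missing.

def recode(raw):
    """Map a raw PISA response code (string) to 1, 0, or None for missing."""
    code = raw.strip()
    if code in CREDIT:
        return 1
    if code in NO_CREDIT:
        return 0
    return None

row = ["21", "0", "12", "99", "7", "03"]
print([recode(c) for c in row])  # [1, 0, 1, None, None, 0]
```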

4. Results: Based on person reliability, the analysis showed that the items provided a good measure of student ability. However, a gap of almost 1 logit between item difficulty and student ability indicated that many items were difficult for many students. The analysis also found that some items assigned as low cognitive demand were highly difficult for students, while some items assigned as high cognitive demand were fair for average students. The item codes provided a reference for looking further into the items with respect to the type of response required or the task demanded of students.
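
Decoding the four-letter item codes and comparing assigned cognitive demand with estimated Rasch difficulty can be sketched as follows. The code format (item number followed by the four letters, e.g. a hypothetical “12PCPL”) follows the description in the Method; the logit cut-offs for “hard” and “easy” are arbitrary illustrations, not values used in the study.

```python
COMPETENCY = {"P": "explaining phenomena scientifically",
              "E": "evaluating and designing scientific inquiry",
              "D": "interpreting data and evidence scientifically"}
KNOWLEDGE = {"C": "content", "P": "procedural", "E": "epistemic"}
SYSTEM = {"P": "physical", "L": "living", "E": "earth and space"}
DEMAND = {"L": "low", "M": "medium", "H": "high"}

def decode(item_code):
    """Split a code such as '12PCPL' into item number and the four descriptors."""
    number, (c, k, s, d) = item_code[:-4], item_code[-4:]
    return {"number": int(number), "competency": COMPETENCY[c],
            "knowledge": KNOWLEDGE[k], "system": SYSTEM[s], "demand": DEMAND[d]}

def demand_mismatches(difficulties, hard=1.0, easy=-1.0):
    """Codes of items labelled low demand yet harder than `hard` logits,
    or labelled high demand yet easier than `easy` logits."""
    flagged = []
    for code, delta in difficulties.items():
        demand = decode(code)["demand"]
        if (demand == "low" and delta > hard) or (demand == "high" and delta < easy):
            flagged.append(code)
    return flagged

print(decode("12PCPL"))
print(demand_mismatches({"1PCPL": 1.6, "2DEEH": -1.2, "3ECLM": 0.3}))
```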

5. Conclusions:

The findings show that while the Scientific Literacy items provided a good measure of the ability of 15-year-old students in Malaysia, the items were fairly difficult for many students. Some items labelled as easy were difficult even for high-ability students, while some items labelled as difficult could be answered even by average students. Further analysis of various aspects of the items is required to provide a better description of students’ strengths and weaknesses in solving the tasks in the PISA 2015 Scientific Literacy items. This analysis would involve identifying the characteristics of the items, to give policy makers at the Ministry of Education and teachers the background and wider insight needed to strategize efforts towards improving scientific literacy among students in Malaysia. With this information disseminated, teachers will be better able to align teaching and learning with the strengths and weaknesses of the taught curriculum. Last but not least, students will be better able to engage with the learned curriculum in addressing issues arising from real-life situations, a characteristic of the PISA 2015 Scientific Literacy items.

Paper No KK_043

Paper Title Validating Knowledge Domain of Facilitator Competency Profile Instrument – SISC+1 (FCPI-SISC+1) Using Rasch Model

Email Address [email protected]

1st Author Zulkifili Salleh

Subsequent authors

Noraisah Jamil, Shariff Yob, Shahrizal Shaarani, Mohd Isham Embong, Raja Hamizah Raja Harun, Nurulhidayah Sukiman, Siti Sarah Baharom, Ismail Mohamad, Mohd Kashfi Mohd Jailani

1. Aims/ Objectives of study:

This research is based on the Knowledge Domain of the Facilitator Competency Profile Instrument – SISC+1 (FCPI-SISC+1), which consists of 34 dichotomous items covering five standards, namely general knowledge, coaching and mentoring, subject matter expert, coachees’ advancement, and needs analysis study/assessment. School Improvement Specialist Coaches+ (SISC+) are Education Officers who are subject matter experts in teaching and learning and are able to support improvements in teachers’ quality through strategic coaching and mentoring. The purpose of this paper was to examine the reliability and validity of the FCPI-SISC+1 Knowledge Domain using the Rasch model.

2. Sample: The instrument was administered to 587 SISC+ throughout Malaysia, selected using stratified random sampling. Male respondents made up 47.9% (n = 281) of the sample and female respondents 52.1% (n = 306). The sample comprised 2.2% (n = 13) PhD holders, 33.4% (n = 196) master’s degree holders and 64.4% (n = 378) bachelor’s degree holders.

3. Method: This research used the FCPI-SISC+1, focusing on the Knowledge Domain. The Rasch model was used because it can measure person reliability and item reliability and is more robust than Cronbach’s Alpha. It also allows item elimination based on t-values and differential measures. Winsteps version 3.68.2 was used in the process.
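
Rasch person and item reliabilities, as reported by programs such as Winsteps, are separation reliabilities rather than Cronbach’s alpha. A minimal sketch of the usual definition (true variance over observed variance), assuming the measures and their standard errors have already been estimated; the function name and numbers are made up:

```python
from statistics import fmean, pvariance

def separation_reliability(measures, std_errors):
    """Rasch separation reliability and separation index G.

    'True' variance = observed variance of the measures minus the mean squared
    standard error; reliability = true / observed; G = true SD / RMSE.
    """
    observed = pvariance(measures)
    error = fmean(se ** 2 for se in std_errors)  # mean error variance (RMSE squared)
    true = max(observed - error, 0.0)            # cannot be negative
    reliability = true / observed
    separation = (true / error) ** 0.5
    return reliability, separation

# Hypothetical person measures (logits) and their standard errors
rel, sep = separation_reliability([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5])
print(round(rel, 3), round(sep, 2))  # 0.625 1.29
```

Because reliability R = G^2 / (1 + G^2), separation is G = sqrt(R / (1 - R)); the person reliability of 0.64 reported in the Results corresponds to a separation of about 1.33, and the item reliability of 0.95 to about 4.36.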

4. Results: The person reliability value obtained was 0.64, while the item reliability value was 0.95 for the dichotomous items. The Winsteps analysis revealed positive PTMEA CORR values, high item and person reliability indices, and an ordering of the items in a hierarchy according to their level of difficulty.

5. Conclusions: The data analysis using Winsteps recorded high person and item reliability indices. Therefore, the FCPI-SISC+1 instrument on the knowledge domain has high validity and reliability. Hence, the use of the Rasch model in analyzing the instrument has contributed significantly to establishing its validity.

Paper No KK_044

Paper Title Measuring The Status Of FasiLINUS Current Professional Profile Using Rasch Model

Email Address [email protected]

1st Author Ruzita Ahmad

Affiliation

Subsequent authors

Muhammad Faizal Razali, Noor Azam Asmaran, Wan Nor Anita Wan Hassan, Raja Hamizah Raja Harun, Nurulhidayah Sukiman, Siti Sarah Baharom, Ismail Mohamad, Mohd Kashfi Mohd Jailani

1. Aims/ Objectives of study:

The role of FasiLINUS is to realise the Ministry of Education (MOE) goal, through the LINUS2.0 Programme, of ensuring that all Level 1 pupils (Years 1, 2 and 3), except pupils with Special Educational Needs (SEN), achieve basic literacy in Bahasa Malaysia and English, and basic numeracy, through screening conducted in primary schools. This research aims to identify the current job profile of FasiLINUS in District Education Offices / State Education Departments in Malaysia.

2. Sample: The study involved a sample of 377 FasiLINUS, consisting of 173 males (46%) and 204 females (54%), selected using stratified random sampling. Sixty-five respondents (17%) held postgraduate degrees, while 312 (83%) held a bachelor’s degree.

3. Method: The Competency Profile Validation Study for Education Officers of Group Facilitator instrument was used in this study. It contained 70 items covering four FasiLINUS competency standards, namely coaching and mentoring, clients’ advancement, subject matter expertise and needs analysis. The data collected were analysed using the Rasch model, and an item map analysis was carried out to identify the item and person difficulty levels.

4. Results: From the analysis, a list of tasks was agreed by FasiLINUS as their current professional profile. Two items were less agreed upon by respondents, namely B10 (Quality Procedure MS ISO 9001:2008) and B9 (Finance Circular).

5. Conclusions: The results indicate the need for an intervention programme on the PK18 procedure for quality control compliance, and on the B9 Finance Circular, which involves conducting briefings/courses and requires OS 21000 and OS 29000 compliance. However, these items need to be reconsidered, as they add value to FasiLINUS’s previous knowledge and professional development.

Paper No KK_045

Paper Title USING RASCH MODEL TO ASSESS THE FOREIGN LANGUAGE SPEAKING ANXIETY SCALE (FLSAS) AMONG UNIVERSITY STUDENTS IN SALATIGA

Email Address [email protected]

1st Author Rizki Parahita Anandi

Affiliation Faculty of Education, University of Malaya

Subsequent authors

Resa Syafitri (Faculty of Education, University of Malaya), Bambang Sumintono (Institute of Educational Leadership, University of Malaya)

1. Aims/ Objectives of study:

The aim of this study is to examine the validity and reliability of the Foreign Language Speaking Anxiety Scale (FLSAS) using the Rasch Measurement Model approach.

2. Sample: Forty-six Arabic Language Education students from a university in Salatiga participated in this research.

3. Method: A survey research design was used in this study. The data were collected by distributing the Foreign Language Speaking Anxiety Scale (FLSAS) to Arabic Language Education students and then analysed using the Rasch model. Validity was assessed through the raw variance explained by measures and the unexplained variance, and reliability through person and item reliability and Cronbach’s Alpha. The Wright person-item map was also used to analyse the questionnaire items.
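
A Wright map places persons and items on the same logit scale. A rough text version can be sketched in a few lines of Python, assuming person measures and item difficulties have already been estimated; labels and values are hypothetical, and measures outside the plotted range are simply not shown.

```python
def wright_map(person_measures, item_measures, lo=-3.0, hi=3.0, step=0.5):
    """Text rows of a simple Wright (person-item) map, highest band first.

    Each row covers the logit band (top - step, top]; persons are counted as '#'
    on the left and item labels are listed on the right.
    """
    rows = []
    top = hi
    while top > lo:
        bottom = top - step
        persons = sum(bottom < m <= top for m in person_measures)
        items = [label for label, d in item_measures.items() if bottom < d <= top]
        rows.append(f"{top:5.1f} {'#' * persons:>10} | {' '.join(items)}")
        top = bottom
    return rows

# Hypothetical person measures and item difficulties (logits)
for row in wright_map([0.2, 0.3, -1.0, 1.4], {"Q1": 0.4, "Q2": 2.1, "Q3": -0.7}):
    print(row)
```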

4. Results: The raw variance explained by measures was 40.9%, and the unexplained variance did not exceed 15%, indicating that the questionnaire has construct validity. Reliability was assessed through person reliability (.88), item reliability (.90) and Cronbach’s Alpha (.90). The Wright item-person map showed 3 items at the top of the map, representing situations in which students did not feel anxious while speaking Arabic. The map also showed one item at the bottom, meaning that most students get anxious when they cannot express their thoughts while speaking Arabic.

5. Conclusions: It can be concluded that this instrument is valid and reliable for measuring students’ level of anxiety in speaking Arabic.

Paper No KK_047

Paper Title Global mindset: Assessing construct dimensionality

Email Address [email protected]

1st Author Jeffrey Durand

Affiliation Toyo Gakuen University, Japan

Subsequent authors

Jason Pratt, Toyo Gakuen University, Japan; Sarah Louisa Birchley, Toyo Gakuen University, Japan

1. Aims/ Objectives of study:

The number of Japanese university students studying abroad has been decreasing, while more and more are interested in working for traditional Japanese companies. On the other hand, the number of Japanese tourists travelling abroad is increasing, as is overseas production by Japanese companies. We are interested in preparing our students for work in a global economy and for a connection to a global community. To this end, we have developed a questionnaire to understand what kind of global mindset our students have. Global mindset, the interest in being involved in an international community, was developed from previous work on motivation and the L2 self system (Kikuchi, 2016) and international posture (Yashima, 1998). Nine constructs resulted. The goal of this research is to compare the dimensionality of global mindset with that of the motivation research. A further goal is to determine whether some constructs are theoretically and statistically similar enough to be considered as one.

2. Sample: Initial sampling consists of 40 students at a Japanese university. Further sampling is being conducted to increase this number. Students are recruited from a number of required language and other classes across all majors at the university. Student language abilities range from very low to very high.

3. Method: Rasch partial credit analyses were conducted separately on the nine constructs of global mindset. Item and person fit were examined, and principal components analyses were conducted.
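
One common way to examine dimensionality after a Rasch partial credit analysis is a principal components analysis of standardized residuals. The sketch below (Python with NumPy) computes PCM expected scores and variances, forms standardized residuals, and returns the eigenvalues of the item-by-item residual correlation matrix. It assumes person measures and item step values are already estimated; the simulated data are purely illustrative. Under a frequently quoted rule of thumb, a first-contrast eigenvalue of about 2 or more hints at a possible secondary dimension.

```python
import numpy as np

def pcm_probs(theta, steps):
    """Category probabilities (0..m) for one partial credit item with the given step values."""
    logits = np.concatenate(([0.0], np.cumsum(theta - np.asarray(steps))))
    e = np.exp(logits - logits.max())   # subtract the max for numerical stability
    return e / e.sum()

def standardized_residuals(X, thetas, item_steps):
    """(x - E[x]) / sqrt(Var[x]) for every person-item response under the PCM."""
    Z = np.empty(X.shape, dtype=float)
    for n, theta in enumerate(thetas):
        for i, steps in enumerate(item_steps):
            p = pcm_probs(theta, steps)
            k = np.arange(len(p))
            expected = (k * p).sum()
            variance = ((k - expected) ** 2 * p).sum()
            Z[n, i] = (X[n, i] - expected) / np.sqrt(variance)
    return Z

def residual_contrasts(Z):
    """Eigenvalues (largest first) of the correlation matrix of item residuals."""
    R = np.corrcoef(Z, rowvar=False)
    return np.linalg.eigvalsh(R)[::-1]

# Simulated example: 40 persons, five hypothetical 4-category items
rng = np.random.default_rng(0)
thetas = rng.normal(size=40)
item_steps = [[-1.0, 0.0, 1.0]] * 5
X = np.array([[rng.choice(4, p=pcm_probs(t, s)) for s in item_steps] for t in thetas])
eig = residual_contrasts(standardized_residuals(X, thetas, item_steps))
print(eig[0])  # size of the first contrast, in item units
```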

4. Results: Early results suggest that most items in each construct fit well. Construct dimensionality, however, is sometimes suspect. These results will be updated when more data are included. The dimensionality of the global mindset constructs may not coincide with that of the similar language motivation constructs from which they were derived.

5. Conclusions: If these results stand, ‘general’ concepts of motivation may not be applicable to different topics, i.e. language and global mindset.

Paper No KK_048

Paper Title THE EFFECTIVENESS OF TEACHER TRAINING LESSONS

Email Address [email protected]

1st Author Burhanuddin Tola

Affiliation State University of Jakarta

Subsequent authors

1. Aims/ Objectives of study:

To provide information and baseline data on the teacher training conducted by DGDETEP and TPG/TSMD and on its effectiveness. This relates to the BERMUTU program, which will support the training organized by TPG/TSMD.

2. Sample: The population of this study consisted of junior high school (JHS) teachers of Mathematics, Science (IPA), Indonesian Language and English who attended the training conducted by DGDETEP (Mathematics, Science and Language) and by TPG/TSMD.

3. Method: This study measured teacher competence, covering both teaching (academic) competence and non-academic competence, which comprised personality, social and pedagogic competence. It also examined the effect of the training on work satisfaction, teachers’ attitudes while teaching, and teaching efficacy. The instruments were tested on JHS teachers in various regions of Indonesia and developed jointly by a team of researchers from the State University of Yogyakarta, Universitas Pendidikan Ganesha, Universitas Negeri Makassar and Universitas Indonesia; they were refined through content validity testing, and coefficient alpha (α) was calculated to assess reliability. In addition, a questionnaire was distributed and analysed qualitatively to capture participants’ responses to the training, particularly the benefits they gained from it and their suggestions for it. To test the effectiveness of the training, the study used an experimental pre-test/post-test group design, with the questionnaire administered twice, before and after the training. The training model was as follows. Data collection depended on when the training was held; the researchers could not set a fixed schedule, because the training schedule was decided by TPG/TSMD or DGDETEP. Data were collected before the training started and after it ended so that clean pre-test and post-test scores could be obtained. The effectiveness of the training was tested with an independent-samples t-test on the gain scores, i.e. the difference between the average scores before and after training. The pre-test scores of the TPG/MGMP and DGDETEP groups were also compared. The data were analysed using SPSS version 11.
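
The effectiveness test described above compares gain scores (post-test minus pre-test) between two independent training groups. The study ran this in SPSS; the pure-Python sketch below shows the pooled-variance (Student) version of the computation with made-up scores, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def gains(pre, post):
    """Gain score for each teacher: post-test minus pre-test."""
    return [b - a for a, b in zip(pre, post)]

def pooled_t(x, y):
    """Student's independent-samples t statistic (equal variances assumed) and df."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

# Hypothetical pre/post scores for two training groups
gain_a = gains([50, 55, 60], [60, 62, 70])   # group A gains: [10, 7, 10]
gain_b = gains([52, 58, 61], [55, 60, 63])   # group B gains: [3, 2, 2]
t, df = pooled_t(gain_a, gain_b)
print(round(t, 3), df)  # 6.325 4
```

A p-value can then be read from the t distribution with these degrees of freedom, for example with `scipy.stats.ttest_ind(gain_a, gain_b)`, which uses the pooled-variance test by default.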

4. Results: There was a significant difference in teachers’ professional competence in Mathematics before and after the training, for both the TPG/TSMD and the DGDETEP training, indicating an improvement in teachers’ competence after the training program, as seen in the differences between pre-test and post-test scores. From the previous table it can also be seen that there was a significant difference in professional competence in Mathematics between teachers who attended the TPG/MGMP training and those who attended the DGDETEP training, with the TPG/MGMP teachers showing higher competence than the DGDETEP teachers (comparing both the pre-test and the post-test scores of the two groups). The professional competence in Biology of DGDETEP teachers was slightly above average (pre-test score). The scores of the TPG/MGMP trainees could not be processed, possibly because of errors in filling in the forms, so the pre-test scores of teachers who attended the TPG/MGMP training and those at P4TL Jakarta cannot be compared. There were significant differences in the professional competence of Physics teachers before and after the TPG/TSMD training. However, this cannot simply be interpreted as an improvement, because professional competence decreased after training, which could be due to mistakes or lack of time in answering the post-test questions. In addition, there was a significant difference in Physics competence between teachers trained by TPG/TSMD and by DGDETEP, with DGDETEP-trained teachers scoring significantly higher than TPG/TSMD-trained teachers. Teachers have average competence. The statistical tests show a reduction in scores for Indonesian Language, but the reduction is not significant. This may be due to the short training time of two days and to training content that did not provide enough material to equip teachers academically. A similar result was found for English, for which data come from the TPG/TSMD training because the English training managed by DGDETEP was cancelled. Teachers’ personality competence is above the average score; this competence highlights teachers’ characteristics such as being steady, honest, confident and responsible with respect to teacher ethics. Teachers’ social competence is classified as above average. The social competence statements in the questionnaire relate to the ability to develop social relationships with fellow teachers, parents and students. The statistical analysis shows a significant difference in social competence between teachers who attended the TPG/TSMD and the DGDETEP training, with DGDETEP trainees having higher social competence than TPG/TSMD trainees. Teachers’ pedagogical competence is classified as above average. The pedagogical competence statements in the questionnaire relate to the ability to prepare and organize learning in the classroom. The pedagogical competence results are inconsistent: in the TPG/TSMD training, teachers’ pedagogical competence scores increased, whereas in the DGDETEP group they decreased. Looking at the training material, the TPG/TSMD training teaches teachers to master classroom preparation, such as how to create a KTSP syllabus, while the DGDETEP training places more emphasis on teaching material.

5. Conclusions: From the research on the training organized by TPG/MGMP and DGDETEP, the following can be concluded. The TPG/MGMP training produced significant improvement in academic competence in Mathematics, personality competence and teaching efficacy. The DGDETEP training produced improvement in academic competence in Mathematics, teaching attitude and teaching efficacy. Score decreases occurred in some aspects, only in the training conducted by DGDETEP: academic competence in Physics, personality competence, social competence and pedagogic competence. In the baseline pre-test data, there were significant differences between the two groups in several aspects: Mathematics, Physics, personality competence, social competence, work satisfaction, teaching attitude and teaching efficacy.

Paper No KK_049

Paper Title Psychometric Properties of the Tuckman Procrastination Scale in an Indonesian sample

Email Address [email protected]

1st Author Ngadiman Djaja

Affiliation Krida Wacana Christian University (UKRIDA)

Subsequent authors

1. Aims/ Objectives of study:

This research aimed to establish how well items from the Tuckman Procrastination Scale fit the Rasch measurement model, and to identify unreliable items and items displaying poor fit to the procrastination construct.

2. Sample: A sample of 47 participants was recruited via email and social media (Facebook), comprising men and women aged 21-57 years from the population of Jakarta, Indonesia, in 2017.

3. Method: All 47 participants completed the 16 items measuring procrastination; only items related to procrastination were used in this study. Using the Rating Scale Model (RSM), item and overall test parameters and participants’ ability parameters were estimated. Acceptable mean square (MNSQ) values were defined as lying between 0.6 and 1.4; items with MNSQ outside this range were considered to misfit the model (above 1.4: underfit; below 0.6: overfit).
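
Under the Rating Scale Model all items share one set of category thresholds, and each item has its own location. A minimal sketch of the Andrich RSM category probabilities and expected rating, with hypothetical threshold values for a 4-category response format (the scale’s actual thresholds are not reported here):

```python
import math

def rsm_probs(theta, delta, taus):
    """Andrich rating scale model: probabilities for categories 0..len(taus).

    The log-numerator of category k is the sum over j <= k of
    (theta - delta - tau_j), with an empty sum (0) for category 0.
    """
    log_num = [0.0]
    for tau in taus:
        log_num.append(log_num[-1] + theta - delta - tau)
    top = max(log_num)                              # subtract max to avoid overflow
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, delta, taus):
    """Model-expected rating of a person at `theta` on an item located at `delta`."""
    return sum(k * p for k, p in enumerate(rsm_probs(theta, delta, taus)))

taus = [-1.2, 0.1, 1.1]  # hypothetical shared thresholds (logits)
print([round(p, 3) for p in rsm_probs(0.5, -0.2, taus)])
print(round(expected_score(0.5, -0.2, taus), 2))
```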

4. Results: Item calibration using RSM showed that 13 items had acceptable mean square statistics, ranging from 0.62 to 1.34; three items were removed from further analysis. Item measures for the 13 items ranged from -1.38 to 0.94 logits. Only these 13 items fitted the single construct, implying that the Tuckman Procrastination Scale is a unidimensional measure of procrastination. Overall, the scale has good psychometric properties, with a person reliability of 0.82.

5. Conclusions: This analysis provided evidence for the Rasch measurement qualities of the Indonesian version of the Tuckman Procrastination Scale.

Paper No KK_050

Paper Title Live Grading of Essay Questions Contributing to Computer Adaptive Testing

Email Address [email protected]

1st Author Dr. Haniza Yon

Affiliation MIMOS Berhad

Subsequent authors

Rense Lange (ISLA), Norsyahida Abd Kadir (MIMOS Berhad) & Nur Ayu Johar (MIMOS Berhad)

1. Aims/ Objectives of study:

To design and implement a Computer Adaptive Testing (CAT) system that integrates multiple-choice (MC) items with more complex question types requiring considerable real-time analysis to evaluate test-takers’ answers. Examples include test-enhanced learning, hybrid questions, and the grading of students’ written answers to essay questions. We outline the design of a CAT using “open-ended” (OE) essay questions that are “scored” in real time by an extension of the CAT system, using item calibrations obtained for a third-grade reading test. Using a Rasch partial credit model, the essay score is used to estimate test-takers’ performance on the reading test.

2. Sample: Items were calibrated on a sample of 534 Malaysian third-graders who took an MC + OE test designed to assess their comprehension of a reading passage. Five teachers graded each OE answer.

3. Method: The OE answers were analyzed with Latent Semantic Analysis (LSA), using term weighting (information-based TF-IDF) followed by Singular Value Decomposition (SVD) to obtain a semantic space in which to locate student answers. The MC items were analyzed using the standard dichotomous Rasch model to derive students’ reading measures. A rather low number of dimensions (50-100) provided the best results in predicting students’ reading measures. It proved possible to predict students’ OE scores using linear regression, logistic regression and discriminant analysis. By dividing the predicted values into ranges, OE answers could then be treated as ratings on a rating scale. Results obtained via LSA correlated highly with those obtained from teacher ratings.
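
The LSA pipeline described above (term weighting, SVD, locating answers in the reduced space, then regressing and binning predicted values into ordered categories) can be sketched with NumPy as follows. This is an illustration under assumptions: plain TF-IDF stands in for the study’s information-based weighting, the toy “answers” are made up, and all function names are hypothetical.

```python
import numpy as np

def lsa_space(docs, k):
    """Build a k-dimensional LSA space from tokenised answers.

    Returns the term index, idf weights, U_k, s_k and the k-dim coordinates of
    the training answers (rows of V_k). Weighting is count x log(N / df).
    """
    vocab = sorted({t for d in docs for t in d})
    index = {t: j for j, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))      # terms x answers
    for j, doc in enumerate(docs):
        for t in doc:
            counts[index[t], j] += 1
    idf = np.log(len(docs) / (counts > 0).sum(axis=1))
    U, s, Vt = np.linalg.svd(counts * idf[:, None], full_matrices=False)
    return index, idf, U[:, :k], s[:k], Vt[:k].T

def fold_in(tokens, index, idf, Uk, sk):
    """Project a new answer into the LSA space: (weighted term vector) U_k S_k^-1."""
    q = np.zeros(len(index))
    for t in tokens:
        if t in index:                 # words unseen in training are ignored
            q[index[t]] += 1
    return (q * idf) @ Uk / sk

def fit_linear(features, target):
    """Ordinary least squares with intercept (e.g. LSA coordinates -> teacher grade)."""
    A = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef

def to_category(predicted, cuts):
    """Bin predicted values into ordered categories 0..len(cuts) at the cut points."""
    return np.digitize(predicted, cuts)

answers = [["cat", "sat"], ["dog", "sat"], ["cat", "ran"]]
index, idf, Uk, sk, coords = lsa_space(answers, k=2)
print(fold_in(["cat", "sat"], index, idf, Uk, sk))  # matches coords[0] up to rounding
```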

4. Results: The simplest approach, linear regression, proved adequate to identify 3-5 ordered categories representing student performance on the OE question (R² > 0.7). After anchoring the MC items, the OE item was calibrated as a partial credit item to yield an augmented set of CAT items. Using the Rasch-based CAT system described in Lange (2007), which allows for partial credit items, MC and OE questions can now be mixed and matched in the same CAT system. In this system, (1) OE answers are “graded” using the LSA approach; (2) this grade is treated analogously to the “grade” a teacher might have given; and (3) given the OE item’s difficulty and step values, OE and MC items can be treated in exactly the same way to guide the CAT system.
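
Treating MC and OE items “in the exact same way” works because a dichotomous Rasch item is a partial credit item with a single step. The sketch below illustrates that idea with a Newton-Raphson maximum-likelihood ability update and maximum-information item selection. It is a generic textbook CAT loop under stated assumptions, not the Lange (2007) system; the item bank, names and values are hypothetical.

```python
import math

def pcm_probs(theta, steps):
    """Partial credit probabilities for categories 0..len(steps).
    A dichotomous MC item is the special case steps=[difficulty]."""
    cum = [0.0]
    for d in steps:
        cum.append(cum[-1] + theta - d)
    top = max(cum)
    w = [math.exp(c - top) for c in cum]
    s = sum(w)
    return [x / s for x in w]

def mean_var(theta, steps):
    """Expected score and its variance (= Fisher information) at theta."""
    p = pcm_probs(theta, steps)
    m = sum(k * pk for k, pk in enumerate(p))
    v = sum((k - m) ** 2 * pk for k, pk in enumerate(p))
    return m, v

def estimate_theta(answered, theta=0.0, iters=50, bound=4.0):
    """Newton-Raphson ML estimate from (steps, score) pairs, clamped to +/- bound.
    All-minimum or all-maximum patterns simply run to the bound."""
    for _ in range(iters):
        stats = [mean_var(theta, steps) for steps, _ in answered]
        resid = sum(score - m for (_, score), (m, _) in zip(answered, stats))
        info = sum(v for _, v in stats)
        theta = max(-bound, min(bound, theta + resid / info))
    return theta

def next_item(bank, used, theta):
    """Pick the unused item with maximum information at theta."""
    candidates = [name for name in bank if name not in used]
    return max(candidates, key=lambda name: mean_var(theta, bank[name])[1])

bank = {"mc1": [-0.8], "mc2": [0.4], "oe1": [-1.0, 0.2, 1.3]}  # hypothetical calibrations
answered = [(bank["mc1"], 1), (bank["oe1"], 2)]  # OE graded 2 on a 0-3 scale
theta = estimate_theta(answered)
print(round(theta, 2), next_item(bank, {"mc1", "oe1"}, theta))
```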

5. Conclusions: We have designed and implemented an augmented CAT system in which MC and OE items can be mixed as desired. Various OE grading methods can be used, and this would also accommodate other types of items that produce student behavior requiring further evaluation to yield a “grade.” We have implemented a prototype system that we hope to demonstrate live during the talk at the conference.

Paper No KK_051

Paper Title DEVELOPMENT OF INDONESIA SCIENCE LITERACY TEST (ISLT) INSTRUMENTS TO IMPROVE CRITERIA VALIDITY OF NATIONAL EXAM

Email Address [email protected]

1st Author Rosita Uli Sihombing

Affiliation

Subsequent authors

1. Aims/ Objectives of study:

This study aims to develop the Indonesian Science Literacy Test (ISLT) instrument to measure the science literacy of 15-year-old students (grade 9 or 10), particularly in an effort to increase the criterion validity of the national exam.

2. Sample:

3. Method: The method followed the 5 stages of the ADDIE development model: (1) analyse; (2) design; (3) develop; (4) implement; (5) evaluate. The Rasch model was used to select good items for the ISLT.

4. Results: The results of the validity and reliability test indicate that the developed ISLT instrument

5. Conclusions: The ISLT instrument falls into the valid category and is suitable for measuring students' literacy skills.

Paper No KK_052

Paper Title Rasch Model Application on Developing a Self-regulation Study Instrument for Mathematics Education Students

Email Address [email protected]

1st Author Wardani Rahayu

Affiliation Universitas Negeri Jakarta

Subsequent authors

1. Aims/ Objectives of study:

To develop a self-learning instrument for Mathematics Education students using the Rasch model of item response theory.

2. Sample: The sample in the first trial comprised 249 Mathematics Education students whilst in the second trial there were 260 Mathematics Education students in Jakarta and Tangerang.

3. Method: This is development research with two trials. Rasch analysis of the first and second trials showed that the Mathematics Education students’ self-learning instrument has significant item reliability and respondent reliability, and that the item unidimensionality and local independence requirements, both assumptions of item response theory, are fulfilled.
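
Local independence is often screened with Yen’s Q3, the correlation between items’ model residuals. A minimal NumPy sketch for dichotomous data, assuming person measures and item difficulties are already estimated; the cut-off of 0.2 above the average off-diagonal Q3 is one commonly cited rule of thumb, and the simulated data deliberately duplicate an item so that the pair is flagged.

```python
import numpy as np

def q3_matrix(X, thetas, deltas):
    """Yen's Q3: item-by-item correlations of dichotomous Rasch residuals (x - P)."""
    logits = np.asarray(thetas)[:, None] - np.asarray(deltas)[None, :]
    P = 1.0 / (1.0 + np.exp(-logits))
    return np.corrcoef(np.asarray(X, dtype=float) - P, rowvar=False)

def flag_dependent_pairs(Q, cutoff=0.2):
    """Item pairs whose Q3 exceeds the average off-diagonal Q3 by more than `cutoff`."""
    k = Q.shape[0]
    mean_q3 = Q[~np.eye(k, dtype=bool)].mean()
    return [(i, j) for i in range(k) for j in range(i + 1, k)
            if Q[i, j] - mean_q3 > cutoff]

# Simulated example: item 1 is an exact copy of item 0 (a locally dependent pair)
rng = np.random.default_rng(1)
thetas = rng.normal(size=200)
p = 1.0 / (1.0 + np.exp(-thetas))           # all three items at difficulty 0
x0 = (rng.random(200) < p).astype(int)
x2 = (rng.random(200) < p).astype(int)
X = np.column_stack([x0, x0, x2])
Q = q3_matrix(X, thetas, [0.0, 0.0, 0.0])
print(flag_dependent_pairs(Q))
```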

4. Results: The final (standard model) instrument for Mathematics Education students consists of 30 items: 11 items that measure cognitive aspects, 7 that measure motivation, 8 that measure behaviour, and 4 that measure metacognitive aspects.

5. Conclusions: The self-learning instrument developed for Mathematics Education students can be used to measure students' motivation, cognitive dimension, behaviour, and metacognitive dimension.

Paper No KK_053

Paper Title Measuring Second Language Receptive Knowledge of Collocation Among Graduate Learners in Public Universities Malaysia Using Rasch Analysis

Email Address [email protected]

1st Author Lily Hanefarezan Asbulah

Affiliation Fakulti Pendidikan, Universiti Kebangsaan Malaysia

Subsequent authors

Maimun Aqsha Lubis. Fakulti Pendidikan, Universiti Kebangsaan Malaysia

1. Aims/ Objectives of study:

This study investigated AFL (Arabic as a Foreign Language) graduate learners’ knowledge of verb-particle, noun-noun and noun-adjective collocations at the first four 1000-word frequency levels. A 40-item collocation test was used to measure receptive knowledge of verb-noun and adjective-noun collocations made up of words taken from the 1000-, 2000-, 3000- and 4000-word frequency levels in Arabic.

2. Sample: Since the data were collected at a single point in time, this study employed a cross-sectional survey field study design. Using non-probability sampling techniques, a total of 345 graduate learners from Bachelor of Arabic Language programmes were recruited from seven public universities in Malaysia, namely UKM, UPM, UM, UiTM, UIAM, USIM, UPSI and UniSZA.

3. Method: A 40-item multiple-choice test of verb-noun and adjective-noun collocations was given to the participants. There were three answer options (collocates) from which the learners had to choose. “I don’t know” (أنا لا أعرف) was also offered as a fourth option to discourage respondents from simply guessing a matching collocate.

4. Results: Rasch analysis showed that all items on the test are reliable and unidimensional, that the participants’ responses fit the parameters of the Rasch model, and that no item bias (differential item functioning) occurs.
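One common way to check for differential item functioning is to estimate each item's difficulty separately in two groups and then test the difference. The sketch below assumes the two groups are already defined, since the abstract does not say which grouping was used. It flags an item when the contrast is at least 0.5 logits and |t| is at least 2. That threshold is a frequently used rule of thumb, not one the study reports, and the example values are hypothetical.

```python
import math


def dif_contrast(d1, se1, d2, se2):
    """DIF contrast for one item estimated separately in two groups.

    d1, d2   -- the item's difficulty (logits) in group 1 and group 2
    se1, se2 -- the corresponding standard errors
    Returns (contrast, t) where t uses the joint standard error.
    """
    contrast = d1 - d2
    joint_se = math.sqrt(se1 ** 2 + se2 ** 2)
    return contrast, contrast / joint_se


def flag_dif(contrast, t, min_logits=0.5, min_t=2.0):
    """Assumed rule of thumb: flag only differences that are both large and significant."""
    return abs(contrast) >= min_logits and abs(t) >= min_t


c, t = dif_contrast(0.40, 0.15, 0.10, 0.20)
print(f"contrast={c:.2f} logits, t={t:.2f}, flagged={flag_dif(c, t)}")
```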

5. Conclusions: Surprisingly, the results revealed that after at least 9 years of formal language instruction, the respondents were close to mastery level in collocational knowledge for the noun-noun and noun-adjective categories, but not for verb-particle collocations.

Paper No KK_054

Paper Title Development and validation of a diagnostic pronunciation rating scale: A rating scale and common-item equating analysis

Email Address [email protected]

1st Author Yuanyue Hao

Affiliation Fudan University

Subsequent authors

1. Aims/ Objectives of study:

Assessment of pronunciation has long been established as an integral component of speaking assessment. It is usually combined with other dimensions such as fluency, lexical resource and topic development to generate an overall score for the speaking section in major English tests. Few studies focus on the assessment of pronunciation per se, although it plays a critical role in pedagogical contexts such as pre-service teacher training and teaching assistant selection. This study attempts to develop a diagnostic rating scale of pronunciation for use in formal pronunciation instruction. It also aims to provide evidence for the construct validity of the scale through a many-facet Rasch analysis.

2. Sample: Participants were selected from students who were receiving both formal pronunciation instruction in an English Pronunciation course and subsequent peer tutoring. The students attended a major university in China that specializes in, and is renowned for, pre-service teacher training. A total of 88 students were rated on the 23 items of the diagnostic rating scale by 12 raters: 10 peer tutors and 2 instructors of the pronunciation course.

3. Method: The 23 items in the rating scale were developed in a theoretically informed and empirically based fashion. Theories from the linguistic study of phonetics and phonology informed the design of the rating scale. In the meantime, tutoring records for each student were collected, coded and analyzed to supplement the rating dimensions extracted from phonetic and phonological theories. The 23 items were compiled into the final version of the diagnostic rating scale. The scale was analyzed with Minifac (version 3.80.0) using the rating scale model to investigate the construct of the scale. In a subsequent stage, ratings of two groups of students enrolled in different majors in the Department of English were analyzed with Ministep (version 3.93.2). This common-item equating analysis explored the validity of the rating scale across different groups of students.
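In common-item equating, the items shared by both groups anchor the link. The shift between the two frames of reference is the difference in the common items' mean difficulty. The slope of the empirical line shows whether the item hierarchy is preserved. The minimal sketch below assumes the mean-sigma convention, in which the slope is the ratio of standard deviations, and uses hypothetical difficulty estimates. Ministep's own plotting conventions may differ.

```python
from statistics import mean, pstdev


def common_item_link(d_x, d_y):
    """Link group Y's common-item difficulties to group X's frame.

    Returns (shift, slope): shift is the constant added to Y measures to
    place them on X's scale; slope is SD(X) / SD(Y), the mean-sigma slope
    of the empirical line (near 1.0 means parallel to the identity line).
    """
    shift = mean(d_x) - mean(d_y)
    slope = pstdev(d_x) / pstdev(d_y)
    return shift, slope


d_x = [-1.2, -0.4, 0.1, 0.8, 1.5]   # common items, group X
d_y = [-1.5, -0.7, -0.2, 0.5, 1.2]  # same items, group Y
shift, slope = common_item_link(d_x, d_y)
print(f"shift={shift:.2f} logits, slope={slope:.2f}")
```

In this example the slope is 1.0, so the empirical line runs parallel to the identity line. The shift only moves the line up or down.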

4. Results: Initial construct validation with Minifac shows a satisfactory scale construction: all Infit and Outfit mean-square values fall between 0.5 and 1.5. However, further analysis of category use suggests that some categories should be collapsed. Results from the common-item equating analysis corroborate the validity of this diagnostic rating scale of pronunciation, because the empirical line is parallel to the identity line. Findings from the two analyses provide preliminary evidence for the validity of the rating scale, which is subject to further investigation.
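The Infit and Outfit mean-squares are both averages of squared residuals and differ only in weighting. The sketch below computes them from observed responses, model-expected values and model variances. For simplicity it uses dichotomous responses, where the variance is p(1 - p). The same two formulas apply to rating-scale data once the expected score and variance of each observation are known. The response values are hypothetical.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean-squares for one item or one person."""
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals (x - E)^2 / W.
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(observed)
    # Infit: information-weighted, i.e. sum of squared residuals over sum of variances.
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit


observed = [1, 0, 1, 1]
expected = [0.8, 0.3, 0.6, 0.9]             # model probabilities of success
variance = [p * (1 - p) for p in expected]  # binomial variance for 0/1 data

infit, outfit = fit_mean_squares(observed, expected, variance)
acceptable = all(0.5 <= m <= 1.5 for m in (infit, outfit))
print(f"infit={infit:.2f} outfit={outfit:.2f} acceptable={acceptable}")
```

Here both values fall below 0.5. That indicates overfit (responses more predictable than the model expects), not noise.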

5. Conclusions: This study employs a rating scale Rasch analysis and a common-item equating analysis to investigate the construct validity of a diagnostic rating scale of pronunciation and its validity across different groups of students. Results suggest that the scale can be considered valid for assessing pronunciation and diagnosing students’ difficulties in the learning process. This study has implications for the development and validation of diagnostic rating scales for other sub-dimensions of language proficiency, and for pronunciation teaching and assessment. That said, many questions remain unresolved. These include the tension between the nativeness principle and the intelligibility principle in pronunciation teaching, especially against the backdrop of Global Englishes. Other open questions are the longitudinal validity of this rating scale, the learning trajectory of students over the one-year pronunciation course, and the linkage between the diagnosis and instructional resources.

Paper No KK_055

Paper Title Development of instrument in measuring cottage industry accounting practices.

Email Address [email protected]

1st Author Susana Narawi

Affiliation Faculty of Accountancy, Universiti Teknologi MARA, Sarawak

Subsequent authors

Bambang Sumintono, Institute of Educational Leadership, University Malaya, Kuala Lumpur.

1. Aims/ Objectives of study:

The main objective of this paper was to discuss issues pertinent to the development of an instrument to measure cottage industry accounting practices. Keywords: instrument, accounting practices, Partial Credit Rasch Model.

2. Sample: Based on a preliminary investigation, a purposive sampling technique was used. The sample was obtained with the assistance of an agency that provides micro-credit facilities. The pilot and actual data collection involved 31 and 117 cottage industry owners, respectively, all of whom had obtained micro-credit facilities.

3. Method: The initial stage involved several preliminary interviews with cottage industry owners, followed by expert content validation. The next stage involved face-to-face pretesting with 11 respondents. After the pretest data were analysed using WINSTEPS software, a pilot test was run with 31 respondents. Questionnaires were personally administered to groups for the final data collection. Five cottage industry owners were concurrently interviewed to obtain qualitative responses that support the quantitative findings. The study used the Partial Credit Rasch Model because the data were polytomous.
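The Partial Credit Rasch Model lets each item have its own set of step difficulties. This suits an instrument whose accounting-practice items may not share a common rating structure. A minimal sketch of the model's category probabilities follows, with hypothetical threshold values. A non-applicable response that is treated as missing simply contributes no term to the estimation.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Partial Credit Model category probabilities for one person-item pair.

    theta      -- person measure (logits)
    thresholds -- this item's own step difficulties tau_1 .. tau_m
    Returns [P(X = 0), ..., P(X = m)].
    """
    # Cumulative sums of (theta - tau_j); category 0 has an empty sum of 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


probs = pcm_probabilities(0.0, [-1.0, 1.0])
print([round(p, 3) for p in probs])
```

A person located midway between the two thresholds is most likely to choose the middle category. The two outer categories are equally likely.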

4. Results: Findings from the data collection clearly show that it was vital to include a non-applicable option in the scale measuring the level of accounting practices in cottage industries. In this study, 15% of responses were non-applicable, and these were treated as missing data in the analysis. The instrument was developed from the pilot-test version to the final data-collection version. Across that process, all model-fit conditions were satisfied and improved.

5. Conclusions: The results of this study reveal that the Partial Credit Rasch Model is capable of handling non-applicable accounting practices. At the same time, it satisfies all the conditions for model fit, even though the data are polytomous. In conclusion, it is expected that this paper will add new knowledge on the importance of examining instrument quality with the Rasch Measurement Model or the Partial Credit Rasch Model, particularly for future studies of accounting practices.

Paper No KK_056

Paper Title Comparison of holistic and analytic rating methods of a writing task from the perspective of validity, reliability and practicality

Email Address [email protected]

1st Author Keita Nakamura

Affiliation Eiken Foundation of Japan

Subsequent authors

1. Aims/ Objectives of study:

In language testing, the debate between holistic and analytic scoring of writing tasks has been long and well documented (Zhang et al., 2015). Holistic scales respond to language performance as a whole, with each score on the scale representing an overall impression. Analytic scales are composed of separate aspects of performance, each scored separately (Li et al., 2015). Previous literature has shown little consensus on which method yields more reliable and valid results (Harsch et al., 2013). It has therefore been argued that the purpose of the writing task (diagnosis, selection, or achievement) is significant in deciding which method is chosen (Bacha, 2001). Weigle (2002) showed that, from the reliability perspective, an analytic scale is better than a holistic scale. From the practicality perspective, however, an analytic scale is more time-consuming and expensive than a holistic scale. As for validity, a holistic scale assumes that different aspects of writing ability develop at the same rate, while an analytic scale assumes those aspects develop at different rates. However, only a few studies have investigated all three perspectives of reliability, validity and practicality when comparing holistic and analytic scales.

2. Sample: In this paper, the author presents the results of a study in which 371 Grade 8 and 204 Grade 7 students completed a grade-appropriate EFL writing task. Nine trained raters rated the papers using both holistic and analytic scales, with the rating order counterbalanced to control for order effects.

3. Method: The analytic scale contained three criteria, namely content, vocabulary and grammar. Each rater was first asked to rate a set of 20 common anchor papers for both the Grade 7 and Grade 8 tasks. At the same time, each rater was asked to record their rating time for each paper.

4. Results: Many-facet Rasch model analysis (Linacre, 2015) was run separately for the Grade 7 and Grade 8 tasks. The results showed that the analytic scale had higher reliability than the holistic scale for both tasks. The same rater's severity also varied across tasks and scales. In addition, individual raters’ fit indices varied across tasks and scales. The participants’ proficiencies estimated from the two scales showed a high correlation (r = .97) for both tasks. Finally, rating with the analytic scale took twice as long as rating with the holistic scale for both tasks.
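The many-facet Rasch model adds a rater facet to the rating scale model. The log-odds of moving from category k-1 to k is B_n - D_i - C_j - F_k, for person ability, criterion difficulty, rater severity and category threshold. The sketch below shows how a rater's severity shifts the expected score for the same script. All parameter values are hypothetical, and the code illustrates the model rather than the Facets estimation procedure.

```python
import math


def mfrm_probabilities(ability, difficulty, severity, thresholds):
    """Category probabilities under a many-facet rating scale model:
    log(P_k / P_{k-1}) = B_n - D_i - C_j - F_k."""
    eta = ability - difficulty - severity
    logits = [0.0]
    for f in thresholds:
        logits.append(logits[-1] + (eta - f))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Model-expected rating: sum of k * P(X = k)."""
    return sum(k * p for k, p in enumerate(probs))


ability = 1.0               # one student's writing proficiency (logits)
criterion = 0.2             # e.g. the 'grammar' criterion of the analytic scale
thresholds = [-1.5, 0.0, 1.5]
lenient = expected_score(mfrm_probabilities(ability, criterion, -0.5, thresholds))
severe = expected_score(mfrm_probabilities(ability, criterion, 0.5, thresholds))
print(f"lenient rater: {lenient:.2f}, severe rater: {severe:.2f}")
```

Because severity enters the model as its own term, a student's measure can be adjusted for the particular rater who scored the script. This is also why a single rater's severity can be compared across tasks and scales.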

5. Conclusions: The implications will be discussed in terms of validity, reliability and practicality when choosing an appropriate rating scale. The author argues that the purpose of the test, or the validity issue, should always come first when designing a test and when choosing a rating scale. Reliability and practicality may sometimes have to be compromised. However, the degree of compromise should always be investigated through empirical studies before decisions are made.

Paper No KK_057

Paper Title Analysing The Effect of Smart Partnership using Rasch – a case of women entrepreneurs in Tanjung Karang

Email Address [email protected]

1st Author Rohani Mohd

Affiliation UITM

Subsequent authors

Salwana Hassan, Geetha Subramaniam, and Badrul Hisham

1. Aims/ Objectives of study:

The purpose of the paper is to investigate the impact of collaboration between women micro-entrepreneurs and major retailers

2. Sample: Purposive sampling was used for this study. All 17 women micro-entrepreneurs who participated in the Smart Partnership with Mydin Hypermarket were selected. Data were obtained via a self-administered questionnaire using an adapted quality-of-life instrument.

3. Method: A case study was conducted upon women entrepreneurs in Tanjung Karang, Selangor who participated in a collaborative program called Smart Partnership with Mydin Hypermarket. Rasch analysis was conducted to answer the research objectives.

4. Results: Based on the analysis, five groups of micro-entrepreneurs were identified, corresponding to five statistically distinct strata. Two rulers were generated from the analysis: a profiling ruler and a gap ruler. The profiling ruler identifies the five groups based on their level of satisfaction with different aspects of quality of life. The gap (effectiveness) ruler measures the effectiveness of the program by the size of the gap between items before and after the program.
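The number of strata is usually derived from the separation index G with the formula H = (4G + 1)/3, commonly attributed to Wright and Masters. The abstract does not report G. The sketch below therefore works backwards: it shows what separation would be needed to support five strata, and the reliability that separation implies.

```python
def strata(separation):
    """Statistically distinct strata implied by a separation index G."""
    return (4 * separation + 1) / 3


def separation_for_strata(n_strata):
    """Separation G needed for a given number of strata (inverse of strata())."""
    return (3 * n_strata - 1) / 4


def reliability_from_separation(separation):
    """Separation reliability R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1 + separation ** 2)


g = separation_for_strata(5)
print(g, strata(g), round(reliability_from_separation(g), 3))
```

Under these formulas, five strata require a separation of 3.5, which corresponds to a reliability of about 0.92.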

5. Conclusions: The findings indicate that the collaborative program has successfully helped the poor achieve a better quality of life. Discussion and recommendations are also included in the paper.

Paper No KK_058

Paper Title Development And Validation Of Malaysian Secondary School

Email Address [email protected]

1st Author Ma Chi Nan

Affiliation UMS

Subsequent authors

Vincent Pang

1. Aims/ Objectives of study:

Purpose: The purpose of this study is to develop and evaluate an instrument that measures Malaysian students’ national identity. Objectives: (a) To develop an instrument to measure Malaysian students’ national identity. (b) To assess the validity of the Malaysian students’ national identity instrument. (c) To assess the reliability of the Malaysian students’ national identity instrument.

2. Sample: Population: All secondary school students in Penampang District, Sabah. Sampling method: Stratified sampling (there are six secondary schools in Penampang; 105 students were selected from each school: 35 from Form 1, 35 from Form 2 and 35 from Form 4). Sample size: 630 students.

3. Method: Research Procedure: Stage 1: Instrument Development (a) Developing conceptual and operational definitions of the construct of student’s national identity (b) Generating an item pool for the instrument (c) Determining the format or selecting a scaling technique for the measurement. Stage 2: Instrument Testing & Refining (a) Establishing content validity of the instrument (b) Performing back to back translation (c) Preparing a revised draft of the questionnaire (d) Testing construct validity and reliability

4. Results: 1) Item misfit diagnosis: Infit and outfit mean-square values fell outside the accepted range for eight items: two from the National Heritage construct, one from the Cultural Homogeneity construct, three from the Emotional Attachment to Malaysian Nation construct, and two from the Collective Self-Esteem construct. The Wright maps were checked, and outlying responses to these items were identified from the Guttman scalogram to decide whether each item should be retained, revised or omitted. 2) Item polarity diagnosis: One item from the Collective Self-Esteem construct had a negative point-measure correlation. 3) Principal component analysis of Rasch residuals (PCAR): For the Collective Self-Esteem construct, the raw variance explained by measures was lower than 40%. The Standardized Residual Contrast 1 plot was checked to identify the cause. Based on the item misfit and PCAR diagnoses, the negatively worded items EA43, CS68 and CS74 were recommended for omission. The negatively worded item EA58 was recommended to be retained but reworded positively. 4) Separation diagnosis: For all constructs, item separation was greater than 3.0 and item reliability greater than 0.90. 5) Category function diagnosis: The Belief System, Emotional Attachment to Malaysian Nation, and Nationalism constructs showed step disordering. For these constructs, category 1 was collapsed into category 0, so the researcher reduced the 5-point rating scale to a 4-point rating scale.
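Two recoding steps are described here. The first collapses category 1 into category 0 to repair step disordering. The second rewords a negatively worded item so that it scores in the same direction as the others. When rewording is not possible, the usual data-side equivalent is reverse scoring. The minimal sketch below assumes responses are coded 0 to 4, as the category numbers in the abstract suggest. The function names are hypothetical.

```python
# Map each original category 0-4 to its collapsed category 0-3:
# categories 0 and 1 merge, and the remaining categories shift down by one.
COLLAPSE_1_INTO_0 = (0, 0, 1, 2, 3)


def collapse(response, mapping=COLLAPSE_1_INTO_0):
    """Recode a 5-category response (0-4) onto the 4-category scale (0-3)."""
    return mapping[response]


def reverse_score(response, max_category=4):
    """Reverse a negatively worded item so that higher always means more of the trait."""
    return max_category - response


raw = [0, 1, 2, 3, 4]
print([collapse(x) for x in raw])       # [0, 0, 1, 2, 3]
print([reverse_score(x) for x in raw])  # [4, 3, 2, 1, 0]
```

If a negatively worded item belongs to a construct whose categories are also collapsed, reverse it first and collapse it second. That way the merged categories are the intended ones on the trait scale.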

5. Conclusions: Three items (EA43, CS68 and CS74) were negatively worded, and the respondents were most probably confused by them. After discussion with content experts, and in light of the misfit of these negative items, the researcher decided to drop all three. Deleting them brought the variance explained by measures closer to that expected by the Rasch model. The total number of items in the inventory was therefore reduced from 76 to 73.

Paper No KK_059

Paper Title The Unreasonable Effectiveness of Theory Based Instrument Calibration in the Natural Sciences: What Can the Behavioral Sciences Learn?

Email Address [email protected]

1st Author Jackson Stenner

Affiliation Chief Scientist, MetaMetrics, Inc., Durham, North Carolina, USA

Subsequent authors

Mark Stone, William Fisher

1. Aims/ Objectives of study:

Abstract. In his classic paper entitled “The Unreasonable Effectiveness of Mathematics in the Natural Sciences”, Eugene Wigner addresses the question of why the language of mathematics should prove so remarkably effective in the physical [natural] sciences. He marvels that “the enormous usefulness of mathematics in the natural sciences is something bordering on the mysterious and that there is no rational explanation for it” [1]. We have been similarly struck by the outsized benefits that theory-based instrument calibration confers on the natural sciences. This contrasts with the almost universal practice in the social sciences of using data to calibrate instrumentation.

1. Introduction
In our ongoing exploration of how the natural sciences and the social sciences invoke, define and engage in measurement, we have identified a number of differences. We have, to some benefit, contrasted human temperature thermometry (e.g. NexTemp thermometers) with the testing of mathematical ability and the measurement of English language reading ability. Although cataloguing these differences has been useful, we now believe they are all traceable to a common cause. Physical science measurement, virtually without exception, is founded on well-developed substantive theory. These theories are not just compelling stories about the relationships between measurement outcomes (the count of cavities turning black on a NexTemp thermometer), measures (degrees Celsius) and measurement mechanisms (the chemical specification equation). They are sufficiently elaborated and precise in their specifications that they can be used to calibrate instrumentation. In contrast, throughout the behavioral and social sciences, instrument calibration depends on data and is typically devoid of theory. We hypothesize that most of the observed differences between behavioral and physical science measurement are traceable to this foundational difference. Further, we offer the Lexile Framework for Reading as an example of a theory-referenced measurement system in the educational sciences that mimics key features of human thermometry. Finally, we review the affordances shared by human thermometry and the Lexile Framework for Reading.

2. A Reading Framework
A consensus unit is typical of most natural science measurement. Sometimes, as in temperature measurement, the unification process is not fully completed. For the vast majority of natural science attributes/constructs, however, a unification process has resulted in diverse instrument makers sharing a unit of measure, even when the measurement mechanisms vary from manufacturer to manufacturer. Mercury-in-glass thermometers for human temperature measurement can be contrasted with NexTemp™ technology. Although the measurement mechanisms are drastically different, both report in either Fahrenheit or Celsius units. In the case of NexTemp™ thermometry, a chemical specification equation calibrates the instrument in °C or °F. The chemical specification equation enforces the unit. In the Lexile Framework for Reading, a text complexity specification equation enforces the ‘Lexile’ unit. It ensures that a 100L difference between two readers, two texts or a reader/text encounter is invariant over any of the 100+ English reading tests that, at present, employ the Lexile unit. Strictly parallel instruments are typical in the natural sciences. Such instruments share a common correspondence table that links a measurement outcome (the count of cavities turning black on a NexTemp thermometer) to a value in °C or °F. The ability to manufacture essentially identical instruments in large quantities is a hallmark of natural science measurement.
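A correspondence table of the kind described here can be written down directly from an instrument's design rather than estimated from data. The sketch below builds one for the Fahrenheit NexTemp strip using only the figures quoted in Note 1: 45 cavities at 0.2°F intervals from 96.0°F to 104.8°F, with every cavity at or below body temperature turning black. It illustrates the idea and is not the manufacturer's published table.

```python
from typing import Optional

FIRST_THRESHOLD_F = 96.0  # lowest cavity threshold (Note 1)
STEP_F = 0.2              # spacing between adjacent cavities (Note 1)
N_CAVITIES = 45


def reading_from_count(black_cavities: int) -> Optional[float]:
    """Map a count of black cavities to a temperature in degrees F.

    Cavities turn black in order, so k black cavities means the k-th
    threshold was reached. Returns None when no cavity has turned black
    (a temperature below the 96.0 F range).
    """
    if not 0 <= black_cavities <= N_CAVITIES:
        raise ValueError("count must be between 0 and 45")
    if black_cavities == 0:
        return None
    return round(FIRST_THRESHOLD_F + STEP_F * (black_cavities - 1), 1)


def to_celsius(fahrenheit: float) -> float:
    return round((fahrenheit - 32) * 5 / 9, 1)


# The full correspondence table: count of black cavities -> degrees F.
table = {k: reading_from_count(k) for k in range(1, N_CAVITIES + 1)}
print(table[14], to_celsius(table[14]))  # 98.6 37.0
```

Any strip manufactured to the same specification shares this table, which is what makes such instruments strictly parallel.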
The specification equation is the recipe for manufacturing and calibrating clones of an instrument. The social sciences borrow the concept and talk about ‘parallel’ instruments or ‘alternate forms’, and advertise that, say, forms A and B produce exchangeable measures. Of course, without a specification equation it is impossible to manufacture copies or clones that share the same correspondence table. The Lexile Framework for Reading and its specification equation can be used to build strictly parallel clones of any reading test. No such capability exists, for example, for the Quantile Framework for Mathematics. This is precisely because, at present, there is no specification equation for mathematical ability that can calibrate mathematics test items. Different mathematics tests are empirically linked to the Quantile scale through large-scale, expensive field studies typically involving thousands of students. Typical Rasch model applications in the social sciences are singly prescriptive. The major prescription that data must meet is non-intersecting item characteristic curves (ICCs), which relate the probability of a correct response to the difference between person ability and item difficulty. The data are used to estimate person and item parameters with no a priori constraints on the item parameters. The Quantile Framework for Mathematics is typical of much social science measurement. Because there is no strong substantive theory for ‘mathematical ability’, there is no specification equation and, thus, no potential for theoretically calibrating items/instruments. Instrument calibrations depend on sample data and on a property of the Rasch model: when data fit the model, differences between persons and differences between items are independent of items and persons, respectively. Contrast this singly prescriptive measurement framework with the doubly prescriptive models underlying NexTemp™ human thermometry and the Lexile Framework for Reading. In both cases, strong substantive theory is coupled with either a Guttman model or a causal Rasch model. This combination requires not just data fit to the model but also data fit to the theory-specified item/instrument calibrations. For NexTemp, a chemical specification equation is used as a recipe for the chemical compound that fills each cavity. By precisely varying the amount of additive, the difference in sensitivity to the green component of light between any two adjacent cavities is held at precisely 0.2 degrees Fahrenheit. The chemical specification equation enforces this common unit difference for each of the 44 adjacent cavity differences across the 9°F operating range of the instrument.
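The Rasch property invoked here, that differences between persons do not depend on which item is used, follows directly from the model's form. The sketch below checks it numerically for hypothetical person and item values. It also shows the trade-off described in the Notes: raising ability and difficulty by the same amount leaves the probability of success unchanged.

```python
import math


def p_correct(ability, difficulty):
    """Dichotomous Rasch model: P = exp(b - d) / (1 + exp(b - d))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def log_odds(p):
    return math.log(p / (1.0 - p))


jane, bob = 1.5, 0.5            # hypothetical person measures (logits)
items = [-2.0, 0.0, 0.7, 2.5]   # hypothetical item difficulties

# Specific objectivity: on every item the log-odds difference between
# the two persons equals jane - bob, whatever the item's difficulty.
diffs = [log_odds(p_correct(jane, d)) - log_odds(p_correct(bob, d)) for d in items]

# Trade-off: shifting ability and difficulty by the same amount
# leaves the probability of a correct response unchanged.
change = p_correct(jane + 0.8, items[1] + 0.8) - p_correct(jane, items[1])
print([round(x, 6) for x in diffs], round(change, 12))
```

In a singly prescriptive application these item difficulties would be estimated from data. In a doubly prescriptive one they would be supplied in advance by a specification equation.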
When data fit a doubly prescriptive Rasch model, absolute person measures (not merely differences) are independent of items and instruments. They are also independent of the person sample, precisely because no person data figure in the instrument calibration process. Theory-calibrated Rasch models are, thus, doubly prescriptive: prescriptive as to Rasch model requirements and prescriptive as to the substantive theory, i.e. item/instrument calibrations. Person misfit to a doubly prescriptive model signals that the measurement mechanism that transmits variation in the attribute to the measurement outcome (often a count) is not working as intended for that individual. Frequent failures of theoretical invariance force reexamination of the substantive theory, the measurement mechanism and the instrument calibration procedures. Theoretical invariance can be tested within person over time (e.g. reading ability growth trajectories). When intra-individual theoretical invariance holds across persons, inter-individual theoretical invariance necessarily holds, i.e. the attribute is homologous (Borsboom; also Molenaar). Molenaar has shown that inferences moving in the reverse direction, inferring from inter-individual factor structures something about intra-individual factor structures, are fraught with complications. The fact that so much of social and psychological measurement is based upon factor analysis of inter-individual variation prompted Molenaar to call for a Kuhnian revolution. This paper is intended as a contribution to this revolution. Unification of measurement refers to a 200-year-old process in which dozens if not hundreds of distinct scales for measuring a common attribute are, sometimes quickly and more often slowly, reduced to one, two or three exchangeable units of measure. The history of temperature measurement is a paradigmatic case (Chang; Sherry) that parallels many contemporary measurement movements in the social and behavioral sciences. Typically, an attribute (construct) captures the imagination of a community of scholars and engineers. Different tests, instruments, mechanisms and scales are then proposed for measuring the attribute, each uniquely named. Once there is consensus that the selfsame attribute is being measured across these various devices, small-scale linking studies are undertaken to build conversion tables that re-express one unit in one or more other units. More advanced linking studies reduce the link to an equation, such as °F = °C × 9/5 + 32, making for quick and easy conversions. At this stage there is often not much to elevate one scale above the competition, so the marketplace takes over and ‘unification’, with all its time and cost savings, eventually prevails. Sometimes unification is swift and decisive. More often, particularly in the social sciences, metrology is poorly understood and unification plods along.
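When two scales are linearly related, two well-established anchor points are enough to recover a linking equation such as °F = °C × 9/5 + 32. The sketch below uses the freezing and boiling points of water at standard pressure as the assumed anchors. The helper names are illustrative.

```python
def linear_link(x1, y1, x2, y2):
    """Slope and intercept of the linking line y = slope * x + intercept
    through two anchor points."""
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Assumed anchors: water freezes at 0 C / 32 F and boils at 100 C / 212 F.
slope, intercept = linear_link(0.0, 32.0, 100.0, 212.0)


def celsius_to_fahrenheit(c):
    return slope * c + intercept


def fahrenheit_to_celsius(f):
    return (f - intercept) / slope


print(slope, intercept)  # 1.8 32.0
print(round(celsius_to_fahrenheit(37.0), 1))
```

Once the equation is known, the conversion table it replaces can be regenerated for any value. This is what makes the equation form the more advanced stage of linking.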
A useful case study of unification in the social sciences is the Lexile Framework for Reading. It has linked 100+ English language reading tests across the world, 250,000 book measures and 200 million article measures to the Lexile scale. The unification process is 27 years old and is accelerating, but is far from complete. This effort drew inspiration and strategies from the unification of temperature (Chang, Sherry, Fisher, Stenner 2016). A common approach uses factor analysis of inter-individual data to define an attribute structure and then asks whether this structure obtains in intra-individual data. Instead, we suggest using substantive theory (in the form of specification/calibration equations) to establish the universality of attribute structure and measurement mechanism at the individual level. Once this is accomplished, there is no puzzle about whether between-person differences have the same structure as within-person differences; of course they do. So, what this analysis reveals is that it is problematic to study between-person variation at one point in time to glimpse truths about within-person structures over time. The surprise, however, comes when we start with within-person theory-referenced measurement, where in the extreme no two persons have any items in common over 5 years of measurement. We would then not stop for a moment to puzzle about the validity of the claim that at the end of year 1 Jane was higher than Bob but at the end of year 5 Bob was higher than Jane (i.e. a claim about inter-individual variation). This is yet another benefit of theory-based instrument calibration.

Note 1
“The NexTemp Thermometer is a thin, flexible, paddle-shaped plastic strip containing multiple cavities. In the Fahrenheit version, the 45 cavities are arranged in a double matrix at the functioning end of the unit. The columns are spaced 0.2°F intervals covering the range of 96°F to 104.8°F…. Each cavity contains a chemical composition comprised of three cholesteric liquid crystal compounds and a varying concentration of a soluble additive. These chemical compositions have discrete and repeatable change-of-state temperatures consistent with an empirically established formula to produce a series of change-of-state temperatures consistent with the indicated temperature points on the device. The chemicals are fully encapsulated by a clear polymeric film, which allows observation of the physical change but prevents any user contact with the chemicals. When the thermometer is placed in an environment within its measure range, such as 98.6°F (37.0°C), the chemicals in all of the cavities up to and including 98.6°F (37.0°C) change from a liquid crystal to an isotropic clear liquid state. This change of state is accompanied by an optical change that is easily viewed by a user. The green component of white light is reflected from the liquid crystal state but is transmitted through the isotropic liquid state and absorbed by the black background.
As a result, those cavities containing compositions with threshold temperatures up to and including 98.6⁰F (37.0⁰C) appear black, whereas those with transition temperatures of 98.6⁰F (37.0⁰C) and higher continue to appear green” (Medical Indicators, 2006, PP.1-2). Thus, the observed outcome is a count of cavities turned black. The measurement mechanism is an encased chemical compound that includes a varying soluble agent that changes optical properties according to changes in temperature. Amount of soluble agent can be traded off for change in human temperature to hold number of black cavities constant. Note 2 The Edsphere™ technology for measuring English language reading ability

employs computer generated, four-option, multiple choice cloze items built on the fly for any prose text. Counts correct on these items are converted into Lexile measures via an applicable Rasch model. Individual cloze items are one off and disposable; an item is used only once. The cloze and foil selection protocol ensures that the correct answer (cloze) and incorrect answers (foils) match the vocabulary demands of the target text. The Lexile text complexity measure and the excepted spread of the cloze items are given by a proprietary text theory and associated equations. Thus, the observed outcome is a count of correct answers. The measurement mechanism is a text with a specified Lexile text complexity and an item generation protocol consistent with that text complexity measure. The text complexity measure can be traded off for a change in reading ability to hold constant the number of items answered correctly. Note 3 The Quantile Framework® consists of a common supplemental metric – the Quantile – that is employed to scientifically measure a student’s ability to think mathematically and locate them in a taxonomy of mathematical skills, concepts, and applications. In order to develop the Quantile Framework, several tasks were undertaken: (1) develop a structure of mathematics that spans the developmental continuum from first grade content through Algebra I, Geometry, and Algebra II content, (2) develop a bank of items that have been field tested, (3) develop the Quantile scale (multiplier and anchor point) based on the calibrations of the field-test items, (4) validate the measurement of mathematics ability as defined by the Quantile Framework, and (5) link extant tests of mathematical ability to the Quantile scale. The process of scale unification for mathematics ability is well underway. At present the attribute “mathematical ability” in unspecified i.e. 
there is no specification equation and associated Quantile analyzer that can be used to locate ‘math text’ on the Quantile scale. Rather, data intensive methods are employed to calibrate instrumentation and human intensive qualitative analysis is employed to locate math text (e.g. a chapter on adding fractions with uncommon denominators) on the Quantile® scale. The vast majority of social science attributes are similarly unspecified. By contrasting Nextemp™ Thermometry, the Lexile Framework for Reading and The Quantile Framework for Mathematics we hope to illuminate the chasm of difference between instrumentation that employs strong substantive theory and that which that do not. For the vast majority of measurement systems it is the case “that the difference
between any two points for one individual is qualitatively the same as a corresponding difference between two individuals at one time point” (Borsboom, Cramer, Kievit, Scholten, & Franic, 2009); that is, the attribute is homologous. The same cannot be said for many measurement systems used in the social sciences. We propose that the routine adoption of theory-based instrument calibrations will pave the way for homologous attributes in the social sciences, thus assuring that the attribute on which I differ from myself over time is the same attribute on which I differ from my brother (Borsboom, 2005).

4. References
[1] Valsiner J, Molenaar P C M, Lyra M C D P, Chaudhary N (Eds) 2009 Dynamic Process Methodology in the Social and Developmental Sciences (New York: Springer)
[2] Fisher W P Jr 2009 Measurement 42 1278-1287
[3] Stenner A J, Fisher W P Jr, Stone M H, Burdick D S 2013 Frontiers in Psychology: Quantitative Psychology and Measurement 4 doi: 10.3389/fpsyg.2013.00536
[4] Fisher W P Jr, Stenner A J 2011 A technology roadmap for intangible assets metrology. International Measurement Confederation (IMEKO) TC1-TC7-TC13 Joint Symposium, Jena, Germany. http://www.db-thueringen.de/servlets/DerivateServlet/Derivate-24493/ilm1-2011imeko-018.pdf
[5] Borsboom D, Dolan C V 2007 Measurement 5 236-263
[6] Molenaar P C M 2004 Measurement 2 201-218
[7] Hamaker E L, Nesselroade J R, Molenaar P C M 2007 Journal of Research in Personality 41 295-315
[8] Molenaar P C M, Newell K M American Psychological Association. doi: 10.1037/12146-006
[9] Rasch G 1960 Probabilistic models for some intelligence and attainment tests (Copenhagen, Denmark: Danmarks Paedagogiske Institut) (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980)
[10] Stenner A J, Burdick H, Sanford E E, Burdick D S 2006 J Appl Meas 7 307-322
[11] Bond T, Fox C 2007 Applying the Rasch model (Mahwah, New Jersey: Lawrence Erlbaum)
[12] Engelhard G Jr 2012 Invariant measurement (New York: Routledge Academic)
[13] Latour B 1987 Science in action (Cambridge, Massachusetts: Harvard University Press)
[14] Latour B 2005 Reassembling the social (Oxford: Oxford University Press)
[15] Heilbron J L 1993 Historical Studies in the Physical and Biological Sciences 24 1-337
[16] Nersessian N J 2002 in Essays in the History and Philosophy of Science and Mathematics ed. Malament D (La Salle, Illinois: Open Court) 129-166
[17] Wright B D 1999 in The New Rules of Measurement: What Every Educator and Psychologist Should Know ed. Embretson S E and Hershberger S L (Hillsdale, New Jersey: Lawrence Erlbaum Associates) 65-104
[18] Engelhard G 2001 J Appl Meas 2 1-26
[19] Stenner A J, Smith M 1982 Perceptual and Motor Skills 55 415-426
[20] Stenner A J, Smith M, Burdick D S 1983 J Ed Meas 20 305-316
[21] Williamson G L, Fitzgerald J, Stenner A J 2013 Educational Researcher 42 59-69
[22] Burdick D S, Stone M H, Stenner A J 2006 Rasch Measurement Transactions 20 1059-1060
[23] Stenner A J, Stone M 2010 J Appl Meas 11 244-252
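
Note 2 states that counts correct are converted into measures "via an applicable Rasch model" and that text complexity can be traded off against reading ability to hold the count constant. The minimal sketch below illustrates that logic under the dichotomous Rasch model, working in logits. It is an illustration under stated assumptions, not the Edsphere™ or Lexile procedure: the proprietary text theory and the logit-to-Lexile transformation are not reproduced, the item difficulties are modelled as a text measure plus hypothetical per-item offsets, and all function names are hypothetical.

```python
import math

def rasch_p(theta, d):
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(d - theta))

def expected_score(theta, difficulties):
    """Expected number correct for ability theta (all values in logits)."""
    return sum(rasch_p(theta, d) for d in difficulties)

def measure_from_count(count, difficulties, tol=1e-10, max_iter=100):
    """Solve expected_score(theta) == count for theta by Newton-Raphson.

    Zero and perfect scores have no finite estimate, so they are rejected
    here rather than adjusted.
    """
    n = len(difficulties)
    if not 0 < count < n:
        raise ValueError("count must be strictly between 0 and the number of items")
    theta = sum(difficulties) / n  # start at the mean item difficulty
    for _ in range(max_iter):
        probs = [rasch_p(theta, d) for d in difficulties]
        info = sum(p * (1.0 - p) for p in probs)  # slope of the expected score
        step = (count - sum(probs)) / info
        step = max(-1.0, min(1.0, step))  # damp very large steps
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")

if __name__ == "__main__":
    # Hypothetical cloze-item offsets around the text complexity measure.
    offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]
    for text in (0.0, 2.0):
        items = [text + e for e in offsets]
        theta = measure_from_count(3, items)
        print(f"text {text:+.1f}: reader {theta:.3f}, reader - text {theta - text:.3f}")
```

Raising the text measure by 2 logits raises the reader measure that corresponds to the same count by 2 logits, because the model depends only on the difference between reader and item. This is the trade-off Note 2 describes.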

Paper No KK_060

Paper Title Facilitator Training Needs in Malaysian Schools

Email Address

[email protected]

1st Author Mohd Kashfi Mohd Jailani

Subsequent authors

Raja Hamizah Raja Harun, Nurulhidayah Sukiman, Siti Sarah Baharom, Ismail Mohamad

1. Aims/ Objectives of study:

Facilitator positions were created to guide and advise school leadership, to guide teachers in implementing interesting, creative and innovative pedagogy, and to ensure that every pupil in Year 3 acquires basic literacy (Bahasa Malaysia and English) and numeracy. This study aimed to identify facilitators' training needs and how these needs differed among groups of facilitators.

2. Sample: The instrument was administered to facilitators, including 220 School Improvement Partners (SIPartners+), 587 School Improvement Specialist Coaches+ (SISC+) and 377 FasiLINUS based in District and State Education Offices throughout Malaysia.

3. Method: Overall facilitator training needs were analyzed using Rasch analysis of responses to a self-administered questionnaire, the Facilitator Competency Instrument. An item-person map was examined to identify the difficulty levels of items relative to persons.

4. Results: The results show that most facilitators performed well in all skills, indicating that they had mastered the competencies required of a facilitator. However, they still lack competence in needs analysis. The analysis also identified the items that were most difficult to agree with. These items were grouped into several main focus areas: research methods, training design, implementation of needs analysis, and interpersonal skills/effective communication.

5. Conclusions:

This study aimed to identify facilitators' training needs and how these needs differed among groups of facilitators. Suggested training for SIPartners+ comprises a coaching and mentoring course, educational research courses and effective communication training. Suggested training for SISC+ comprises a Standard Quality of Malaysia Education (SQME) Standard 4 workshop, ICT workshops, courses on education policy, and a hands-on workshop on analyzing and interpreting SQME Standard 4 data. Suggested training for FasiLINUS comprises a workshop on research methods, courses on effective interpersonal communication, and an ICT course.

Paper No KK_061

Paper Title Performance of Early Mathematics Achievement Test (UPAM) over time: Applying Rasch Measurement Racking

Email Address [email protected]

1st Author Dr. Connie Cassy Ompok

Affiliation UMS

Subsequent authors

Dr. Ling Mei-Teng, Prof. Vincent Pang

1. Aims/ Objectives of study:

This study aimed to determine which performance indicators changed over time.

2. Sample: The sample consisted of 170 P1-preschool children

3. Method: Rasch Measurement Racking

4. Results: The instrument showed good Rasch analysis properties (PTMEA Corr. > 0; infit and outfit mean squares < 2; item separation = 6.55 and item reliability = 0.94; person separation = 3.21 and person reliability = 0.91). The results show generally consistent changes in item difficulty after the intervention: every item's difficulty (in logits) decreased at Time 2. The mean pre-test item difficulty was 1.61 logits and the mean post-test item difficulty was -1.61 logits, a difference of 3.22 logits. The effect size of the difference between post-test and pre-test item difficulties was -1.28, which is considered large.

5. Conclusions: Racking provides information at the item level, distinguishing items that have become easier, become more difficult, or remained the same. Measuring change in difficulty at the item level allows researchers to identify the functioning items in both tests.
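
The racking layout used above can be sketched as follows: each child keeps a single row, and the Time 1 and Time 2 copies of every item sit side by side, so each copy receives its own difficulty and the change can be read off item by item. This is a minimal sketch under stated assumptions: dichotomous (0/1) scores (UPAM's actual scoring is not stated here), the same children at both occasions, and a crude centred log-odds difficulty standing in for a proper Rasch estimation, which the authors would have obtained from Rasch software. The 0.5-logit tolerance, the data and all function names are hypothetical.

```python
import math

def rack(pre, post):
    """Racking: put each person's Time 1 and Time 2 responses side by side.

    pre[p][i] and post[p][i] are 0/1 scores for person p on item i.
    Each person keeps a single row, and every item appears twice
    (a Time 1 copy and a Time 2 copy), so each copy gets its own difficulty.
    """
    if len(pre) != len(post):
        raise ValueError("racking needs the same persons at both occasions")
    return [list(t1) + list(t2) for t1, t2 in zip(pre, post)]

def crude_difficulties(matrix):
    """Centred log-odds of failure for each column.

    A rough, non-iterative stand-in for Rasch item difficulties (in logits);
    real analyses would estimate these with Rasch software.
    """
    n = len(matrix)
    logits = []
    for column in zip(*matrix):
        # +0.5 / +1 keeps all-correct or all-wrong columns finite
        p = (sum(column) + 0.5) / (n + 1.0)
        logits.append(math.log((1.0 - p) / p))
    centre = sum(logits) / len(logits)
    return [d - centre for d in logits]

def item_change_report(pre, post, tolerance=0.5):
    """Classify each item as easier, harder or maintained from Time 1 to Time 2."""
    n_items = len(pre[0])
    d = crude_difficulties(rack(pre, post))
    report = []
    for i in range(n_items):
        change = d[n_items + i] - d[i]
        if change < -tolerance:
            status = "easier"
        elif change > tolerance:
            status = "harder"
        else:
            status = "maintained"
        report.append((i, round(change, 2), status))
    return report

if __name__ == "__main__":
    pre = [[0, 1], [0, 1], [0, 0], [0, 1]]   # hypothetical Time 1 scores
    post = [[1, 1], [1, 1], [1, 0], [1, 1]]  # hypothetical Time 2 scores
    for row in item_change_report(pre, post):
        print(row)
```

Because both time points are calibrated in one run, the two copies of each item share a common frame of reference, which is what makes the item-level differences interpretable.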

Paper No KK_062

Paper Title Modernizing vs Ecologizing Approaches in Measurement

Email Address [email protected]

1st Author William P. Fisher, Jr.

Affiliation BEAR Center, University of California, Berkeley

Subsequent authors

A. Jackson Stenner; MetaMetrics, Inc.

1. Aims/ Objectives of study:

Education, health care, human resource management, social services, and many other areas of life are marked by a kind of schizophrenia (Star & Ruhleder, 1996; Bateson, 1972) that emerges as a dissonance between a caring focus on individual needs for learning and healing, on the one hand, and demands for accountability focused on standards and comparability, on the other. Support for the irrevocable concerns with individual students’ and patients’ spontaneous processes of development and healing stands as an immovable thesis, increasingly opposed by the larger social antithesis of an imposed demand for evidence proving the achievement of quality standards. How might new institutional forms of social life that resolve the schizophrenic break be formed at a higher level of system complexity? How might those forms of life synthetically integrate the necessary concern for development and healing with a new, ecologizing, bottom-up approach to accountability and standards that authentically embodies individual uniqueness?

2. Sample: Philosophers have long sought to grasp the multilevel nature of meaning (Star & Ruhleder, 1996; Bateson, 1972). Linguistic communication systems incorporate within-individual processes that are distinct from, but interact with, mid-level processes between individuals, which are in turn distinct from, but interact with, high-level group processes. These levels of complexity in communication have informed practical applications in epidemiology (Susser & Susser, 1996), leading to new, productive relationships between clinical medicine and public health efforts (Bizouarn, 2016). Might not a similar kind of productivity be possible in education if we apply similar approaches to issues of developmental, horizontal, and vertical coherence (Gorin & Mislevy, 2013; National Research Council, 2006; Wilson, 2004) in educational assessment? This kind of ecological approach to problems of coherence in educational assessment may be key to understanding learning in each distinct contextualizing niche of the various social environments in which it lives. Learning varies across these levels of complexity in ways that cannot be grasped directly from individual measures. What form might be taken by communication systems capable of supporting broad-scale efforts to sort out the sources of distinct classes of effects on learning?

3. Method: Philosophers contrast modern, postmodern, and unmodern conceptions of science as being positivist, antipositivist, and postpositivist (Galison, 1997; Latour, 1990). Philosophically modernist conceptions of science are positivist in the sense of prioritizing a focus on data as the ultimate criterion of objectivity. Postmodernism, in contrast, is sensitized by historical changes in what data count as worthy of attention and so is concerned with the role of theory in making data salient. Unmodern (also known as amodern) postpositivist perspectives (Dewey, 2012; Latour, 1990; Latour, 1993) assert that the debate between modern and postmodern focuses too exclusively on the mutual implication of theory and data, and so will remain unresolved as long as the roles of instruments and knowledge technologies are not taken into account. Instruments encapsulate what is learned from data and what can be explained by theory. Unmodern philosophical perspectives and research in the history of science (Galison, 1997; Latour, 1990; Latour, 1993; Bud & Cozzens, 1992; Wise, 1995; Dear, 2012; O'Connell, 1993) focus on the collective cognition and team-based coordinations made possible when this embodied form is expressed in a uniform language distributed throughout a community of practice. Metrology’s concern with measuring instruments traceable to unit standards then becomes a matter of focal interest as a way in which everyday model-based reasoning has been extended productively into science (Nersessian, 2012). Recent developments suggesting metrological paths forward for the constructs of psychology and the social sciences (Mari & Wilson, 2014; Pendrill & Fisher, 2015; Pendrill, 2014; Wilson et al., 2015; Fisher & Stenner, 2016) also extend everyday model-based reasoning (Fisher, 2004; Fisher, 2010) and open up new possibilities for enhanced innovation in education, health care and other fields. 
A significant problem that remains unaddressed is how varying levels of information complexity can be integrated into a new metrological culture encompassing all of the arts and sciences.

4. Results: Reading measures are linked together in an ecosystem that has capitalized on the literacy form of life that consistently asserts itself across samples of students,
texts, test items, time, and space (Fisher & Wilson, 2015). Niches in this ecosystem span a wide range of classrooms, schools, homes, libraries, testing agencies, and book publishers. Reading test item difficulties have been shown to be remarkably stable over decades of use (He & Kingsbury, 2016) and moreover can be predicted by an explanatory theory accounting for over 90 percent of the observed variance (Fisher & Stenner, 2016). More than 100 English language reading tests across the world measure in a common unit. Over 30 million student measures annually are interpreted relative to 250,000 book measures and 200 million article measures, where matching student and text measures predict a 75 percent comprehension rate. Books, articles, assessments, and students have been brought into a common frame of reference in a process now over 27 years old and still accelerating. Text complexity corresponds with reading learning progressions such that student measures enable the individualization of instruction. Student measures are tracked over time and across grade levels, instantiating developmental coherence. Teachers are able to compare learning outcomes across their own and each other’s classes, realizing horizontal coherence. And in many locations, state end-of-year or graduation tests report in the common unit, providing parents, students, teachers, principals, librarians, researchers, and the public with the vertical coherence needed for connecting classroom formative assessments with accountability standards.
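
The claim that matching student and text measures predict a 75 percent comprehension rate can be illustrated with a Rasch-type success function. Under a plain dichotomous Rasch model, equal measures give 50 percent success; requiring 75 percent at a match amounts to adding an offset of ln 3 (about 1.1) logits, since 1 / (1 + e^(-ln 3)) = 0.75. The sketch below is only one way to express that rule under this assumption: it works in logits, does not use the Lexile-to-logit conversion (which the abstract does not give), and the function and constant names are hypothetical.

```python
import math

# Offset (logits) chosen so that a reader whose measure equals the text
# measure succeeds 75% of the time: 1 / (1 + exp(-x)) = 0.75 gives x = ln 3.
MATCH_OFFSET = math.log(3.0)

def forecast_comprehension(reader, text, offset=MATCH_OFFSET):
    """Rasch-type success rate for a reader/text pair, both in logits."""
    return 1.0 / (1.0 + math.exp(-(reader - text + offset)))

if __name__ == "__main__":
    for gap in (-1.0, 0.0, 1.0):
        rate = forecast_comprehension(gap, 0.0)
        print(f"reader - text = {gap:+.1f} logits -> forecast {rate:.2f}")
```

Only the difference between reader and text measures enters the forecast, so any reader and text with equal measures, at whatever level, receive the same 75 percent forecast.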

5. Conclusions: Instead of demanding strict conformity with item-based equating standards, then, it is likely more realistic and productive to think of standards in terms of shared information contextualized in a common theoretical and explanatory frame of reference (Stenner et al., 2013). Instead of expecting all student measures to be produced from one set of items that fit one measurement model, individual response patterns can be displayed in instructionally relevant and developmentally coherent kidmaps with no need for reporting any scaling or statistics. Horizontally coherent statistical summaries of measures over time, within and across classrooms, will be reported to teachers and administrators in support of the local community of practice, in a unit comparable with summative accountability measures. These informationally coherent links stand as “potential” universals in partially interconnected, resonant, and multilevel traceability network ecosystems, bypassing at the same time both the strictly local, item-based problem-solution dependency and the universal problem-solution independence (Latour, 2005, p. 229). These ecologized “glocal” media, simultaneously local and global, are characterized by Ricoeur (1992, p. 289) as potential or inchoate universals. Dewey (1954, p. 215) similarly held that "The local is the ultimate universal, and as near an absolute as exists." Golinski (2012, p. 35) points to the interconnections of metrological networks supporting local approximations and translations of standards as replacing the uniform universality assumed in modern science. And Haraway (1996, pp. 439-440) offers another account of how locally embedded relationships provide an alternative to both relativism and transcendence. The sustainability opportunities created within an ecologizing paradigm stem from the co-evolution of (a) concepts embodied in linguistic and measurement technologies and (b) the institutional rules, roles, and responsibilities within multilevel social, political, and economic ecologies (Hutchins, 2014; Miller & O'Leary, 2007). The end results are systems of tools embodying individually unique problem-solution unities that are useful in negotiating local particularities while still recognizable as belonging to an identifiable general class. These results suggest potentially large payoffs from new analogies with existing online engineering models of global cooperation enabling intelligent metrology applications (Durakbasa, Bauer, Bas & Riepl, 2015). Perhaps caring for our measuring technologies in education and other fields in the same way we care for our children will yet lead to the creation of forms of social life sensitive to the values and experiences of those who inhabit them.

Paper No KK_063

Paper Title Rasch-based Test Equating: An Application of Winsteps in China

Email Address [email protected]

1st Author Wu Jinyu

Affiliation City University of Macau, SAR

Subsequent authors

Zhang Quan, City University of Macau, SAR

1. Aims/ Objectives of study:

In today’s testing practice, equating plays a central role and is regarded as a prerequisite for item banking in computerized and Internet-based testing. Through equating, changes in item difficulty across test forms can be observed, and the corresponding ability estimates from different occasions can be adjusted accordingly. Equating is a complicated process requiring extensive data processing, so manual calculation is not feasible. Our self-developed software, Gitest, is a Rasch-based DOS program that is limited in the size of data matrix it can process in a single run. This motivated the authors to seek another effective tool. Among the various programs available for estimating item and ability parameters and for test equating, Winsteps is a strong candidate. This paper presents one significant aspect of Winsteps: parallel test equating based on a minimal yet representative data set. It illustrates the wide range of applications of Winsteps to practical test equating problems; the analysis assumes binary scoring of item responses and gives stable and accurate estimates of item parameters and scale scores for both long and short tests and classroom exercises.

2. Sample: The results are based on 40 Chinese non-English-major students at a university in Zhongshan, Guangdong Province, China.

3. Method: The equating method used here links separate test forms through common (linking) items, so that scores from tests administered separately to different test takers on different occasions become comparable on the same scale after conversion (here, through Rasch analysis) (Hambleton & Swaminathan, 1985; Gui, Li and Zhang, 1989).
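
Common-item linking of the kind described in the Method can be sketched as a mean shift. Under the Rasch model, separately calibrated forms share the logit unit but each has its own arbitrary origin, so a single additive constant, the mean difference in common-item difficulties, places Form B measures on Form A's scale. This is a minimal sketch, not Winsteps code and not the authors' exact procedure; the function names, item labels and the 0.5-logit drift limit are hypothetical.

```python
def link_constant(anchor_a, anchor_b):
    """Mean difference (Form A minus Form B) over the common items, in logits."""
    common = sorted(set(anchor_a) & set(anchor_b))
    if not common:
        raise ValueError("the forms share no common items")
    return sum(anchor_a[i] - anchor_b[i] for i in common) / len(common)

def rescale(measures_b, shift):
    """Express Form B item or person measures on the Form A scale."""
    return {k: v + shift for k, v in measures_b.items()}

def drifting_items(anchor_a, anchor_b, shift, limit=0.5):
    """Common items whose linked difficulties still disagree by more than limit."""
    common = set(anchor_a) & set(anchor_b)
    return sorted(i for i in common if abs(anchor_b[i] + shift - anchor_a[i]) > limit)

if __name__ == "__main__":
    form_a = {"i1": 0.5, "i2": -0.2, "i3": 1.0}              # hypothetical calibrations
    form_b = {"i1": 0.0, "i2": -0.7, "i3": 0.5, "i4": 2.0}   # i4 is unique to Form B
    shift = link_constant(form_a, form_b)
    print("shift:", round(shift, 3))
    print("Form B on Form A scale:", rescale(form_b, shift))
    print("drifting common items:", drifting_items(form_a, form_b, shift))
```

Common items flagged as drifting would normally be reviewed, and possibly dropped from the link, before the constant is recomputed.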

4. Results: The key finding is that, on the same data, the results obtained from Winsteps are identical to those obtained from other software such as Gitest.

5. Conclusions: This shows that Winsteps is another good choice for Chinese scholars of language testing to consider.

Paper No KK_064

Paper Title Misconceptions in electricity via Rasch Analysis

Email Address

[email protected]

1st Author Nazlinda Abdullah

Affiliation Universiti Teknologi MARA

Subsequent authors

1. Aims/ Objectives of study:

This preliminary study focuses on determining the suitability of DIRECT, a test on electricity developed in the United States, for identifying the misconceptions of Malaysian students. It also aims to compare the performance of various groups of Malaysian students.

2. Sample: This study involved 104 Malaysian students from various colleges, institutions and schools in the Klang Valley.

3. Method: This preliminary research used a descriptive research method. The electricity test DIRECT was administered at various colleges, institutions and schools at different times.

4. Results: In general, the results show a person reliability of 0.52 and an item reliability of 0.95. The person mean is negative, at -0.30 logits, which means that the students found the test challenging. Targeting was acceptable, with an item spread of 5 logits and a person spread of over 3 logits. The sample needs to include more students with a wider range of ability. The misconceptions students hold on each item and area of electricity were identified from the item frequency measure order table. The misconceptions identified were similar to those found among American students. Potential difference and current were the two most common problem areas. Malaysian students also faced difficulties with energy and power. Group performance was compared on the Wright map by arranging students according to their groups. Among the six groups, the students taking A-levels and pursuing a medical degree were the most capable.
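
The reported reliabilities can be translated into separation indices with the standard Rasch relationships G = sqrt(R / (1 - R)) and R = G^2 / (1 + G^2), and into the number of statistically distinct strata, H = (4G + 1) / 3. The sketch below simply applies these formulas; the function names are hypothetical, and the values it prints are arithmetic on the reliabilities reported above, not additional results from the study.

```python
import math

def separation_from_reliability(r):
    """Separation index G = sqrt(R / (1 - R)), for 0 <= R < 1."""
    if not 0.0 <= r < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(r / (1.0 - r))

def reliability_from_separation(g):
    """Inverse relation: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)

def strata(g):
    """Number of statistically distinct levels, H = (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0

if __name__ == "__main__":
    for label, r in (("person", 0.52), ("item", 0.95)):
        g = separation_from_reliability(r)
        print(f"{label}: R = {r:.2f}, G = {g:.2f}, strata = {strata(g):.2f}")
```

For the person reliability of 0.52, G is about 1.04, i.e. fewer than two distinguishable ability strata, which is consistent with the authors' call for students with a wider range of ability.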

5. Conclusions:

The findings show that DIRECT is suitable for identifying Malaysian students' misconceptions about electricity. In addition, each group's performance, including its areas of strength and weakness, was easily identified.

Paper No KK_065

Paper Title Validating the Usability Evaluation’s Instrument of Community Learning Centre Model (UEICLC) for Aboriginal in Tasik Chini, Pahang

Email Address [email protected]

1st Author Mazzlida Mat Deli

Affiliation Faculty of Education, National University of Malaysia

Subsequent authors

Ruhizan Mohammad Yasin, Siti Mariam Dasman

1. Aims/ Objectives of study:

This study aims to produce a reliable and valid instrument, established using the Rasch model, for evaluating the usability of the elements of the Community Learning Centre (CLC) Model (UEICLC). Keywords: Rasch Measurement Model, evaluation, usability, community learning centre

2. Sample: Sixty members of the Orang Asli community in Tasik Chini, Pahang participated in this study.

3. Method: This study employed a quantitative approach to data collection and analysis. A survey was used to gather the perceptions of the Jakun Orang Asli community in Tasik Chini, Pahang, towards the elements of the CLC Model. The data were analysed using Winsteps 3.80 to investigate rating scale category functioning, reliability and separation indices, unidimensionality, item polarity, goodness of fit, and item difficulty.

4. Results: First, the original five-point rating scale did not function effectively; categories 1 and 2 should be combined to improve the threshold estimates between categories. Second, item and person reliabilities were acceptable, and separation indices were good (greater than two). Third, the Rasch analysis showed that UEICLC is unidimensional: the raw variance explained by measures exceeded 60% and the unexplained variance in the first contrast was below 15%. Fourth, all items fitted the model, although six persons were deleted due to misfit. Finally, the mean item measure was slightly below the mean person measure: the items were fairly easy for the respondents, and there were large gaps between some items.
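
Combining categories 1 and 2 of the five-point scale, as recommended above, amounts to a recode before the analysis is re-run. A minimal sketch, assuming responses are stored as integers 1 to 5 with None for missing values; Rasch software can usually rescore categories itself, and the names and data here are hypothetical.

```python
# Original 5-point scale with categories 1 and 2 merged, then renumbered
# so that the remaining categories stay consecutive (1..4).
MERGE_1_AND_2 = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4}

def collapse_categories(responses, mapping):
    """Recode every rating with mapping; missing values (None) pass through."""
    return [[None if r is None else mapping[r] for r in row] for row in responses]

def category_counts(responses, categories):
    """Frequency of each category, ignoring missing values."""
    counts = {c: 0 for c in categories}
    for row in responses:
        for r in row:
            if r is not None:
                counts[r] += 1
    return counts

if __name__ == "__main__":
    data = [[1, 2, 5], [3, None, 4]]  # hypothetical ratings from two respondents
    recoded = collapse_categories(data, MERGE_1_AND_2)
    print(recoded)
    print(category_counts(recoded, range(1, 5)))
```

Checking the category counts after recoding helps confirm that no category is left too sparse before the thresholds are re-estimated.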

5. Conclusions:

This study produced a new Rasch-based instrument for evaluating CLC programs, supporting the view that the CLC program provides opportunity and space for the Aboriginal community to gain knowledge and skills in line with their beliefs and traditions.

Paper No KK_066

Paper Title Develop, deploy, determine: Surveying assessment for learning in the Singapore secondary school context

Email Address [email protected]

1st Author Christopher C. Deneen

Affiliation National Institute of Education, Nanyang Technological University

Subsequent authors

Gavin Fulmer, University of Iowa

1. Aims/ Objectives of study:

Aim: To develop, utilize and obtain results from a survey instrument on assessment for learning (AfL) in the Singaporean secondary school context. Objectives: To report findings on: 1. the construction, piloting, adjustment and deployment of a survey instrument focusing on AfL perceptions, values, practices and proficiencies; 3. outcomes of research that may inform the survey target (AfL) and the practice of objective measurement in educational contexts.

2. Sample: Pilot participants (n=163) were Singaporean secondary school teachers enrolled in a master's degree program at the Singaporean National Institute of Education (NIE). NIE is the sole provider of teacher education degree programs in Singapore. The master's degree program is in curriculum and teaching and is not subject specific. This allowed the pilot sample to include responses from secondary school teachers of multiple subjects at a broad cross-section of Singaporean schools. Only responses from teachers working at the secondary school level were included. Thus, the pilot sample was highly congruent with the intended participant group for full survey deployment. Participants in the full survey deployment (n=913, after data cleaning) were teachers at 13 Singaporean secondary schools. Schools were selected to provide a distributed national representation, using three criteria: 1) academic achievement, as evidenced by school-level performance on standardized tests; 2) students' socio-economic status; and 3) the school's stage of implementation of the Singapore-wide AfL policy. The survey was deployed strategically and in cooperation with ministry officials and school leaders to allow for responses across disciplines and to maximize the response rate. An average response rate of 80% was achieved.

3. Method: Pilot. Development of the pilot survey drew upon a) a review of relevant literature, b) the underlying theoretical framework of the research, c) careful analysis of similar surveys of AfL, and d) a context-specific analysis of Singaporean secondary school assessment and learning culture(s). In the full presentation, this last point will be discussed in some detail, as it is critical to understanding the piloting process. Pilot data were subjected to both Rasch analysis (R software with the TAM package) and factor analysis (AMOS). The use of Rasch and factor analysis as complementary methods for survey development has been reported previously by the first author. Results from this process were used to adjust items, scales, item groupings and parameters/factors. This produced 1. stable factors: Alignment, Grading/Reporting, Actively Involving Students, and Sustaining Engagement; and 2. a finalized survey structure with three main areas: A. Demographic; B. Core I: purposes of assessment; and C. Core II: values, practices and proficiencies in assessment. Full deployment. Once data from the full deployment were obtained, the following actions were taken: 1. data cleaning and missing-values analysis; 2. EFA/CFA; 3. descriptive statistics for Cores I and II.

4. Results: Inspection of exploratory results suggested that four factors were plausible. These models were tested for fit in AMOS and, after trimming, acceptable fit was found. Nine items across the three constructs align to a common factor, while another 13 load on the same factor for two of the three constructs. We conclude that, while not identical, the factors are conceptually similar. Based on content review of these items, we defined them as: (1) Alignment; (2) Sustaining Engagement; (3) Involving Students, including peer- and self-assessment (PASA);

Page 57: Pacific Rim Objective Measurement Symposium …proms.promsociety.org/2017/bookletv3.pdfPacific Rim Objective Measurement Symposium 2017 5 - 9 August, Kota Kinabalu 1 Pre-Conference

Pacific Rim Objective Measurement Symposium 2017 5 - 9 August, Kota Kinabalu

57

and (4) Grading/Reporting. We then computed factor scores for each factor and examined the patterns of teachers' mean agreement. Alignment, Sustaining Engagement, and Involving Students (PASA) are valued by teachers more than they are practiced, and teachers report still lower levels of self-proficiency. The one exception is Grading/Reporting, for which teachers report similar levels of practice and value, but again relatively lower levels of proficiency. Differences between the highest and lowest means are generally large (d>.80). A clear pattern emerged from the data: to the degree that aligning assessment with curriculum, sustaining student engagement, and involving students in assessment are part of AfL, Singapore teachers in these schools already value AfL. This is an important realisation and corresponds with much research into teacher beliefs about assessment: teachers endorse assessment that helps improve teaching and student learning outcomes. However, teachers reported valuing three factors considerably more than they reported having the proficiency or opportunity to carry them out. This suggests that something may be impeding the translation of endorsement into impact. A converse pattern emerged around Grading/Reporting, with the highest mean for Practice but the lowest mean for Value. Given the public examination structure in Singapore schooling, it is unsurprising that teachers emphasised the frequency of summative activity. This would seem to confirm the impetus for attempting to boost AfL. The low value assigned also corresponds to research in other national contexts suggesting that teachers tend to view negatively those assessment practices that could be used to label or blame students for poor performance. This tension may create a significant challenge for policy and practice initiatives attempting to achieve a balanced assessment approach.
Attempting to achieve summative/formative balance in an environment in which formal examination, grading, and reporting are maintained tends to result in one policy being ‘hard’ (i.e., formal external accountability) and the other ‘soft’ (i.e., formative assessment for learning) (Kennedy, Chan, & Fok, 2011). This has implications well beyond Singapore, as educational systems in the United States and elsewhere attempt to achieve balance and resolve tensions in assessment (Berry, 2011). The analysis demonstrates that while the factors are not perfectly identical, they are sufficiently similar conceptually to allow common nomenclature and comparison. More importantly, some of the recovered factors mapped directly onto the intended structure. Notably, this occurred for the Sustaining Engagement, Alignment, and Involving Students in assessment factors, which used sets of items that had been grouped in the original design. Only the Grading/Reporting factor draws items from across the original factors of Doing and Accountability; on inspection, the connection among these items is logical. Values, proficiency, and practice of assessment exist in a complex relationship that may be studied further in these data.
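The results describe the differences between the highest and lowest factor means as large (d>.80), the conventional threshold for a large effect. The abstract does not state how d was computed; the sketch below assumes the common two-sample form standardised by a pooled standard deviation (for within-teacher comparisons of Value, Practice, and Proficiency, a paired formulation may be more appropriate). The function name and data are hypothetical.

```python
import math
from statistics import mean, stdev


def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d for two samples, standardised by the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Hypothetical factor scores for "Value" and "Proficiency" on one factor.
value = [4.0, 5.0, 6.0]
proficiency = [2.0, 3.0, 4.0]
d = cohens_d(value, proficiency)
print(d, "large" if abs(d) > 0.8 else "not large")
```

Here both groups have a sample SD of 1 and means 2 points apart, so d is 2.0, well above the .80 benchmark.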

5. Conclusions: This paper presents conclusions significant both to survey use and to the topic under study: assessment for learning. In the full paper, the following conclusions are discussed in the context of research and development agendas of interest to a global measurement audience.
Assessment for Learning Conclusions
This study lends clarity to the tensions that have been identified in research into teacher perceptions of assessment; importantly, the differences in factor means point to potential directions for professional development in assessment for learning. The challenge now is to determine to what extent the lack of proficiency arises from deficient personal competencies and skills, or from policy and priority conditions that are inimical to AfL. By focusing on the key element of stakeholder perceptions, we may be able to link AfL perceptions, policies and practices in ways that have research, practice and development implications in Singapore as well as in any educational system negotiating the challenges of balancing assessment priorities.
Survey Use Conclusions
Complex surveys can be used in school settings, but they require significant work and support. This work and support includes:
1. Assembling a competent team able to shepherd the process from piloting, through full deployment, and into interpretation of results.
2. Negotiating with schools to secure not only access but also high, valid response rates.
3. Garnering the support of high-level decision-makers. Developing and using a survey this complex would not have been possible without 'hard and soft' support from the Ministry of Education. This point is discussed in some detail because it has particular global implications for objective measurement in school settings.

Paper No KK_067

Paper Title Assessing Pedagogical Content Knowledge of the Particle Theory of Matter and Phase Change in Pre-service Science Teachers

Email Address [email protected]

1st Author Maryati

Affiliation Universitas Negeri Yogyakarta

Subsequent authors

Zuhdan Kun Prasetyo, ([email protected]) Universitas Negeri Yogyakarta Insih Wilujeng, ([email protected]) Universitas Negeri Yogyakarta Bambang Sumintono ([email protected]) Universiti Malaya

1. Aims/ Objectives of study:

This research aims to assess the quality of PCK in pre-service secondary science teachers on a specific topic, the particle theory of matter and phase change, using many-facet Rasch measurement (MFRM).

2. Sample: The sample consisted of 16 pre-service secondary science teachers enrolled in a professional teacher training programme, with 32 lesson plans and videotaped instructional sessions. These pre-service teachers were assessed by three assessors (lecturers) using an instrument prepared by the researchers.

3. Method: This quantitative study measured teachers' PCK with a PCK rubric developed on the basis of Magnuson et al.'s component model. Measurement involved multiple raters, and the ratings were analyzed using many-facet Rasch measurement.
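The many-facet Rasch model named in the method extends the rating-scale Rasch model with a rater facet: log(P_k / P_(k-1)) = B_n - D_i - C_j - F_k, where B_n is the teacher's PCK measure, D_i the difficulty of rubric item i, C_j the severity of rater j, and F_k the threshold for rubric category k. The sketch below computes category probabilities for one teacher-item-rater combination. It assumes a shared rating-scale structure across items; the function names, the four-category rubric, and all parameter values are hypothetical, not estimates from this study.

```python
import math


def mfrm_category_probs(ability: float, item_difficulty: float,
                        rater_severity: float, thresholds: list[float]) -> list[float]:
    """Category probabilities under a rating-scale many-facet Rasch model.

    Model: log(P_k / P_(k-1)) = B_n - D_i - C_j - F_k, for k = 1..m.
    thresholds holds F_1..F_m; category 0 has no threshold.
    """
    eta = ability - item_difficulty - rater_severity
    # Unnormalised log-probabilities: 0 for category 0, then cumulative sums of (eta - F_h).
    log_numerators = [0.0]
    for f in thresholds:
        log_numerators.append(log_numerators[-1] + eta - f)
    shift = max(log_numerators)  # subtract the max for numerical stability
    weights = [math.exp(z - shift) for z in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(probs: list[float]) -> float:
    """Model-expected rubric score for a category probability vector."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical 4-category rubric (scores 0-3) rated by a lenient and a severe lecturer.
steps = [-1.0, 0.0, 1.0]
lenient = mfrm_category_probs(1.0, 0.5, -0.5, steps)
severe = mfrm_category_probs(1.0, 0.5, 0.5, steps)
print(expected_rating(lenient), expected_rating(severe))
```

Because rater severity enters the model additively, the same teacher receives a lower expected rubric score from a more severe rater; MFRM estimates and removes this rater effect when placing teachers on a common PCK scale.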

4. Results: Results indicate that the PCK of Indonesian pre-service secondary science teachers is still low, especially in knowledge of science curricula, knowledge of students' understanding of science, and knowledge of instructional strategies.

5. Conclusions: Science teachers' PCK in Indonesia, as a criterion of professional teachers, still needs to be improved, and the science teacher education curriculum must be reformed.

Paper No KK_069

Paper Title Modelling a Meaningful Hybrid eTraining for Diverse Learners using Rasch and SEM

Email Address [email protected]

1st Author Rosseni Din

Affiliation Universiti Kebangsaan Malaysia

Subsequent authors

1. Aims/ Objectives of study:

This study aimed to design, develop and implement a new meaningful hybrid e-training system, which was tested to generate a two-stage model for meaningful hybrid e-training. The early framework of the model guided the development of a questionnaire to measure the meaningfulness of hybrid e-training. The questionnaire has three sections, which assess (i) meaningful learning, (ii) hybrid e-training and (iii) learning style preference. Overall reliability analyses using Cronbach's alpha and the Rasch model, together with expert reviews for content validation, suggested that the questionnaire is reliable and valid for measuring a meaningful hybrid e-training program. Data collected from 213 ICT trainers were subsequently tested with confirmatory factor analysis in AMOS to obtain three best-fit measurement models for the three latent variables. Finally, structural equation modeling was applied to test the hypotheses.
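Cronbach's alpha, one of the reliability indices mentioned above, compares the sum of the item variances with the variance of respondents' total scores: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. A minimal sketch follows; the function name and response matrix are hypothetical, and complete data (no missing values) are assumed.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a complete respondents-by-items score matrix.

    scores[r][i] is respondent r's response to item i.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 5-point Likert responses: four respondents, three items.
responses = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha rises as items covary more strongly, because the total-score variance then grows relative to the sum of the individual item variances.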

2. Sample: 213 ICT trainers

3. Method: An iterative triangulation participatory design and validation method is used to structure the research and to show how all the major parts of the research project (the respondents, the system, the measures) work together to address the central research questions. Several complementary research paradigms were engaged, because research procedures in educational research are multidisciplinary and multimethod. The emphasis given to a particular paradigm depends on the objective and on the six phases of the research design, which comprise the design, development and validation of the I-MeT system, the measuring instruments and the models using a participative design and validation method. Data collected were analysed using Rasch and Structural Equation Modelling.

4. Results: The results showed (i) distribution of major learning style preference among respondents, (ii) evidence of a five-dimension measurement model for hybrid e-training, (iii) evidence of a five-dimension measurement model for meaningful e-training, (iv) evidence of a five-dimension measurement model for learning style preference, (v) a strong relationship between hybrid e-training and meaningful e-training, (vi) a positive relationship between learning style preference and hybrid e-training and (vii) a negative relationship between learning style preference and meaningful learning.

5. Conclusions: This section consists of three parts. The first part presents the methodological contribution; the second, the implications for future research related to the theoretical or conceptual framework of meaningful hybrid e-training; and the third, several implications for the practical development of theory, practice, and policy.

Paper No KK_070

Paper Title Exploration of the Psychometric Properties of the Eternal Love Instrument (ELI) and Validation of the ELI Model: A Rasch Model Approach

Email Address [email protected]

1st Author Akbariah Mohd Mahdzir

Subsequent authors

Norhayati Mohd Nor

1. Aims/ Objectives of study:

According to statistics provided by the Syariah Judiciary Department Malaysia (JKSM), the number of Muslim couples getting divorced has risen in recent years, from 20,916 in 2004 to 59,712 in 2014 and 63,463 in 2015, with 48,077 recorded up to 10 July 2016. Research to further understand this phenomenon is crucial to guide informed intervention. Hence, the Eternal Love Instrument (ELI) is proposed as a marriage status assessment instrument designed specifically for use with married couples in marriage counselling. The aim of this study was therefore to investigate the psychometric properties of the Malay version of the ELI in a sample of Malay married individuals (N = 500).

2. Sample: A sample of Malay married individuals (N = 500).

3. Method: The Rasch model was applied in the development process, since it has been shown to help in constructing valid, reliable and unbiased items for the attributes to be measured. A qualitative approach was used during the first stage, as this is the most suitable approach for conceptualizing what experts in Malaysia mean by everlasting marriage. During the second stage, the important constructs that the researcher believed could measure everlasting marriage were identified. Items were created based on the Instrument Blueprint. The ELI consisted of seven major constructs. The ELI was then distributed and the data were analysed.

4. Results: The findings were based on the following aspects: (a) construct definition (spread of the items); (b) summary of item difficulty and person ability (item-person map); (c) item polarity (point-measure correlation index); (d) fit statistics (infit and outfit); (e) unidimensionality (RPCA); (f) results consistent with the aims of measurement (reliability and separation); (g) instrument usefulness (SEM as the guide); (h) test targeting (item and person means within ±2.0 SE); (i) person fit; and (j) usability of the measurement scale (category and step calibrations).
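Several of the diagnostics listed above, the fit statistics in (d) and the reliability and separation in (f), can be computed directly from model residuals and standard errors. The sketch below assumes dichotomous items and already-estimated person and item measures. Item infit is the information-weighted mean-square residual; outfit is the unweighted mean of squared standardized residuals; separation and reliability compare the observed variance of the measures with their mean error variance. Dedicated Rasch software may differ in details (for example, variance denominators). The function names and all values are hypothetical, for illustration only.

```python
import math
from statistics import pvariance


def rasch_prob(theta: float, b: float) -> float:
    """P(X = 1) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses: list[int], abilities: list[float], difficulty: float) -> tuple[float, float]:
    """Infit and outfit mean-squares for one dichotomous item."""
    squared_residuals = 0.0
    model_variance = 0.0
    squared_z = 0.0
    for x, theta in zip(responses, abilities):
        p = rasch_prob(theta, difficulty)
        w = p * (1.0 - p)                  # model variance of the response
        squared_residuals += (x - p) ** 2
        model_variance += w
        squared_z += (x - p) ** 2 / w      # squared standardized residual
    infit = squared_residuals / model_variance
    outfit = squared_z / len(responses)
    return infit, outfit


def separation_reliability(measures: list[float], standard_errors: list[float]) -> tuple[float, float]:
    """Separation index and reliability from measures and their standard errors."""
    observed_var = pvariance(measures)
    error_var = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    true_var = max(observed_var - error_var, 0.0)   # "true" (error-adjusted) variance
    separation = math.sqrt(true_var / error_var)
    reliability = true_var / observed_var
    return separation, reliability


# Hypothetical data: one item answered by five persons with known abilities.
infit, outfit = item_fit([1, 1, 0, 1, 0], [1.5, 0.5, 0.0, -0.5, -1.5], 0.0)
print(round(infit, 2), round(outfit, 2))
print(separation_reliability([-1.0, 1.0], [0.5, 0.5]))
```

Mean-squares near 1.0 indicate responses about as noisy as the model predicts; reliability equals separation squared divided by one plus separation squared.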

5. Conclusions:

Items were then improved and the test was run again. The respondents consisted of married Malay individuals in Malaysia