Research Instruments, Validity, and Reliability

Trisasi Lestari - 2015

Designing RESEARCH INSTRUMENTS

Instrument construction

An instrument is a mechanism for measuring phenomena, which is used to gather and record information for assessment, decision making, and ultimately understanding.

Theory

Concept

Operational definition

Research instrument

Components of an instrument

Title

Introduction: why, how, what kind of information is needed, benefits, informed consent

Instructions for completion

Questions

Answer options / blanks to fill in

Additional notes

Closing

Choosing an instrument

Depends on:

• Research objectives
• Research design
• Object of study
• Data collection methodology
• Resources

Factors to consider:

Characteristics of the population: literacy, physical/mental abilities, motivation

Information about the population to be studied: telephone numbers, addresses

Access to respondents: location, time, available infrastructure (telephone, internet)

Purpose of the survey: complexity of the questions, sensitivity of the topic

Form of the questionnaire to be given: open-ended, closed-ended

Expected response rate

Data collection methods

Self-administered: individual, mail, group, pooling, email/internet

Observation: students' assessment of lecturers, checklist

Combination of formats and approaches: behaviour + emotion; checklist + fill in the blank + rating scales

Questionnaire

A self-contained, self-administered instrument for asking questions.

Lacks the personal touch

Extremely efficient

Most popular

A good questionnaire ‘stands on its own’

Risks

Low response rates

Bias
• Respondent bias, self-selection

Respondent honesty
• Over-reporting good things and under-reporting bad things

Wording
• ‘end pregnancy’ vs ‘abortion’; ‘poor’ vs ‘welfare’

Question rules and bad examples

Clear in meaning and free of ambiguity
• “Do you exercise regularly?”
• “What is the total value of your wealth?”

Use common everyday language; avoid jargon, abbreviations, or acronyms
• MDGs, Renstra, angka kematian (mortality rate)

Use neutral language; avoid emotional, leading language
• “What do you find offensive about flag burning?”
• “Why do you think hitting children is wrong?”

Simple and easy
• “How do you rate police response time to emergency and non-emergency calls?”
• “How many cigarettes do you smoke in a year?”

Ask yourself
• Does the question answer my research question?
• Does a related questionnaire already exist?
• Do I need open-ended or closed-ended questions?

Writing questions

Full script, written out in full

Means the same thing to every respondent

Respondents can understand the answers

Well organized

Avoid difficult words

Avoid negative sentences

Avoid asking two or more questions at the same time

Avoid long and complex sentences

Avoid sentences that contain assumptions

Avoid hypothetical questions

Avoid questions the respondent does not know the answer to

Avoid questions about causality (cause and effect)

Anything that is mentioned must be clear and explicit

If necessary, the terms used may be explained, but not inside the question itself

Etc. (handout mp)

Examples of standard questionnaires

Generic instruments
• COOP/WONCA charts: measure six core aspects of functional status: physical fitness, feelings, daily activities, social activities, change in health, and overall health
• Sickness Impact Profile (SIP) / Functional Limitations Profile (FLP)
• RAND SF 36
• Duke Health Profile (DUKE)
• EuroQol
• MOS 20
• Nottingham Health Profile
• RAND General Health Perception Questionnaire (GHPQ)

Dimension-specific instruments
• Barthel Index
• Index of Independence in Activities of Daily Living
• Frenchay Activities Index
• General Health Questionnaire (GHQ)
• RAND Mental Health Inventory (MHI)
• McGill Pain Questionnaire (MPQ)

Disease/condition-specific instruments
• State-Trait Anxiety Inventory (STAI)
• Center for Epidemiologic Studies Depression Scale (CES-D)
• Arthritis Impact Measurement Scale (AIMS)
• Living with Asthma (AQ)
• Chronic Respiratory Disease Questionnaire (CRDQ)
• Asthma Quality of Life Questionnaire (AQLQ)
• Diabetes Health Profile IDDM (DHP 1) and NIDDM (DHP2)
• Diabetes Quality-of-Life measure (DQOL)
• EORTC Quality of Life Questionnaire

Creating the questionnaire content

Conduct a literature review

Use existing tools/questionnaires

Brainstorming

Nominal Group Technique
• Groups of 5-6 people
• The facilitator explains the idea/problem/objective
• Each participant writes down ideas and shares them
• Other group members do not criticize, but may ask for clarification
• Repeat the brainstorming process until all ideas have been collected
• Each participant reviews the alternatives that emerge
• Rank the priorities


Snowballing / Pyramiding
• 2, 2+2, 4+4, and so on

Delphi technique
• Collect input on content and methodology from experts by email/mail
• The researcher prepares a draft and sends it to the experts
• The experts comment independently


Questions Pool and Q-sort
• 60-90 questions
• Print the questions on cards
• Shuffle the cards
• Set ranking criteria: most definitely include this item, include this item, possibly include this item, and definitely do not include this item


Concept Mapping
• Preparation.
• Generation. Brainstorming or the nominal group technique, to generate statements describing activities related to the project.
• Structuring. Sort the statements: Q-sort or another ranking process.
• Representation. Create visual maps that reflect the relationships between the sorted items.
• Interpretation.
• Utilization.

Operationalizing Constructs

Measurement

Measurement is a systematic, repeatable process for counting or classifying objects or events using a particular dimension.

It is usually achieved using numbers (numeric values).

Levels of measurement

Likert Scale

Rensis Likert, 1903-1981

Agreement
• Strongly agree
• Agree
• Undecided
• Disagree
• Strongly disagree

Frequency
• Very often
• Often
• Sometimes
• Rarely
• Never

Importance
• Very important
• Important
• Moderately important
• Of little importance
• Unimportant

Likelihood
• Almost always true
• Usually true
• Sometimes true
• Usually not true
• Almost never true

Analysis of Likert scales

Likert scale: the sum of responses on several Likert items

Ordinal or interval

Descriptive
• Median, mode, percentiles/quartiles, display of the distribution (bar chart)

Non-parametric tests
• Chi-squared, Mann-Whitney test, Wilcoxon signed-rank test, Kruskal-Wallis test

Modified binomial Likert scale
• Chi-squared, Cochran's Q, McNemar test
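The descriptive statistics listed above can be sketched in a few lines of Python. This is a minimal illustration only: the responses are invented, the 1-5 coding (1 = strongly disagree, 5 = strongly agree) is an assumption, and `scale_score` is a hypothetical helper name.

```python
from collections import Counter
from statistics import median, mode, quantiles

# Hypothetical answers of 10 respondents to ONE Likert item
# (1 = strongly disagree ... 5 = strongly agree); invented for illustration.
responses = [4, 5, 3, 4, 2, 4, 5, 3, 4, 1]

item_median = median(responses)          # central tendency for ordinal data
item_mode = mode(responses)              # most frequent answer
q1, q2, q3 = quantiles(responses, n=4)   # quartile cut points

# Frequency distribution: the numbers behind a bar chart
distribution = Counter(responses)
for level in range(1, 6):
    print(level, "#" * distribution[level])


def scale_score(item_answers):
    """A Likert *scale* score: the sum of one respondent's answers to several items."""
    return sum(item_answers)


print(item_median, item_mode, scale_score([4, 5, 3]))
```

The median and mode are used here rather than the mean because single Likert items are ordinal; the summed scale score is what is sometimes treated as interval data.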

Observation Checklist

Pretesting

Initial pretesting
• Individual interviews and focus groups
• Review by content area experts
• Continue to obtain feedback and revise the project if necessary

Pretesting during development
• Read and reread the items and read the items aloud
• Review by content area experts
• Review by instrument construction experts
• Review by individuals with expertise in writing
• Review by potential users

Pilot testing

Questions for experts
• Was each set of directions clear (that is, the general directions at the beginning of the questionnaire and any subsequent directions provided in the body of the instrument)?
• Were there any spelling or grammatical problems? Were any items difficult to read due to sentence length, choice of words, or special terminology?
• How did the reviewer interpret each item? What did each question mean to them?
• Did the reviewer experience problems with the item format(s), or does the reviewer have suggestions for alternative formats?
• Were the response alternatives appropriate to each item?
• What problems did the reviewer encounter as a result of the organization of the instrument, such as how items flowed?
• On average, how long did it take to complete? What was the longest time and what was the shortest time it took to complete the instrument?
• For Web-based instruments, did the respondent encounter any problems accessing the instrument from a computer or navigating the instrument once it was accessed?
• Did any of the reviewers express concern about the length of the instrument, or did they report problems with fatigue due to the time it took to complete?
• What was the reviewer's overall reaction to the questionnaire?
• Did they have any concerns about confidentiality or how the questionnaire would be used?
• Did they have any other concerns?
• What suggestions do they have for making the questionnaire or individual items easier to understand and complete?

Pilot testing

Obtain evidence of reliability.

Establish evidence of face validity

Obtain evidence of content validity

Obtain evidence of criterion validity

Obtain evidence of construct validity

[Diagram: Measurement, linked to Reliability, Validity, and Generalisability]

Validity and reliability

Title: measuring job satisfaction

How satisfied are you with your job? (Scale)

What factors can influence your level of job satisfaction? (Free listing, checklist, or a combination)

Does the leader's communication style affect job satisfaction? (Yes / No)

Does the size of incentives affect…

Example study: measuring the average height of elementary school children in DIY (Special Region of Yogyakarta)

What is measured must be height, NOT weight: Valid

The measuring tool gives the same result even when a person's height is measured repeatedly: Reliable

The results of measuring the height of elementary school children in DIY are expected to represent the average height of elementary school children in Java: Generalisable

Validity

Are we measuring what we want to measure?

Concepts are often difficult to measure
• For example, the concept of knowledge
• Latent & manifest variables

Types of validity
• Face validity
• Construct validity
• Content validity / internal validity
• Criterion validity
• Predictive validity
• Multicultural validity

Face Validity

Face validity is the degree to which an instrument appears to be an appropriate measure for obtaining the desired information, particularly from the perspective of a potential respondent.

Respondents are asked to judge whether the research instrument (e.g. a questionnaire) is valid in their view

Can respondents grasp the meaning of the questions as the researcher intended?
• Lay people
• Experts

Example: in a questionnaire about healthy lifestyles, the question “How often do you exercise?” has face validity: Valid

Construct Validity

Ensures that the researcher and the respondents understand the same construct
• Safety, intelligence, leadership, cleanliness

Internal structure

Related to theoretical knowledge

Operationalization

Consists of:
• Convergent validity (+), e.g. depression and feelings of worthlessness
• Discriminant validity (-), e.g. depression and feelings of happiness
• Both must be reported

Convergent Validity

To show that measures that should be related are in reality related

Discriminant Validity

To show that measures that should not be related are in reality not related

[Diagram labels: knowledge, behaviour, attitude, patient participation]

Content/internal validity

The degree to which an instrument is representative of the topic and process being investigated.

Example
• Concept: measuring students' attitudes towards teachers
• Attitude measured with a Likert scale:
  - I listen to everything my parents say
  - My teacher always tries to help me
  - I always greet my teacher whenever we meet

Literature review: improves the researcher's ability to achieve content validity

Whether the content is valid is influenced by:
• The researcher's knowledge of the definition of the concept, the existing theory about the concept, and how the concept works
• Sample selection bias
• Information bias
• Statistical confounding

Criterion Validity

Making a comparison between a measure and an external standard.

Stroke recovery vs level of assistance required
• Individual test score
• Observation of daily activities: tying shoelaces, getting dressed, brushing teeth, making the bed, etc.

Must be demonstrated for instruments that measure performance

Requires:
• A good understanding of the theory behind the concept under study, so that other variables that are related, or predicted to be related, to the factor can be identified

Predictive validity

Can the measuring instrument predict outcomes?

Examples:
• Can the TPA (Tes Potensi Akademik, an academic aptitude test) score predict students' success in their coursework?
• Can the TPA score predict students' final GPA?
• Can a psychological test for new employees predict how loyal they will be to the company?

Multicultural validity

An instrument measures what it purports to measure as understood by an audience of a particular culture

How:
• Use language that is understood
• Pay attention to the values/norms/customs of the local community

Measuring validity with a qualitative approach

Evaluative

Literature review of the research topic: provides evidence that the instrument will measure the construct and nothing else.

Expert reviews

Table of specifications: identify the topic variables/factors
• Inductive/deductive

Measuring validity with a quantitative approach

Measure the strength of the relationship between one question and the other questions in the same construct

Item analysis

Factor analysis

Measuring validity

Item analysis
• To demonstrate a relationship between individual items
• Internal consistency reliability
• Compare item pairs 1-2, 1-3, 1-4, 1-5, and so on; then 2-3, 2-4, 2-5, 2-6, and so on

Further reading: The Basics of Item Response Theory (Baker, 2001)
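The pairwise item comparison (1-2, 1-3, ..., 2-3, ...) can be sketched as follows. This is an illustration under assumptions, not the lecture's own procedure: the responses are invented, Pearson's correlation is one possible measure of the relationship between items, and `pearson` is a hypothetical helper written here so the block has no dependencies.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data: for each item number, the answers of 6 respondents.
items = {
    1: [4, 5, 2, 3, 5, 1],
    2: [4, 4, 2, 3, 5, 2],
    3: [5, 4, 1, 3, 4, 1],
    4: [2, 3, 3, 4, 2, 3],
}

# Correlate every pair of items: 1-2, 1-3, 1-4, 2-3, 2-4, 3-4
pair_r = {(i, j): pearson(items[i], items[j]) for i, j in combinations(items, 2)}
for pair, r in pair_r.items():
    print(pair, round(r, 2))

# Average inter-item correlation across all pairs
average_inter_item_r = sum(pair_r.values()) / len(pair_r)
```

An item whose correlations with the others are consistently low or negative (item 4 in this invented data) is a candidate for review.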

Difficulty & Discrimination index

Select the 10 subjects with the best scores and the 10 subjects with the worst scores
• If several subjects tie for 10th place, pick among them at random

Count how many subjects in the best-score group (RU) and in the worst-score group (RL) answered question 1 correctly, question 2 correctly, and so on

Difficulty index: (RU+RL)/20

Discrimination index: (RU-RL)/10

Name   Item 1
1      1
2      1
3      1
4      0
5      1
6      1
7      0
8      1
9      1
10     1      RU=8
…
31     0
32     0
33     1
34     1
35     1
36     0
37     0
38     1
39     0
40     0      RL=4

Difficulty index: (8+4)/20 = 0.6

Discrimination index: (8-4)/10 = 0.4

Compare to the maximum discriminating index
• Near maximum: very discriminating
• Half the maximum: moderately discriminating
• A quarter of the maximum: weak item
• Near zero: non-discriminating
• Negative: bad item
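The worked example above can be reproduced with a short function. This is a sketch: `item_indices` is a hypothetical name, and the denominators are taken from the group sizes (20 = both groups together, 10 = one group), which matches the slide only when both groups have 10 subjects.

```python
def item_indices(upper, lower):
    """Difficulty and discrimination index for ONE item.

    upper / lower: 0/1 answers from the best- and worst-scoring groups.
    """
    ru, rl = sum(upper), sum(lower)                      # correct answers per group
    difficulty = (ru + rl) / (len(upper) + len(lower))   # (RU+RL)/20 for 10+10 subjects
    discrimination = (ru - rl) / len(upper)              # (RU-RL)/10 for groups of 10
    return difficulty, discrimination


# Item 1 answers copied from the table above
upper_group = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]   # subjects 1-10, RU = 8
lower_group = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]   # subjects 31-40, RL = 4

difficulty, discrimination = item_indices(upper_group, lower_group)
print(difficulty, discrimination)
```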

Reliability

Score = True score + Systematic error + Random error

True score: what we want to measure

Systematic error: an error that always occurs, e.g. an uncalibrated instrument that starts measuring from 2 instead of 0

Random error: unpredictable error that can occur by chance or because of a genuine change, e.g. the subject's mood when taking a test

Sources of random error

Subject reliability: respondent fatigue, mood

Observer reliability: the observer's/interviewer's skill, background

Situational: the conditions under which the measurement is taken (an interview at home and one at the office during a busy period will give different results)

Instrument: poor wording

Data processing: coding errors, data entry errors

Ways of measuring reliability

Eyeballing: an informal method
• Administer the instrument twice to the same group of people in a relatively short period of time to see if their responses remain the same

Repeated measurement

1. Test-retest method
• When?
  - Carry-over effects
  - Too early: over-reliability
  - Too late: under-reliability
• How?
  - Measure how strongly the scores measured at 2 different times are related, using a correlation coefficient
  - Reliable if the correlation coefficient > 0.7
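The test-retest check can be sketched as below. The scores are invented, Pearson's coefficient is assumed as the correlation coefficient (the slides do not name one), and `pearson` is a hypothetical helper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical total scores of the same 8 respondents at two points in time
first_administration = [12, 15, 9, 20, 17, 11, 14, 18]
second_administration = [13, 14, 10, 19, 18, 10, 15, 17]

r = pearson(first_administration, second_administration)
reliable = r > 0.7   # threshold taken from the slide
print(round(r, 2), reliable)
```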

2. Proportion agreement

Inter-rater and Intra-rater Reliability

Inter-rater: >1 rater

Intra-rater: 1 rater

Calculate with Cohen's Kappa

Kappa statistic (Cohen, 1960)

$$\kappa = \frac{OA - EA}{1 - EA}, \qquad -1 \le \kappa \le 1$$

$$OA = \frac{A + D}{N}$$

$$EA = \frac{\dfrac{N_1 \times N_3}{N} + \dfrac{N_2 \times N_4}{N}}{N}$$

OA: observed agreement
EA: agreement expected by chance
A, D: the number of cases in which both observers give the same rating
N1, N2: row totals; N3, N4: the corresponding column totals; N: total number of cases

Agreement between observers 1 and 2 in judging whether markets in Jogja are busy or not

                        Observer 1
                        Busy    Normal   Total
Observer 2   Busy        140      52      192
             Normal       69     725      794
             Total       209     777      986

Observed agreement = (140 + 725)/986 = 0.877

Chance agreement, busy-busy = (192 × 209)/986 = 40.7

Chance agreement, normal-normal = (794 × 777)/986 = 625.7

Total expected chance agreement = (40.7 + 625.7)/986 = 0.676

Kappa = (0.877 - 0.676)/(1 - 0.676) ≈ 0.62
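The kappa calculation for the market table can be checked with a small function. This is a sketch with a hypothetical function name (`cohen_kappa`); it assumes the table is laid out with observer 2 in the rows and observer 1 in the columns, as on the slide.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of cases rated category i by observer 2 (rows)
    and category j by observer 1 (columns).
    Returns (observed agreement, expected agreement, kappa).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = sum(table[i][i] for i in range(k)) / n                        # OA
    expected = sum(row_totals[i] * col_totals[i] / n for i in range(k)) / n  # EA
    return observed, expected, (observed - expected) / (1 - expected)


# Busy/normal ratings of the markets, copied from the table above
market_table = [[140, 52],
                [69, 725]]

oa, ea, kappa = cohen_kappa(market_table)
print(round(oa, 3), round(ea, 3), round(kappa, 3))
```

Although raw agreement is about 88%, most of it would be expected by chance because both observers rate most markets as normal, which is why kappa is noticeably lower.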

Test-Retest reliability

Pretest the questionnaire with the same group on two separate occasions, expecting only minor variations in responses.

Coefficient of variation

Similar to the eyeballing method

Internal Consistency Reliability

To compare results across and among items within a single instrument and to do so with only one administration.

For instruments that have more than 1 item

How homogeneous are the items within one test?

How well do the items measure a single construct?

How to calculate:
• Average inter-item and average item-total correlation
• Split-half reliability
• Coefficient alpha
• Kuder-Richardson

Average inter-item and average item-total correlation

Internal Consistency Reliability

Split-half reliability
1. Divide the questions into two halves at random
2. The construct in both halves must be the same
3. Calculate each respondent's score for each half
4. Calculate the correlation coefficient between the scores of half 1 and half 2
5. Reliable if the correlation coefficient > 0.8

Kuder-Richardson (KR)
• Compares the correlations of all possible split halves
• Only suitable for instruments measuring a single construct
• Can only be used for instruments with dichotomous answers: yes-no, true-false
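The split-half steps can be sketched as follows. The data are invented, Pearson's coefficient is assumed for step 4, and `split_half` and `pearson` are hypothetical helper names. The function returns the plain correlation between the two halves, with no further correction applied.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def split_half(responses, seed=0):
    """responses: one list of item scores per respondent."""
    n_items = len(responses[0])
    order = list(range(n_items))
    random.Random(seed).shuffle(order)                          # step 1: random split
    half_1, half_2 = order[: n_items // 2], order[n_items // 2:]
    scores_1 = [sum(r[i] for i in half_1) for r in responses]   # step 3: score per half
    scores_2 = [sum(r[i] for i in half_2) for r in responses]
    return pearson(scores_1, scores_2)                          # step 4: correlate halves


# Hypothetical answers of 6 respondents to 6 items measuring one construct
responses = [
    [5, 4, 5, 4, 5, 4],
    [4, 4, 3, 4, 4, 3],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [1, 2, 1, 1, 2, 1],
    [4, 5, 4, 5, 4, 5],
]

r = split_half(responses)
print(round(r, 2), "reliable" if r > 0.8 else "not reliable")   # step 5
```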

Coefficient alpha / Cronbach's alpha

Like KR, but for scaled/ranked data
• Randomly split the items into two sets
• Compute the correlation between these sets
• Put all the items back
• Randomly split them into two sets again
• Repeat for all possible split-half correlations
• Calculate the average of all the correlations

Internally consistent if coefficient alpha > 0.7

Cronbach's alpha

Most often used to measure internal consistency

Adapted by Cronbach (1951) from Kuder & Richardson (1937)

$$\alpha = \frac{n}{n-1}\left(1 - \frac{\sum V_i}{V_{test}}\right)$$

n = number of questions
Vi = variance of the scores on each question
Vtest = total variance of the overall scores (not %'s) on the entire test

• Large Vtest → small ratio ΣVi/Vtest → high alpha

How alpha works
• Vi = pi * (1-pi)
  - pi = proportion of the class who answer correctly
  - This formula can be derived from the standard definition of variance.
• Vi varies from 0 to 0.25

pi      1-pi    Vi
0       1       0
0.25    0.75    0.1875
0.5     0.5     0.25
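The alpha formula can be sketched in Python as below. The answer matrix is invented, `cronbach_alpha` is a hypothetical function name, and the population variance (`statistics.pvariance`) is an assumption chosen so that, for 0/1 answers, each item variance equals pi * (1-pi) exactly as in the table above.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """responses: one list of item scores per respondent."""
    n = len(responses[0])                                                 # number of questions
    item_vars = [pvariance([r[i] for r in responses]) for i in range(n)]  # Vi per question
    total_var = pvariance([sum(r) for r in responses])                    # Vtest of total scores
    return n / (n - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical right/wrong (1/0) answers of 5 respondents to 4 questions
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(answers)
print(round(alpha, 2))

# For 0/1 data the item variance is p * (1 - p): question 1 has p = 0.8
v1 = pvariance([row[0] for row in answers])
print(round(v1, 2))
```

With dichotomous data like this, the same computation gives the Kuder-Richardson (KR-20) coefficient; with scaled answers the function works unchanged.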

What if the instrument is not reliable?
• Check whether one of the instrument's items is 'wrong'.
• Check how strongly each question item is related to the total score.
• An item that correlates weakly with the total score lowers reliability and should be removed.
• With the test-retest method, look for questions whose initial and final scores differ greatly.
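One way to check how strongly each item relates to the total score is sketched below. It correlates each item with the total of the remaining items (the item itself is left out so that it does not inflate its own correlation), which is a common variant rather than something stated on the slides. The data, the function names and the 0.3 cut-off are all assumptions made for the illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def item_rest_correlations(scores):
    """Correlation of each item with the total of all other items."""
    n_items = len(scores[0])
    result = []
    for i in range(n_items):
        item = [row[i] for row in scores]
        rest = [sum(row) - row[i] for row in scores]
        result.append(pearson(item, rest))
    return result

# Hypothetical data: the last item does not follow the pattern of the others.
scores = [
    [5, 4, 5, 4, 3],
    [4, 4, 4, 5, 1],
    [2, 1, 2, 2, 4],
    [3, 3, 2, 3, 5],
    [1, 2, 1, 1, 2],
    [4, 5, 5, 4, 1],
]
THRESHOLD = 0.3  # assumed cut-off, not taken from the slides
correlations = item_rest_correlations(scores)
weak_items = [i for i, r in enumerate(correlations) if r < THRESHOLD]
print("candidates for removal (0-based index):", weak_items)
```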


How to improve reliability?
• Questions are clear and unambiguous.
• Questions are specific.
• Write several question items to measure one variable.
• But not too many.


Generalisability
• From sample to population.
• Sample: does the finding truly exist, or is it just a coincidence?

Hypothesis
• Null hypothesis (H0): there is no relationship between clean and healthy living behavior and UKS (school health program) activities.
• Alternative hypothesis (H1): there is a relationship between clean and healthy living behavior and UKS activities.

Study result: there is a relationship between clean and healthy living behavior and UKS activities, so the null hypothesis is rejected.

Reality in the population: there is no relationship between clean and healthy living behavior and UKS activities.

Interpretation: Type 1 error.

Implication: UKS activities are expanded.

Hypothesis
• Null hypothesis (H0): there is no relationship between clean and healthy living behavior and UKS (school health program) activities.
• Alternative hypothesis (H1): there is a relationship between clean and healthy living behavior and UKS activities.

Study result: there is no relationship between clean and healthy living behavior and UKS activities, so the null hypothesis is accepted (not rejected).

Reality in the population: there is a relationship between clean and healthy living behavior and UKS activities.

Interpretation: Type 2 error.

Implication: UKS activities are abolished.
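Both kinds of error can be made visible with a small simulation. The sketch below is illustrative only and rests on several assumptions not found in the slides: two groups of 30 (for example, schools with and without UKS activities) are compared with a two-sample z-test, scores are normally distributed with a known standard deviation of 1, and the cut-off is 0.05. When the true difference is 0, every rejection is a type 1 error; when the true difference is 0.5, every non-rejection is a type 2 error.

```python
import random
from math import sqrt
from statistics import NormalDist

def z_test_p(group_a, group_b, sigma=1.0):
    """Two-sided p-value of a two-sample z-test with known sigma (assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    z = diff / (sigma * sqrt(1 / n_a + 1 / n_b))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_diff, n=30, cutoff=0.05, trials=2000, seed=1):
    """Share of simulated studies in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p(a, b) < cutoff:
            rejections += 1
    return rejections / trials

# No relationship in the population: rejections are type 1 errors.
type1_rate = rejection_rate(true_diff=0.0)
# A real relationship in the population: non-rejections are type 2 errors.
type2_rate = 1 - rejection_rate(true_diff=0.5)
print("type 1 error rate:", type1_rate)
print("type 2 error rate:", type2_rate)
```

Under these assumptions the type 1 error rate should land close to the chosen cut-off, while the type 2 error rate depends on the sample size and on how large the true difference is.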

How likely is a type 1 error?
• Measured by the level of significance (alpha) and the p-value. This alpha is the significance level, not Cronbach's coefficient alpha.
• The smaller the significance level, the smaller the probability of a type 1 error.
• Commonly used cut-off point: p < 0.05 is considered significant.
• The p-value is influenced by the sample size and by the size of the difference in the sample.
• Interpretation: what if p = 0.052 or p = 0.049?
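To illustrate how sample size and the size of the observed difference drive the p-value, the sketch below uses a two-sample z-test with a known standard deviation of 1. The test choice, the numbers and the function name `two_sided_p` are assumptions made for the example. The last lines compute the test statistics behind p = 0.052 and p = 0.049 to show how close the two results really are.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(mean_diff, n_per_group, sigma=1.0):
    """Two-sided p-value of a two-sample z-test with known sigma (assumed)."""
    z = mean_diff / (sigma * sqrt(2 / n_per_group))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same observed difference, different sample sizes.
p_small_sample = two_sided_p(mean_diff=0.5, n_per_group=10)
p_large_sample = two_sided_p(mean_diff=0.5, n_per_group=100)

# Same sample size, different observed differences.
p_small_diff = two_sided_p(mean_diff=0.1, n_per_group=100)
p_large_diff = two_sided_p(mean_diff=0.5, n_per_group=100)

# z statistics that correspond to p = 0.052 and p = 0.049.
z_052 = NormalDist().inv_cdf(1 - 0.052 / 2)
z_049 = NormalDist().inv_cdf(1 - 0.049 / 2)

print("n=10 vs n=100:", round(p_small_sample, 4), round(p_large_sample, 4))
print("diff=0.1 vs diff=0.5:", round(p_small_diff, 4), round(p_large_diff, 4))
print("z for p=0.052 and p=0.049:", round(z_052, 3), round(z_049, 3))
```

The two z statistics are almost identical, so treating p = 0.049 as a finding and p = 0.052 as nothing puts a lot of weight on a very small difference in the evidence.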


Questions
• If a relationship between variables shows p < 0.05, does that mean the result is important?
• If the effect size of a relationship between variables is large, does that mean the relationship is important?
• Are internal consistency reliability and construct validity the same thing?
• If a statistical measure shows a significant result, does that mean the phenomenon can be found in the general population?


Qualitative Instrument: Interview Guide

Instructions
• Beginning: information, informed consent
• Concluding

Questions
• Open-ended
• Key themes
• Factual questions before opinion questions
• Use probes or requests to elaborate


Validity and Reliability in Qualitative Research

Trustworthiness

Improving trustworthiness
• Thick description: collecting detailed and comprehensive data that describe the whole of what is happening
• Negative/deviant case analysis
• Triangulation (data, subject, methods)
• Member checking

