
Beyond instrumentation: redesigning measures and methods for evaluating the graduate college experience

Patricia L. Hardré & Shannon Hackett

Received: 19 December 2013 / Accepted: 21 September 2014 / Published online: 5 October 2014
© Springer Science+Business Media New York 2014

Abstract This manuscript chronicles the process and products of a redesign for evaluation of the graduate college experience (GCE), which was initiated by a university graduate college based on its observed need to reconsider and update its measures and methods for assessing graduate students’ experiences. We examined the existing instrumentation and procedures; met with and interviewed staff and stakeholders regarding individual and organizational needs; collected systematic questionnaire data on stakeholder perceptions; and then redesigned, developed, and tested new evaluation instruments, systems, and procedures. The previously paper-based, one-time global exit questionnaire was redesigned into a digitally administered, multi-event assessment series, with content relevant to students’ incremental academic progress. Previously discrete items were expanded into psychometrically coherent variable scales in parallel forms to assess change over time (entry, mid-point, exit, post-graduation). They were also strategically designed to be stable and independent enough that administrators could vary the timing and sequence of administration to fit their ongoing needs. The team conducted two testing cycles, gathering pertinent information on the redesigned assessment and procedures (N=2,835). The final redesigned evaluation serves as an exemplar of evaluation that enhances assessment quality, including psychometric properties and multiple stakeholder validation; more effectively addresses the organization’s incremental evaluation needs; increases timeliness of data collection; improves reach to and participation of distributed students; and enables longitudinal data collection to provide ongoing trajectory-of-change evaluation and a research data stream. Product and process analysis informs strategies for more effectively and dynamically assessing graduate education.

Keywords Graduate experience · Assessment design, development, and testing · Program evaluation · Higher education · Graduate education

Educ Asse Eval Acc (2015) 27:223–251. DOI 10.1007/s11092-014-9201-6

P. L. Hardré (*) : S. Hackett
Department of Educational Psychology, Jeannine Rainbolt College of Education, University of Oklahoma, 820 Van Vleet Oval, ECH 331, Norman, OK 73019-2041, USA
e-mail: [email protected]


This project involved entirely reconceptualizing, extending, and expanding a research university graduate college’s program evaluation measures and methods. The redesign process elicited original needs assessment and ongoing feedback from students and faculty across the university’s graduate programs. The team tested and refined instrumentation and system designs iteratively, based on stakeholder perspectives. The new instruments and system were designed to provide direct information needed by the Graduate College and also to provide data-driven feedback to graduate departments and programs to support their continual program improvement. This 2-year-long systematic design and development process replaced a one-page, qualitative exit questionnaire with a multi-event, systematic design; digital, online administration; and psychometrically sound instruments aligned with current organizational goals and reporting needs. We went beyond instrumentation to include redesign of the administration media, timing, and other systemic features.

This manuscript will first present a review of the relevant foundational and current research and evaluation literature. Then, it will present an overview of the project methods across phase I (needs analysis, instrument and systems redesign, and alpha testing) and phase II (iterative revision and beta testing). Following the overview, each phase of the process and instrumentation will be broken down into sequential, detailed procedures and specifications, with the results of each analysis and their implications leading to the next phase or to final recommendations as appropriate. It will conclude with evaluation lessons learned, principles supported, and the contributions of this work to academic program assessment and to more general evaluation research and practice.

1 Literature review

More than 1.5 million people are enrolled in graduate programs in the USA each year (Gardner and Barnes 2007; Allum et al. 2012), and many times that number worldwide (Council of Graduate Schools 2012). Contrary to popular belief, many major research universities enroll more graduate students than undergraduates (US Department of Education 2005). Yet, relatively little systematic research is conducted that informs more than a very small subset of those who teach, manage, and make policy to support graduate students (Nesheim et al. 2006).

2 Studies of the graduate experience

A number of studies have focused on various elements of the graduate college experience (GCE). Some of these studies have been localized, focused on a single discipline or program (e.g., Benishek and Chessler 2005; Coulter et al. 2004; Gardner and Barnes 2007; Hegarty 2011; Schram and Allendoerfer 2012). Other studies have focused on very specific groups, such as alumni, dropouts, or non-attendees, and only addressed a few key variables, such as why they chose to leave or not attend (e.g., Belcher 1996; Delaney 2004; Lipschultz and Hilt 1999). Some studies conducted internationally have combined disciplinary and institutional factors with broader cultural factors, generating deeply contextualized data to inform local needs (e.g., Kanan and Baker 2006).


Others have attempted to reach more broadly but faced low return rates on the populations sampled, raising questions about their representativeness (e.g., Davidson-Shivers et al. 2004; Farley et al. 2011). In each of these cases, different methods and instruments have been used and different constructs and characteristics studied, making it difficult to compare findings. The generally discrete nature of the samples has made it difficult even to synthesize the findings in ways that inform graduate education. In many universities, each college or department devises its own measures, making comparisons even within the institution problematic. The body of research on the GCE could be more effective and productive across universities if there were accessible, consistent, and comparable instrumentation to measure some common characteristics and goals of graduate programs and institutions.

In spite of the lack of comparability across these studies, a few principles are clear, both from the collection of findings and from the more global literature on the psychologies of adult education and human experience. Major changes of context and experience, such as going to graduate school, cause people to go through identity transitions and experience dramatic change in their self-perceptions and in how they understand themselves and others (Austin et al. 2009; Chism et al. 2010; Hephner LaBanc 2010), often including very strong self-doubt and anxiety (Gansemer-Topf et al. 2006; Brinkman and Hartsell-Gundy 2012). Graduate education involves redirecting cognitive attention and emotional energy in ways that can impact key relationships and cause family and emotional crises (Baker and Lattuca 2010). Success in graduate school depends on interpersonal and social relationships, as well as on intellectual mastery (Cicognani et al. 2011). Being back in academe after years away can be a tremendous adjustment, which is amplified when the return is to a different discipline, culture, and context, requiring substantial reacculturation and socialization (Fu 2012; Hardré et al. 2010b).

3 Need for graduate-level information and feedback

Various sources cite attrition from graduate programs as high as 50 % or more (Lovitts 2001; Offstein et al. 2004). Given the life changes attributable to returning to graduate education, it is easy to understand that many students might not make those shifts easily without substantial support. Graduate education is a huge investment of time, funding, and expertise by faculty, departments, and institutions (Stone et al. 2012; Smallwood 2004). Institutions, research units, and policy-making bodies need clear, useful information about graduate education (Gansemer-Topf et al. 2006).

Much research and scholarly attention on the graduate experience has focused on academic abilities and aptitudes (Golde 2000), and success has been largely attributed to academic preparation (Fu 2012). Popular measures of these characteristics include (1) standardized tests (such as the Graduate Record Examination (GRE), required by most graduate programs nationally) and (2) grade point averages (GPAs) from previous and current coursework. These measures are convenient because they are simple, quantified, and standardized, and thus comparable and generalizable.

However, academics are only part of the story that explains graduate students’ academic success. Interacting with them are numerous other elements of graduate life, such as scholarly and professional development, personal satisfaction, identity, stress and anxiety, social support, peer relationships and community, and overall well-being (Gansemer-Topf et al. 2006; Offstein et al. 2004). Some studies have addressed socialization into graduate school and into the scholarly culture and values of students’ disciplines and professions, generating sets of factors that influence these processes (e.g., Gardner and Barnes 2007; Weidman et al. 2001). However, it is unclear how the characteristics and circumstances of an increasingly diverse and ever-changing profile of the graduate student interact with both institutional constants and discipline-based cultural nuances to support their learning and professional development (see also Hardré 2012a, b; Hardré and Chen 2005, 2006).

This information needs to include insight into the current and authentic nature of the graduate college experience, its impacts on students, other influences on students’ success within it, and students’ perceptions of their journeys. Perceptions are important in any novel experience and particularly in transitions, as the nature and impacts of transition depend less on the actual, measurable events than on the participants’ individual and collective perceptions of those events (Hardré and Burris 2011; Schlossberg et al. 1995; Bloom et al. 2007). Stress is a core component of the graduate experience, and people handle stressful circumstances very differently (Offstein et al. 2004; Williams-Tolliver 2010). Goals and goal attainment have tremendous impact on how people work and learn (Kenner and Weinerman 2011). We have seen goals and expectations studied among higher education faculty, showing significant effects (e.g., Hardré et al. 2011), yet little systematic research has included the goals and expectations that graduate students bring into their educational experience and the reasons why they make choices along the way. Some theorists and practitioners have called for more concerted institutional efforts at understanding and supporting graduate students’ experiences and success, similar to those traditionally focused on undergraduates (e.g., Gansemer-Topf et al. 2006; Hyun et al. 2006).

4 Need for instrument and system design and fit

Various efforts have been made to produce standardized measures and create national databases of information on graduate students. More than a decade ago, the National Doctoral Program Questionnaire, funded by the Alfred P. Sloan Foundation, was heralded as a grassroots attempt to use data about the graduate experience to improve graduate education nationally (Fagen and Suedkamp Wells 2004). The Higher Education Research Institute (HERI) project and the American Educational Research Association (AERA) graduate questionnaire project strove to generate data via questionnaire instruments for comparing student experiences and faculty perceptions of their work climates (HERI 2012). However, in centralized systems such as these, neither the measurement parameters (instruments, participants, sampling, timing) nor the resulting raw data sets are directly accessible to, or controlled by, potential users (researchers or institutions), which severely limits their utility.

Researchers and administrators in graduate education need instruments that generalize and transfer across institutions and contexts (Hyun et al. 2006). Having adaptive, useful, and efficient tools to investigate the graduate experience in higher education could help address the need for more scholarly research in this critical area for higher education (Gardner and Barnes 2007). Having the right tools and information could help administrators assess and address issues with attention to specialized local needs (Nesheim et al. 2006). It is clear that a need exists for systematically designed and well-validated tools for assessing a range of dimensions of the graduate experience, to address issues relevant to graduate program development and improvement, as seen through graduate students’ perspectives. Beyond instrumentation, graduate institutions need insight into administrative systems, timing, and related strategies to support optimal assessment.

5 Method

5.1 Context and reflexivity

This project occurred in a public research university in the Southwestern USA. The Graduate College is more than 100 years old and enrolls over 4,000 advanced-degree students annually. It confers doctoral and master’s degrees in hundreds of academic majors, including both traditional programs and continuing education degree programs and certificates. Some programs are very structured, with students in cohorts, while others are more fluid and adaptive, allowing students to cover curricula at their own pace and schedule, supported by their academic advisors. The institutional culture gives autonomy to colleges and departments to determine graduate academic program requirements, and the Graduate College oversees curriculum revisions, monitors progress, and maintains accountability. The graduate student body is 70 % US domestic and 30 % international from 42 countries; ages range from 21 to 90, and it is about evenly divided by gender. Full-time students make up 60 % of the graduate population, and the remaining 40 % attend part-time; many graduate students also work outside of school and have families.

The evaluator and assessment designer was a senior graduate faculty member at the university, with specialized training and expertise in this area, who also did external evaluation design and consulting professionally. The Graduate College Dean invited the faculty member to take on the evaluation redesign project, based on the advice of the university Provost. The evaluator worked on this project without personal financial compensation, but with the understanding that she could use the data gathered for research presentation and publication. The Graduate College did provide one graduate assistantship (0.5 FTE) to assist with the primary project tasks. The evaluator also utilized a team of other graduate assistants on particular components of the project.

5.2 Process and procedures overview

5.2.1 Phase I: needs analysis, redesign, and alpha testing

Invited by the Graduate College Dean to redesign its assessment of the graduate experience, the team reviewed the relevant literature to gain a general scope of coverage and variables of interest. Consistent with evaluation standards, we also involved others with interest in the outcomes (faculty and administrative stakeholders) to define the evaluation (Yarbrough et al. 2011). We conducted focus groups and interviews and administered generative, paper-based instruments with students, faculty, and administrators. The goal at this early stage was to determine the most appropriate variables and indicators and to include nuanced information for client and program needs.

Based on this information, the team designed and developed the first (alpha) version of the GCE assessment instrument. Given the need to reach a distributed group of technology-active participants with multiple tools, it was decided to use online administration, and the first (alpha) instruments were developed with the SurveyMonkey® administration software. At this stage, three initial versions of the instruments were developed. Over 500 students completed the alpha test instruments, producing data adequate to demonstrate generally good psychometric characteristics and also to determine refinements necessary to improve the GCE assessments.

5.2.2 Phase II: revision and beta testing

Following the analysis of the development and alpha test data, the evaluation team generated a revised version of the GCE instrument. During the alpha testing, the team recognized relevant limitations in the original (SurveyMonkey®) digital administration system. In consultation with the Graduate College administration, it was decided to develop the beta instrument with the more adaptive Qualtrics® digital administration system.

In its beta versions, the GCE evaluation contained refined scales and items. It was also extended to include forms for five participant groups: the original three (entrance, mid-point, and exit) plus two additional groups (non-attendees and alumni). These additional forms extended the range of information the evaluation package provided to the GC client. Over 2,000 student participants completed the beta instrument. In addition to the student respondents, the evaluation team sent the beta instrument to faculty who instruct and mentor graduate students across all academic colleges, for feedback on its fit and relevance to their program evaluation and improvement needs. This strategy was based on a general interest in faculty perceptions (as key stakeholders), plus the Graduate College’s organizational goal of producing data useful to graduate programs.

The beta data yielded additional information for further refining all five forms of the instruments, as well as baseline findings for the GC clients. These data were analyzed in two ways, for two types of outcomes: instrument performance and participant response patterns.

6 Phase I: needs analysis, redesign, and alpha testing

6.1 Needs analysis

The purpose of the needs assessment and analysis was to determine how students, faculty, staff, and administrators defined the nature, parameters, and goals of the graduate experience. The results of this process provided information to guide the scope, definitions, and instrument development, as well as the testing plan.

6.1.1 Participant stakeholder groups

Four stakeholder groups were identified to provide input for the redesign and data testing: 13 graduate students, 23 faculty, 10 staff, and 5 administrators of the Graduate College. A convenience sample was drawn from a list of individuals generated by the Graduate College and evaluation team. All of the identified members of the stakeholder groups participated in focus groups, and some in additional follow-up interviews, to inform needs and determine the scope and content of the GCE instruments.

Graduate students and graduate college assistants The sample group to determine the definition of the graduate experience was derived from the pool of graduate students at the university. This sample included graduate assistants working in the Graduate College and members of an interdisciplinary group of graduate student program representatives, along with students they recruited. Diverse graduate students participated at this stage in the process to help frame instrument language appropriate across all groups.

Faculty, staff, and administrators Faculty, staff, and administrators at the university have unique perspectives on the role of the Graduate College and concepts of the graduate experience. To better understand these issues, the evaluators solicited feedback from graduate program professors and administrators from various colleges.

6.1.2 Procedure

To define and clearly identify components of the graduate experience, the evaluation team used focus groups, interviews, and open-ended questionnaire instruments. Due to their explanatory nature and the designers’ developmental interest in dialogue with stakeholders, these first questionnaires were paper-based. Responses were transcribed and coded in analysis. Participants were recruited through targeted e-mails and mailings using contact lists of current graduate students, faculty, staff, and administrators provided by the Graduate College.

Focus groups Focus groups (of six to ten participants) discussed issues related to the graduate experience (time ≈60 min). The format was semi-structured, with some direct questions available to guide the meeting and address relevant goals. Sample question was "What events and activities are part of the graduate student’s experience at [univ]?"

Interviews Each individual interview was conducted in a semi-structured format (time ≈60 min). Each interview concerned either feedback on instrument development or more detailed understanding of issues raised in a previous focus group. Twenty-two questions were created as options to ask the interviewee concerning the graduate experience. Sample question was "Please define for you what constitutes the graduate experience at [univ]."

Open-ended questionnaires Participants completed a 12-question (≈30-min) questionnaire. Sample question was "What is your perception of the Graduate College?"


6.1.3 Results of needs analysis

Data from focus groups, interviews, and open-ended questionnaires provided the definition of the scope and terms used to develop the GCE alpha questionnaire assessment instruments. The information from these participants generated the scales and language used for the first formal questionnaire items.

From the needs analysis and developmental data, the following points were clear:

• All of the stakeholder groups agreed that a new and more comprehensive assessment of the GCE was needed.

• There were differences among groups as to what should be included and what should be emphasized in the new instrument.

• There were, however, enough points of convergence and even consensus to draft and test an instrument that reflected both the client’s interests and the breadth of other stakeholders’ needs. The single-event, end-of-degree administration via paper questionnaire needed to be redesigned and replaced with methods more attentive to current activities, goals, and needs.

Based on these results, the evaluators proceeded with designing and testing a new assessment instrument and system.

7 Redesign of administration timing, media, and methods

Parameters of this redesign needed to be accessible and salient for students. To meet this need, the redesign included various administrations spread over students’ graduate experience (which lasted from 2 to 10 years). A challenge of timing (given the variability in duration across degrees and among full-time and part-time students) was identifying the key points in progress at which students would receive each instrument. Program faculty and department administrators need prompt, timely feedback to support program improvement. This could be achieved in part by the multi-event, incremental assessment design, and further enhanced by creating parallel forms of instruments that offered data on developmental change over time. Based on client and stakeholder input, the following potential improvements were indicated.

• Appropriateness of item content for participant users could be improved by the incremental administration redesign, so students received questions at times more proximate to their actual experiences. Utility and primary investigator potential for administrative users (the Graduate College, academic programs) could be improved by the incremental administration redesign, so they received data before students graduated, making responsive improvements more timely and meaningful.

• Administration efficiency and utility for the client could be improved by digital administration that eliminated manual data entry. Administration rescheduling and timeliness of access for users could be improved by online digital administration that they could access from remote and distributed sites.


• Administration potential to share data with academic programs (a stated goal) and ability to track change over time would both be vastly improved by the redesign using both incremental administration and digital instrumentation.

7.1 Procedure

The evaluation team developed item sets to address relevant constructs and outcomes indicated by the developmental data. Multiple team members independently examined the transcripts, then discussed and collaborated to develop the overall instrument scope and content (topical scales and items). Then, the team organized the content for initial administration to students, as appropriate to their degree (masters/doctoral) and progress-toward-degree of study (entrance/mid-point/exit). All administration occurred in an asynchronous online questionnaire administration system, with all participant identification separated from item responses. Testing participants were recruited via e-mail invitation, using lists of eligible students provided by the Graduate College. All study activities were consistent with human subjects requirements and approved by the institutional IRB. De-identified responses were analyzed and stored according to IRB standards for data security and confidentiality.
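To illustrate the separation of participant identification from item responses described above, here is a minimal sketch in Python (pandas), assuming a raw questionnaire export with an identifier column; the column names and the deidentify helper are hypothetical illustrations, not part of the original system.

```python
import uuid
import pandas as pd

def deidentify(raw: pd.DataFrame, id_cols: list[str]) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split a raw questionnaire export into a key file (identifiers only)
    and a de-identified response file, linked by a random study code."""
    raw = raw.copy()
    raw["study_code"] = [uuid.uuid4().hex for _ in range(len(raw))]
    key_file = raw[["study_code"] + id_cols]   # stored separately, restricted access
    responses = raw.drop(columns=id_cols)      # analyzed and archived per IRB rules
    return key_file, responses

# Hypothetical example:
raw = pd.DataFrame({"email": ["[email protected]"], "Q1": [7], "Q2": [5]})
keys, resp = deidentify(raw, id_cols=["email"])
```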

7.2 Participants

Participants were 504 graduate students invited to take the form of the questionnaire appropriate to their point-in-program, whether they were at the beginning (130), middle (118), or end (256). Detailed participant demographics are shown in Table 1. Students were demographically representative of the larger graduate student population on campus, with similar distributions of genders, ethnicities, colleges, and degree types (within ±6.1 %).

The eventual intent of the instrument design was to assess developmental trajectories of experiences in the same graduate students over time (a within-subjects sample). However, in order to collect sample data efficiently, in a single year, we used different graduate students as proxy groups for progress-in-program (a between-subjects sample).
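The representativeness claim above (sample distributions within ±6.1 % of the institution's) can be verified with simple arithmetic. A minimal sketch using the degree-type and gender percentages reported in Table 1; the dictionaries are only a convenient illustration, not the authors' code.

```python
# Institution vs. sample percentages for two demographic breakdowns (from Table 1).
institution = {"Masters": 72.5, "Doctoral": 27.5, "Male": 51.7, "Female": 48.3}
sample      = {"Masters": 75.0, "Doctoral": 25.0, "Male": 47.6, "Female": 52.4}

gaps = {k: sample[k] - institution[k] for k in institution}
max_gap = max(abs(v) for v in gaps.values())
print(gaps)
print(f"max |gap| = {max_gap:.1f} percentage points")  # stays within the reported ±6.1
```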

7.3 Instruments

A total of 149 items were developed for the first-round (alpha) instruments: 21 demographic items (selection and fill-in), 97 Likert-type items, 19 dichotomous (yes/no) items, and 12 open-ended items. For the Likert-type items, after consultation and discussion with the client regarding the tradeoffs in various scale lengths and configurations, an eight-point scale (1=strongly disagree, 8=strongly agree) without a neutral mid-point was used. In addition to the formal quantitative items, open-response fields were provided and participants encouraged to "explain any responses" or "provide any additional information."
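For readers implementing a similar scale, the reverse-coding of negatively worded items (noted in the Table 2 footnote) reduces to the mapping x → 9 − x on this 1–8 response scale. A minimal sketch, assuming responses are held in a pandas Series; the function name is ours.

```python
import pandas as pd

# 1 = strongly disagree ... 8 = strongly agree (no neutral mid-point)
SCALE_MIN, SCALE_MAX = 1, 8

def reverse_code(item: pd.Series) -> pd.Series:
    """Reverse-code a negatively worded item so that higher = more positive."""
    return (SCALE_MIN + SCALE_MAX) - item   # 9 - x on the eight-point scale

responses = pd.Series([1, 4, 8])
print(reverse_code(responses).tolist())     # [8, 5, 1]
```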

The items were organized into theoretical and topical clusters and subscales. The sections were (1) Why Graduate School?, (2) Admissions Process, (3) Decision to Attend, (4) Financial Aid, (5) The Graduate Experience, (6) Graduate College Advising and Staff, (7) Graduate College Events, (8) Graduate College Media and Materials, (9) Program of Study Satisfaction, (10) Social Interaction, and (11) University Resources and Services. Table 2 shows a summary of the scope and organization of the instrumentation, as well as reliabilities and factor structures.

Based on the initial redesign feedback, three web-based forms of the questionnaire instruments were created. Each was designed to measure satisfaction with the graduate experience at specific incremental points in students’ progress-toward-degree: entry, mid-program, and exit. Participants were recruited via e-mail and provided with active, generic hyperlinks to the questionnaires, which they could access from any location.

Timing for the assessments was at three key time points in their specific programs: at entrance (their first semester), mid-point (first semester of the second year for masters students; first semester of the third year for doctoral students), and exit (graduating semester). At this stage of development, all students completed the same questionnaire sections, with two exceptions: Admissions Process (entry students only) and Career Preparation (mid-point and exit only).

Table 1 Alpha participant demographic characteristics

Frequency: All, Masters, PhD; Percentage: Institution, Sample

Degree type
Masters 375 – – 72.5 75.0
Doctoral 125 – – 27.5 25.0
Gender
Male 237 164 70 51.7 47.6
Female 261 209 51 48.3 52.4
Ethnicity
African American/black 31 23 8 5.0 5.7
Asian American/Asian 44 29 15 5.1 8.1
Pacific Islander/native Hawaiian 2 2 – 0.2 0.4
Hispanic/Latino 25 23 2 5.2 4.6
Native American/American Indian 29 24 5 4.9 5.4
White/Caucasian 397 295 98 72.7 73.3
Other 14 9 5 6.9 3.6
Colleges
Architecture 6 6 – 2.2 1.2
Arts and Sciences 217 148 69 37.0 43.0
Atmospheric and Geographic Sciences 6 5 1 3.5 1.2
Business 37 30 6 8.3 7.4
Earth and Energy 11 9 2 5.2 2.2
Education 60 42 18 18.0 11.9
Engineering 43 30 13 14.1 8.6
Fine Arts 25 17 7 5.6 5.0
Journalism and Mass Communication 10 6 4 1.8 2.0
International Studies 11 11 – 0.4 2.2
Liberal Studies 46 44 2 2.9 9.2
Dual Degree/Interdisciplinary 31 26 3 0.8 6.2

8 Analysis

Once questionnaires were completed, data were exported to SPSS® for statistical analysis. Means and standard deviations were computed for each Likert-type question. Additional subgroup mean comparison statistics were computed for significant differences, by degree type (masters and doctoral) and by progress-toward-degree groups. Exploratory factor analyses (EFAs) were conducted on theoretical and topical sections with more than five items, to examine structural nuances and help determine the appropriateness of items within sections. Reliabilities for the theoretically coherent subscales were computed using Cronbach’s alpha (target α≥0.80). Additional generative commentary and questions provided qualitative information to utilize in evaluation and revision.
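Cronbach's alpha, the reliability index targeted here (α≥0.80), can be computed directly from a respondents-by-items matrix. The original analyses were run in SPSS®, so the following Python sketch is only an illustration of the statistic, with hypothetical data.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for one subscale (rows = respondents, columns = items)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)      # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical three-item Likert subscale:
subscale = pd.DataFrame({"q1": [7, 6, 8, 5, 7],
                         "q2": [6, 6, 7, 4, 8],
                         "q3": [7, 5, 8, 5, 6]})
print(round(cronbach_alpha(subscale), 3))          # compare against the 0.80 target
```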

8.1 Alpha measurement testing results

The alpha testing focused on measurement performance, with the system test implicit at all stages, from development through response patterns. The assessment of validity at this stage was a preliminary look at the appropriateness of instrument scope, item content, and overall fit. The assessment of reliability at this stage was to assess subscale and section range and coherence, along with the nature and roles of item contributions. Data on section-level relationships and item contributions would support refinement for both content appropriateness and instrument efficiency, without reducing instrument coherence or sacrificing measurement scope. Qualitative responses and commentary provided information on how particular participants and subgroups processed and interpreted the instrument content, which further informed revision and refinement.

Table 2 Section overview (alpha version)

Type of scale No. of items Alpha No. of factors

Why graduate school Item cluster 8 – –
Admissions process Subscale 6 0.864 1
Decision to attend Item cluster 8 – 3
Financial aid Item cluster 8 – –
The graduate experience
Graduate experience satisfaction Subscale 13 0.928 2
To me, the graduate experience includes… Item cluster 12 – 3
Graduate college advising and staff Subscale 4 0.813 1
Graduate college events Item cluster 10 – –
Graduate college media and materials Item cluster 5 – 1
Program of study satisfaction
Program of study Subscale 9 0.806 2
Academic advisor Subscale 7 0.975 1
Academic program faculty Subscale 12 0.950 1
Career preparation Subscale 6 0.841 1
Social interaction Subscale 9 0.830 2
University resources and services Subscale 19 0.881 4

Negatively worded items were reverse-coded both for the reliability and factor analyses

8.2 Validity

The first goal of the alpha testing analysis (focused on validity) was to assess the appropriateness, scope, and fit of the instrumentation for addressing the target variables and indicators (Cook and Beckman 2006; Messick 1995), overall and for each subgroup at point-in-program. This included not only the item and section content but also the instructions and administration system. Validity information was contained in the developmental data (from the "Needs analysis" section), based on both expert-client and user-stakeholder perspectives on what should be included. From the alpha testing, analysis of authentic user responses added empirical grounding to what had previously been hypothetical. The EFAs were conducted on all sections (criteria of loading at 0.80, with cross-loadings not exceeding 0.30). This analysis would confirm that the language used in the items (taken from the generative contributions of various stakeholders) was communicating what was intended and relating appropriately across items, as interpreted by end-users. Additionally, the open-ended fields inviting additional explanation and commentary were analyzed for contributions to the scope, content, and appropriateness of the instrument and sections, as well as for system issues. Most of the scales showed adequate and consistent loadings, and those falling short of target criteria provided information needed to refine them. The sample was inadequate to demonstrate discriminatory capacity for all subgroups of interest, but its differential performance in some global groups (such as between masters and doctoral students) showed promise. Overall, the synthesis of validity data showed both current strength and future promise in the redesigned GCE assessment.
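Applying the loading criteria reported above (a primary loading of 0.80, with cross-loadings not exceeding 0.30) is mechanical once a rotated loading matrix has been estimated. A minimal sketch, assuming the loadings have already been exported (e.g., from SPSS®); the matrix values are hypothetical.

```python
import pandas as pd

def meets_criteria(loadings: pd.DataFrame, primary: float = 0.80, cross: float = 0.30) -> pd.Series:
    """Flag items whose strongest loading meets the primary criterion and whose
    second-strongest loading stays at or below the cross-loading criterion."""
    top = loadings.abs().max(axis=1)
    second = loadings.abs().apply(lambda row: row.nlargest(2).iloc[-1], axis=1)
    return (top >= primary) & (second <= cross)

# Hypothetical two-factor loading matrix for four items:
L = pd.DataFrame({"F1": [0.85, 0.82, 0.25, 0.78],
                  "F2": [0.10, 0.28, 0.84, 0.45]},
                 index=["item1", "item2", "item3", "item4"])
print(meets_criteria(L))   # item4 fails the 0.80 primary-loading criterion
```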

8.3 Reliability

The second goal (focused on reliability) was to conduct a preliminary assessment of subscales’ internal coherence and item-level contributions, along with their discriminatory capacity. As evidence of internal reliability, all of the theoretically coherent sections (scales) were assessed for internal coherence using Cronbach’s alpha (criterion of 0.80). Some met the test immediately, and others varied based on nuances in participant responses. These data analyses demonstrated how those scales and sections falling short of the standard could be refined to meet it and thus improve the measure. All instruments demonstrated high stability over multiple administrations.
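The paper does not name the index used to judge item-level contributions; one common choice is the corrected item-total correlation, which correlates each item with the sum of the remaining items in its scale and flags weakly related items as candidates for removal. A minimal sketch with hypothetical data.

```python
import pandas as pd

def corrected_item_total(items: pd.DataFrame) -> pd.Series:
    """Correlation of each item with the sum of the other items in the same scale."""
    return pd.Series(
        {col: items[col].corr(items.drop(columns=col).sum(axis=1)) for col in items.columns}
    )

subscale = pd.DataFrame({"q1": [7, 6, 8, 5, 7], "q2": [6, 6, 7, 4, 8],
                         "q3": [7, 5, 8, 5, 6], "q4": [2, 7, 3, 6, 4]})
print(corrected_item_total(subscale).round(2))  # weakly correlated items are removal candidates
```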

8.4 Divergence of “should be” versus “is”

Notably more comments were received on the item set defining the Nature of the Graduate Experience. That section’s item stem was phrased as "For me, the graduate experience includes…" followed by the list of descriptors supplied by students, faculty, and staff during the needs analysis process. Comments on this section converged on the question of whether that section’s instructions were intended to address what the student’s actual experience did include or an ideal perception of what the graduate experience should include. The original item had been written to address the former, the student’s actual experience, but the frequency of these comments illuminated a pattern of fairly widespread perceptions that there was a difference between the two. That is, they suggested a need to inquire into how graduate students’ actual experiences differed from their expectations of what they should be. In addition, the factor structure showed a divergence of content focus between perceptions that clustered as preferences and perceptions that clustered as quality indicators (as a proxy for satisfaction), indicating a need to further restructure this section.

8.5 Perceived length

A common global comment received was that the whole instrument was very long. We recognized that, containing just over 100 substantive items, it was longer than most students commonly completed (particularly in the current climate of short, quick digital questionnaires). However, the internal systems data also confirmed that the average time-on-task for users who completed all items was only about 30 min. This was within the task time we had predicted (below the maximum time range in our participant consent document). It was also within the time frame considered reasonable for an online administration, with the caveat that some users may perceive it to be much longer.
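The ~30-min average time-on-task was taken from the administration system's internal data; with exported start and finish timestamps, the same figure can be recomputed as sketched below. The column names and timestamps are hypothetical, not the system's actual export format.

```python
import pandas as pd

# Hypothetical start/finish timestamps from the administration system's export.
log = pd.DataFrame({
    "started":  pd.to_datetime(["2013-10-01 10:00", "2013-10-01 11:05"]),
    "finished": pd.to_datetime(["2013-10-01 10:28", "2013-10-01 11:39"]),
})
minutes = (log["finished"] - log["started"]).dt.total_seconds() / 60
print(f"mean time-on-task: {minutes.mean():.1f} min")  # compare against the ~30-min figure
```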

8.6 Online administration

The cumulative data on system redesign (both method and tooling) indicated that the digital, online administration was more effective and appropriate for reaching more graduate students, including distance learners and part-time students, than the previous (paper-based) method. The specific tool chosen (SurveyMonkey®) had presented challenges in development, requiring a good deal of specialized back-end programming to configure it to deliver the instrument as designed. In particular, differential options that required skip logic and similar special presentations were tedious to develop. In addition, some critical issues that arose in compatibility with user-end systems required intervention. For a new evaluation package to be used over time and across platforms, we decided to seek a new administration tool that would add ease for both developers and end-users.

9 Conclusions and measurement revisions

The evidence and information provided by the full range of data produced in the first round of instrument testing demonstrated that the GCE redesign was largely successful to date. This reasonable sample yielded strong evidence for both the validity and reliability of the instruments at this stage. It also provided a good deal of both psychometric data and direct user feedback on how they could be further improved for the beta testing. Based on all of the information accrued, the evaluators made the following revisions for the next round of testing:

• Given the users’ qualitative feedback on the "Nature of the Graduate Experience," we adopted the dual range of their responses, one general and ideal, the other personal and perceptual. In the beta, this item cluster was presented as two parallel clusters: the graduate experience should include and my graduate experience does include.

• By the client’s request, two more participant group versions were added: alumni and non-attendees. The first increased the scope and range of assessment of program effects beyond students’ perceived preparation for careers, to include their experiential perceptions after graduation and entry into their professions. It constituted a fourth sequential assessment for students who attended this institution. The second addressed the client’s interest in what caused candidates accepted into graduate programs not to enter them, to support recruitment and retention efforts. It constituted an entirely different instrument for a new participant group.

• Based on multiple item-level and scale-level analyses, we determined that approximately 17 items could be removed to shorten the assessment without reducing subscale reliabilities. However, we retained those items in the beta versions, in order to test those conclusions with a second round of testing and a larger, more diverse sample.

• We acknowledged that our revision decisions included significantly increasing the length of the instrumentation and that the users already perceived it to be long. However, we wanted to gain evidence for the full range of possible redesign decisions from the retest data in the beta cycle. We determined that with the independent-sample test-retest data, we would be better equipped with ample evidence to make those revision decisions for the final client handoff.

• Based on the weaknesses found in the initial tool, we selected a different development and administration system for the beta testing, with more sophisticated development functionality and the added benefit of institutional licensure accessibility.

10 Phase II: redesign and beta testing—student questionnaires

10.1 Procedure

All administration occurred in an asynchronous online questionnaire administration system, with all participant identification separated from item responses. A new (between-subjects) group of testing participants was recruited via e-mail invitation, using lists of eligible students provided by the Graduate College. Participants were offered small individual incentives (tee-shirts for the first 100 completing the instruments), and all participants were entered into a drawing for a larger incentive (a digital device). All study activities were consistent with human subject requirements and approved by the institutional IRB. All participant data were de-identified and kept confidential.


10.2 Participants

The 2,081 current or potential student participants were invited to take the form of the questionnaire appropriate to their identity and point-in-program: whether they were individuals admitted but who chose not to attend (22); students at the beginning (661), middle (481), or end of their program (672); or alumni (245). Detailed participant demographics are shown in Table 3. Participants were demographically representative of the larger graduate student population on campus, with similar distributions of genders, ethnicities, and colleges (within ±6.6 %). Two colleges were overrepresented: Liberal Studies (+9.5 %) and Dual Degree/Interdisciplinary (+16.0 %). Masters students were also overrepresented (+13.6 %). The response rate from the e-mail lists was 72.6 % (2,081 out of 2,865).
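The response rate and the over-representation figures above follow directly from the invitation counts and the percentages in Table 3; a worked check:

```python
invited, completed = 2865, 2081
print(f"response rate = {completed / invited:.1%}")   # 72.6 %

# Over/underrepresentation: sample share minus institutional share (percentage points),
# using the Table 3 values for the groups discussed in the text.
institution = {"Liberal Studies": 2.9, "Dual Degree/Interdisciplinary": 0.8, "Masters": 72.5}
sample      = {"Liberal Studies": 12.4, "Dual Degree/Interdisciplinary": 16.8, "Masters": 86.1}
for group in institution:
    print(group, round(sample[group] - institution[group], 1))  # +9.5, +16.0, +13.6
```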

Table 3 Beta participant demographic characteristics

Frequency: All, Masters, PhD; Percentage: Institution, Sample

Degree type

Masters 1431 – – 72.5 86.1

Doctoral 230 – – 27.5 13.9

Gender

Male 863 716 146 51.7 46.9

Female 1019 904 114 48.3 54.1

Ethnicity

African American/black 166 151 15 5.0 8.8

Asian American/Asian 146 108 38 5.1 7.8

Pacific Islander/native Hawaiian 5 5 – 0.2 0.3

Hispanic/Latino 110 96 12 5.2 5.8

Native American/American Indian 85 77 8 4.9 4.5

White/Caucasian 1,297 1,118 179 72.7 68.9

Other 74 66 8 6.9 3.9

Colleges

Architecture 30 30 – 2.2 1.5

Arts and Sciences 652 523 111 37.0 33.3

Atmospheric and Geographic Sciences 36 26 8 3.5 1.8

Business 113 102 9 8.3 5.8

Earth and Energy 52 49 3 5.2 2.7

Education 225 172 44 18.0 11.5

Engineering 146 109 32 14.1 7.5

Fine Arts 47 34 13 5.6 2.4

Journalism and Mass Communication 31 24 7 1.8 1.6

International Studies 53 51 – 0.4 2.7

Liberal Studies 243 225 6 2.9 12.4

Dual Degree/Interdisciplinary 328 277 29 0.8 16.8


10.3 Instruments

A total of 268 items were administered for the second-round (beta) questionnaires: 17 demographic items (selection and fill-in), 237 Likert-type items, 9 dichotomous (yes/no) items, and 5 open-ended items. Similar to the alpha questionnaires, for theoretically continuous items, an eight-point Likert scale (1=strongly disagree, 8=strongly agree) was used. Open-response fields enabled participants to "explain any responses" or "provide any additional information."

The 11 sections for the alpha questionnaires largely remained, with some new sections added and refined, based specifically on the data and feedback from the alpha testing. After the revisions, a total of three sections were added to create a better understanding of the Graduate College experience.

Five forms of the questionnaire instruments were created: non-attend, entrance, mid-point, exit, and alumni. The expanded design was for graduate students to be assessed at four time points in their programs: at entrance (their first semester), at mid-point (first semester of the second year for masters students, or first semester of the third year for doctoral students), at exit (their graduating semester), and at 2 years post-graduation. The fifth version would be completed only by students who were accepted but chose not to attend, to help the Graduate College gain information about why. All student forms were parallel except for the Admissions (entry only) and Career Preparation (mid-point, exit, and alumni) sections. Further, some items within sections were unique to alumni, relevant to post-graduation experiences. The non-attend version of the questionnaire was much shorter and different in content than the other instruments, as appropriate to its purpose and target group. The various sections and subscales are described below, with the results of their reliability and factor analyses as appropriate. The summary of statistical results is also shown in Table 4.
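The assignment of sections to the five forms (Admissions for entrance only; Career Preparation for mid-point, exit, and alumni; a much shorter non-attend form) can be expressed as a simple mapping. A minimal sketch; the COMMON list is an abbreviated illustration of the shared sections, and the non-attend contents shown are hypothetical, not the full instrument specification.

```python
# Abbreviated shared sections; the full instrument contains more (see Table 4).
COMMON = ["Why graduate school?", "Decision to attend", "Financial aid",
          "The graduate experience", "Graduate College advising and staff",
          "Social interaction", "University resources and services", "Final thoughts"]

def sections_for(group: str) -> list[str]:
    """Return the section list for one of the five beta forms."""
    sections = list(COMMON)
    if group == "entrance":
        sections.insert(1, "Admissions process")          # entry students only
    if group in {"mid-point", "exit", "alumni"}:
        sections.append("Career preparation satisfaction") # mid-point, exit, alumni
    if group == "non-attend":
        sections = ["Admissions process", "Decision to attend"]  # illustrative short form
    return sections

print(sections_for("mid-point"))
```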

10.4 Subscales and item clusters

The 14 sections were divided into subscales and/or item clusters as follows:

Why graduate school? This section was designed to determine the reasons that students attend graduate school. It presented the item stem "I am pursuing a graduate degree…" and then listed 17 different reasons, each with a Likert-type scale. Sample item is "I am pursuing a graduate degree…to gain a competitive advantage in my field." The EFA showed four factors.

Admission process. This section presented items about the individual’s admission experience, process, and satisfaction. First, a single item addressed whether or not students used the (then still optional) online system (dichotomous). Second, a subscale addressed participants’ satisfaction with their admissions process (four items; Likert-type; α=0.866). Sample item was "The instructions for completing the application were adequate and easy to understand." The EFA confirmed a single factor.

Decision to attend. This section assessed reasons why students chose to come to this university. It first asked if this was the student’s first-choice school (dichotomous). Then, a summary Likert-type item to endorse was "I am happy with my decision to attend [univ]1." The third component was an item cluster (14 items; Likert-type scale). Item stem was "My decision to attend [univ] was influenced by…" followed by 16 different responses to endorse (e.g., "having similar research interests as professors in the department").

Financial aid. This section asked students to identify the sources and types of their support for attending and engaging in graduate studies (e.g., graduate assistantships, tuition waivers).

Graduate experience. This section consisted of three parts: satisfaction, what it should be, and what it is (all on Likert-type scales). Satisfaction with the graduate experience (12 items; α=0.901). Sample item was "I would recommend [univ] to prospective graduate students." The EFA confirmed one factor. Students’ perceptions of what the graduate experience "should include" and the parallel of what it "does include" for that student (34 items each) both presented item stems followed by lists of parallel characteristics, each for the student to endorse. Sample item was "To me, the graduate experience should include…developing close connections with faculty." The EFA showed that the "should include" scale loaded on four factors, while the "does include" loaded on three.

Graduate college advising and staff. This section first asked students whether they had experienced direct contact with the GC staff, for advising or other assistance, then presented items assessing their understanding of its role and services (five items; Likert-type; α=0.879). Sample item was "I understand the role of the Graduate College." The EFA confirmed a single factor.

Graduate College events. This section assessed students’ participation in various GC-sponsored activities, to support ongoing program planning. Sample items were "I attended activities during [event]" (dichotomous) and "I often attend Graduate College sponsored events" (Likert-type).

Graduate College media and materials. This section assessed students’ satisfaction with, and perceived benefit from, the GC website and other informational materials (eight items; Likert-type; α=0.924). Sample item was "Viewing information on the Graduate College’s website benefits me."

Graduate program and career self-efficacy. This section (two subscales) assessed students’ perceptions of self-efficacy (positioning for success) in their graduate programs and professions. Program self-efficacy consisted of six items (Likert-type; α=0.808) and professional self-efficacy of seven items (Likert-type; α=0.873). Sample items were "I am certain that I will do well in this graduate program." and "I am just not sure if I will do well in this field." EFA confirmed one factor for each subscale.

Program of study satisfaction and career preparation. This section (four subscales) assessed students’ satisfaction with various components of their graduate programs: program (focus on content and curriculum) (12 items; α=0.848; 2 factors), program faculty (focus on teaching and advising) (20 items; α=0.966; 2 factors), career preparation (9 items; α=0.973; 1 factor), and career utility and value of degree (5 items; α=0.938; 1 factor) (all Likert-type items). Sample items were program ("I believe that the level of difficulty in my coursework is appropriate"), faculty ("The faculty in my program are fair and unbiased in their treatment of students."), career preparation ("My program area course content is preparing me to practice effectively in the field."), and career utility and value of degree ("My graduate degree will open up current and future employment opportunities.").

Professional competence and identity development. This subscale assessed students’ perceptions of becoming competent professionals (ten items; Likert-type; α=0.957). Sample item was "More and more, I am becoming a scholar in my field." EFA confirmed a single factor.

Social interaction. This subscale assessed participants’ social interaction and engagement in the graduate community (21 items; Likert-type; α=0.855). Some items differed for alumni, as appropriate. Sample items were current students ("I have many friends in this university") and alumni ("I am still in contact with friends from my graduate program").

University resources and services. This section assessed participants’ satisfaction with university campus resources and services (19 items; Likert-type; α=0.929). Sample item was "I am happy with the condition of the building(s) containing my classrooms."

Final thoughts. Participants were also asked to answer two qualitative questions describing notable positive and challenging experiences in graduate school. Items were "Please describe one of your most meaningful and important graduate experiences at this university to date. Give as much detail as possible. Include the reasons why it was so meaningful and important for you," and "Please describe one of your most challenging graduate experiences at this university to date. Give as much detail as possible. Include the reasons why it was so challenging for you."

1 These items presented the university’s acronym, replaced here with the generic "[univ]".

Table 4 Summary of instrument structure and statistical performance (beta version)

Type of scale No. of items Alpha No. of factors

Why graduate school Item cluster 18 – 4
Admissions process Subscale 4 0.866 1
Decision to attend Item cluster 17 – 4
Financial aid Item cluster 7 – –
The graduate experience
Graduate experience satisfaction Subscale 12 0.901 1
To me, the graduate experience should include… Item cluster 34 – 4
To me, the graduate experience did include… Item cluster 34 – 3
Graduate college advising and staff Subscale 6 0.879 1
Graduate college events Item cluster 2 – 1
Graduate college media and materials Subscale 8 0.924 1
Graduate program self-efficacy
Success in graduate program Subscale 6 0.808 1
Success in chosen profession Subscale 7 0.873 1
Program of study satisfaction
Program of study Subscale 12 0.865 2
Academic advisor Subscale 9 0.987 1
Academic program faculty Subscale 12 0.971 1
Career preparation satisfaction
Career preparation Subscale 10 0.973 1
Utility/value of degree Subscale 5 0.938 1
Professional competence Subscale 10 0.957 1
Social interaction Subscale 21 0.855 3
University resources and services Subscale 19 0.929 3
Final thoughts Qualitative 2 – –

Negatively worded items were reverse-coded both for the reliability and factor analyses

11 Analysis

The same instrument performance analyses were conducted for the beta test data as for the alpha test, utilizing SPSS® (see Table 4). In addition, the larger beta sample size made it possible to perform more fine-grained subgroup mean comparison statistics with greater statistical power, to confirm that the instruments maintained reliability within subgroups and to determine whether they also demonstrated some discriminatory power for within-group differences (see Table 5). To assess their discriminatory potential, we used two key subgroupings: by degree (masters and doctoral) and by progress-toward-degree (entry, mid-point, exit). Student subgroup data demonstrated good consistency of performance across subgroups, with some discrimination of mean differences.
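The subgroup mean comparisons were run in SPSS®, and the paper does not name the specific test. As one plausible equivalent, a Welch's t-test between the two degree-type subgroups can be run as sketched below, with simulated scores standing in for the real subscale data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated subscale scores for the two degree-type subgroups (illustrative only).
masters = rng.normal(6.5, 1.0, 200)
doctoral = rng.normal(6.3, 1.0, 80)

t, p = stats.ttest_ind(masters, doctoral, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.3f}, mean difference = {masters.mean() - doctoral.mean():.2f}")
```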

12 Phase II: redesign and beta testing—faculty questionnaires

12.1 Procedure

Faculty members were also asked to give feedback regarding the various forms and subscales on the student questionnaires. Five forms of web-based questionnaire instruments were created to parallel the five versions of the student beta questionnaires, presenting faculty members with screenshots of the student instruments and unique response items for faculty. Participants were recruited via e-mail and provided with active, generic hyperlinks to the questionnaires. They responded regarding the value and fit of that information for their program development and improvement needs.

12.2 Participants

Faculty participants were invited from a list of faculty who teach and advise graduate students. The list was randomly divided into five groups, and each received one of the five forms of the student questionnaires (all sections). Faculty responses (N=199) were divided as follows: 43 non-attend, 33 entrance, 42 mid-point, 44 exit, and 37 alumni. Detailed participant demographics are shown in Table 6.


Table 5 Summary of subgroup means by degree type and progress-toward-degree

                                       All        Masters    PhD      Non-attend  Entrance  Mid-point  Exit     Alumni
                                       (N=1,663)  (N=1,431)  (N=230)  (N=14)      (N=598)   (N=410)    (N=593)  (N=205)
Admissions process                     6.54       6.60       6.12     6.46        6.54      –          –        –
Decision to attend                     7.12       7.15       6.90     5.09        7.22      7.01       7.10     7.20
The graduate experience                6.51       6.54       6.37     –           6.63      6.34       6.49     6.58
Graduate College advising and staff    6.21       6.20       6.29     6.07        5.79      5.64       5.76     –
Graduate College media and materials   5.52       5.51       5.60     –           5.60      5.33       5.58     –
Success in graduate program            6.91       6.92       6.85     –           6.88      6.83       6.98     –
Success in chosen profession           6.36       6.37       6.31     –           6.39      6.33       6.37     6.28
Program of study                       6.07       6.02       6.37     –           6.21      5.91       6.04     –
Academic advisor                       6.16       6.08       6.73     –           6.29      6.03       6.13     –
Academic program faculty               6.83       6.83       6.82     –           6.92      6.84       6.83     6.58
Career preparation                     6.74       6.72       6.88     –           6.86      6.63       6.80     6.49
Utility/value of degree                6.82       6.80       6.95     –           6.97      6.76       6.88     6.38
Professional competence                6.56       6.52       6.84     –           6.65      6.54       6.58     6.33
Social interaction                     4.95       4.94       5.02     –           4.99      4.84       4.97     5.28
University resources and services      6.27       6.28       6.26     –           6.27      6.16       6.34     –

All subscales are measured on an eight-point Likert scale (1=strongly disagree, 8=strongly agree)


12.2.1 Instruments

Faculty members reviewed screen captures of each section of the student questionnaires and responded to six items (three Likert and three open-response).

12.2.2 Perceived appropriateness

This section assessed how appropriate (applicable, coherent, and useful) the faculty members found the student assessment sections (three items; Likert-type; α=0.80). Items were "The items in this section are applicable to our graduate department/program;" "The items in this section are cohesive, providing perspective related to the section topic;" and "The results from this section will be useful to know about our graduate students."

Table 6 Frequency of faculty participant demographic characteristics

All

Gender

Male 115

Female 51

Other gendered 1

Ethnicity

African American/black 3

Asian American/Asian 5

Pacific Islander/native Hawaiian –

Hispanic/Latino 4

Native American/American Indian 2

White/Caucasian 144

Other 6

Colleges

Architecture 9

Arts and Sciences 109

Atmospheric and Geographic Sciences 8

Business 10

Earth and Energy 6

Education 10

Engineering 19

Fine Arts 5

Journalism and Mass Communication 7

International Studies –

Liberal Studies 1

Dual Degree/Interdisciplinary –

Professorial rank

Assistant professor 31

Associate professor 58

Full professor 80

Other 3


12.2.3 Open-response items

Three additional generative items invited original faculty input: (1) "Are there any additional items that you believe need to be added to this section? If so, please identify which items those are, and why they are needed here;" (2) "Are there any items here that you believe should be removed from this section? If so, please identify which items those are, and why they should be removed;" and (3) "Other comments."

13 Analysis

Analyses were conducted utilizing SPSS®. Reliabilities for the fit scale were computed as Cronbach's alpha (target α≥0.80). De-identified questionnaire responses were analyzed and stored according to IRB standards for data security and confidentiality.

On the three quantitative fit items, faculty members reported finding the information applicable, cohesive, and useful for their programs (M=6.34, SD=1.42). Overall appropriateness of each questionnaire was as follows: non-attend (M=6.17, SD=1.41), entrance (M=6.15, SD=1.71), mid-point (M=6.21, SD=1.41), exit (M=6.79, SD=1.15), and alumni (M=6.38, SD=1.42). Tables 7 and 8 show the subscale item means and standard deviations of the faculty feedback.
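A brief sketch of how such overall and per-form summaries could be produced follows; it again uses Python rather than SPSS®, and the data frame and column names (form, fit_1 to fit_3) are hypothetical.

```python
import pandas as pd

# Hypothetical faculty responses: one row per respondent; "form" identifies which of the
# five questionnaire versions was reviewed; fit_1..fit_3 are the three appropriateness items.
faculty = pd.DataFrame({
    "form":  ["non-attend", "entrance", "exit", "exit"],
    "fit_1": [6, 5, 7, 8],
    "fit_2": [6, 6, 7, 7],
    "fit_3": [7, 6, 6, 8],
})

fit_items = ["fit_1", "fit_2", "fit_3"]
faculty["fit_mean"] = faculty[fit_items].mean(axis=1)

# Overall mean and SD, then the same statistics split by questionnaire form.
print(faculty["fit_mean"].agg(["mean", "std"]))
print(faculty.groupby("form")["fit_mean"].agg(["mean", "std"]))
```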

14 Overall measurement performance results

These data together constitute an independent-samples test-retest of the GCE instrument and system redesign. The beta testing cycle was a confirmatory retest, along with some extension and refinement, of the alpha testing. Its analysis pursued the same goals, assessing the validity, reliability, and fit of the new GCE assessment through both direct administration and stakeholder feedback.

Understanding the item-level contributions, particularly across two testings with independent, authentic user samples, supported final instrument refinement for alignment and efficiency. We had retained longer versions of the evaluation instruments knowing that the second testing would confirm or disconfirm which items could be removed while retaining optimal evaluative effectiveness and efficiency. In addition, the two rounds of testing (alpha and beta) provided independent confirmation of the psychometric properties of these measures. Results from the beta instrument testing were similar to those from the alpha cycle.

The first goal of the beta testing analysis was to assess the appropriateness, scope, and fit of the refined instrument content in addressing the target variables and indicators, overall and for key student subgroups (by degree type and point-in-program). The scales and sections performed with a high degree of consistency across the whole group, while also demonstrating the capacity to discriminate between groups both by degree type (masters/doctoral) and by progress-in-program (entry, mid-point, exit). The scales and sections once again loaded consistently, demonstrating good test-retest stability as evidence of reliable performance and validity in assessing the target constructs.


The second goal was to conduct a confirmatory assessment of subscale reliability, subscale and section range and coherence, and item contributions. Consistent with their performance in the previous cycle, nearly all subscales met the target criteria in both internal consistency and factor loadings. Those that demonstrated less coherence (generally the newly added and reorganized sections) showed, in their item-level statistics, how they could be refined to meet the criteria. Across the two testing cycles, the scales and sections also demonstrated a high level of test-retest stability and external consistency.
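For readers who wish to run comparable factor-structure checks, the sketch below uses the Python factor_analyzer package as one possible tool; the item columns, and the assumption of a single intended factor, are illustrative rather than a record of the analyses reported here.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

def check_factor_structure(items: pd.DataFrame, n_factors: int = 1) -> None:
    """Exploratory factor analysis on a block of subscale items."""
    rotation = "oblimin" if n_factors > 1 else None
    fa = FactorAnalyzer(n_factors=n_factors, rotation=rotation)
    fa.fit(items)

    # Eigenvalues help judge how many factors the items actually support.
    eigenvalues, _ = fa.get_eigenvalues()
    print("Eigenvalues:", [round(e, 2) for e in eigenvalues[:5]])

    # Loadings near or above ~0.40 on the intended factor support scale coherence.
    loadings = pd.DataFrame(fa.loadings_, index=items.columns)
    print(loadings.round(2))

# Usage (hypothetical item columns for the advising-and-staff subscale):
# check_factor_structure(df[["gc_staff_1", "gc_staff_2", "gc_staff_3",
#                            "gc_staff_4", "gc_staff_5"]], n_factors=1)
```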

In addition to their performance with students, the instruments were perceived as a good fit by faculty members across colleges and disciplines. Few additions and deletions were recommended, and those suggested were specific to particular fields rather than generally appropriate to the broader graduate faculty. Overall, the revised (beta version) GCE instrument demonstrated excellent measurement performance.

Table 7 Faculty feedback on scale fit reliabilities, means, and standard deviations

                                                   Fit
                                                   Alpha   Mean   SD
Demographics                                       0.905   6.09   1.65
Why graduate school                                0.958   6.32   1.79
Admissions process                                 0.960   6.55   1.67
Decision to attend                                 0.942   6.75   1.44
Financial aid                                      0.912   6.73   1.41
The graduate experience
  Graduate experience satisfaction                 0.956   6.81   1.23
  To me, the graduate experience should include…   0.951   6.47   1.50
  To me, the graduate experience does include…     0.956   6.84   1.27
Graduate college advising and staff                0.923   6.55   1.52
Graduate college events                            0.966   5.70   1.90
Graduate college media and materials               0.928   6.57   1.19
Graduate program self-efficacy
  Success in graduate program                      0.968   6.41   1.59
  Success in chosen profession                     0.968   6.41   1.59
Program of study satisfaction
  Program of study                                 0.969   6.80   1.29
  Academic advisor                                 0.918   7.06   1.25
  Academic program faculty                         0.918   7.06   1.25
Career preparation satisfaction
  Career preparation                               0.967   6.72   1.36
  Utility/value of degree                          0.967   6.72   1.36
Professional competence                            0.958   6.50   1.38
Social interaction                                 0.958   6.49   1.38
University resources and services                  0.948   6.47   1.45
Final thoughts                                     0.989   6.64   1.56

All subscales are measured on an eight-point Likert scale (1=strongly disagree, 8=strongly agree). Also, "Success in graduate program" and "Success in chosen profession" were measured as one section; "Academic advisor" and "Academic program faculty" were measured as one section; and "Career preparation" and "Utility/value of degree" were measured as one section.


The new administration system (Qualtrics®) required something of a learning curve in development but paid off with a high degree of clarity and usability for both developers and end-users. A few user comments noted confusion regarding the interface, but those issues were easily addressed. As to time-on-task, participants took only a few minutes more to complete the beta version than the alpha version (37 min on average). One in-system revision identified prior to implementation was to simplify the programming logic, as the originally complex skip-logic appeared to confound use of the progress bar.

15 Data-driven findings demonstrating evaluation enhancement

While the research-based findings are the topic of separate manuscripts, it is important here to underscore those that constitute evidence of the value-added of this particular

Table 8 Faculty section means

                                                   Applicability  Cohesiveness  Usefulness
                                                   (N=141)        (N=137)       (N=137)
Demographics                                       6.34           5.99          6.09
Why graduate school                                6.70           6.62          6.32
Admissions process                                 6.37           6.33          6.27
Decision to attend                                 6.79           6.73          6.72
Financial aid                                      6.70           6.87          6.64
The graduate experience
  Graduate experience satisfaction                 6.94           6.75          6.77
  To me, the graduate experience should include…   6.51           6.48          6.44
  To me, the graduate experience does include…     6.89           6.85          6.79
Graduate college advising and staff                6.58           6.67          6.48
Graduate college events                            5.73           5.91          5.60
Graduate college media and materials               6.62           6.67          6.40
Graduate program self-efficacy
  Success in graduate program                      6.49           6.38          6.34
  Success in chosen profession                     6.49           6.38          6.34
Program of study satisfaction
  Program of study                                 6.89           6.66          6.82
  Academic advisor                                 7.19           6.96          7.09
  Academic program faculty                         7.19           6.96          7.09
Career preparation satisfaction
  Career preparation                               6.82           6.64          6.72
  Utility/value of degree                          6.82           6.64          6.72
Professional competence                            6.56           6.50          6.47
Social interaction                                 6.44           6.57          6.41
University resources and services                  6.51           6.54          6.37
Final thoughts                                     6.65           6.61          6.66


redesign strategy. One powerful product of this project was the instruments themselves, developed from the direct input of faculty, staff, administrators, and students, then tested and refined through authentic use. In addition, among the data-driven findings are potentially important patterns that are illuminated by specific elements of the instrument and system design. For example, the subgroup differences by degree type and point in progress-toward-degree had not been demonstrated in the previously published literature, nor had they ever been analyzed or compared in this Graduate College's evaluation process, because the previous design did not allow for this type of comparison. Also important was the general pattern of a mean score drop at mid-point across multiple perceptions; this trajectory of perceptions had been demonstrated in very focused and small-scale studies, but not in a diverse interdisciplinary group of graduate students across an entire university population. This was, again, because the published studies did not report instrumentation and implementation designs of this scope. Similarly, the development of parallel scales, such as the two forms of the Graduate Experience scale ("should include" and "does include") and the two self-efficacy scales (program and career), supports direct comparison of differential perceptions of these potentially important, nuanced constructs. In the test samples, there were some striking differences in these perceptions. In addition, the redesign to include both graduate college and program-level outcomes, explicitly endorsed by graduate faculty, supported the graduate college in giving de-identified data back to departments and programs. The redesign also included moving to online administration, which resulted in dramatically improved participation rates in the graduate college evaluation overall, and the dual-phase testing process included testing two different development and administration systems and identifying weaknesses in one before full implementation. These results underscore the value and importance of a redesign that encompasses the range and types of perceptual instruments, the development and delivery system, and the multi-point (trajectory) administration.
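As one illustration of the direct comparisons such parallel scales enable, the sketch below pairs each respondent's "should include" and "does include" scores and tests the gap with a paired t-test. The variable names and values are hypothetical, and this is not the specific analysis reported in the separate findings manuscripts.

```python
import pandas as pd
from scipy import stats

# Hypothetical per-respondent scale scores (means of the 34 parallel items each),
# on the same 8-point metric used throughout the instrument.
df = pd.DataFrame({
    "ge_should": [7.4, 6.9, 7.8, 7.1, 6.5],
    "ge_does":   [6.2, 6.8, 6.9, 5.9, 6.4],
})

# The "should minus does" gap captures unmet expectations for the graduate experience.
gap = df["ge_should"] - df["ge_does"]
t, p = stats.ttest_rel(df["ge_should"], df["ge_does"])
print("Mean gap =", round(gap.mean(), 2), "t =", round(t, 2), "p =", round(p, 4))
```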

16 Limitations

A limitation of this developmental design and analysis is implicit in the available sample. It was (1) volunteer (required by IRB) rather than comprehensive and (2) drawn from independent samples (resulting in between-subjects rather than within-subjects analysis). These sampling constraints introduce variability beyond that for which the instruments were designed. However, following implementation, the authentic, within-subjects sample will be accessed over the next 5 years. An additional limitation is that the sample came from a single institution; future goals include a multi-institutional test.

17 Conclusions

Based on the instrument and system performances, the evaluators recommended transfer to the client for full implementation, with a list of items the client could choose to delete without reducing the overall quality of the measure. It was important to underscore that assessment efficiency is not the only criterion for item selection or inclusion. Efficiency must be balanced with effectiveness as operationalized by scope


and range of each scale or section. The evaluators proposed length reduction using the criteria of maximum efficiency without reducing scale reliabilities (below 0.80) or unduly constraining the scope of assessment to exclude a critical subgroup of students or disciplines typically represented in a research university. Based on these criteria, a maximum of 55 items could be removed. After discussion with the client, only 19 items were removed for initial implementation, to maintain a robust instrument with the greatest range for possible nuanced differences among colleges and disciplines.
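A minimal sketch of reliability-constrained item trimming consistent with these criteria is shown below. The greedy rule (drop the item whose removal costs the least reliability, and stop before alpha falls under 0.80) is an illustration rather than the evaluators' actual procedure, and it reuses the same Cronbach's alpha formula as the earlier sketch.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    # Same alpha formula as in the earlier reliability sketch.
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

def trim_items(items: pd.DataFrame, alpha_floor: float = 0.80) -> list[str]:
    """Greedily drop items while the subscale's alpha stays at or above the floor."""
    kept = list(items.columns)
    while len(kept) > 2:
        # For each candidate, what would alpha be if that one item were removed?
        candidates = {item: cronbach_alpha(items[[c for c in kept if c != item]])
                      for item in kept}
        best_item, best_alpha = max(candidates.items(), key=lambda kv: kv[1])
        if best_alpha < alpha_floor:
            break  # any further removal would push reliability below the 0.80 target
        kept.remove(best_item)
    return kept
```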

These redesigned program evaluation methods and measures offer substantive benefits consistent with the Graduate College's expressed goals and emergent needs. Product and process outcomes include the following:

1. Updated, reasoned, multi-event administrative process, attentive to organization and programs across the university

2. Psychometrically sound instrumentation that produces objectively verifiable, defensible results

3. Excellent validity evidence on internal context, scope, structure, and substance of the instrumentation, with perceived fit and value-added perceptions of faculty across disciplines

4. Excellent reliability evidence, including internal coherence as well as external consistency and factor structures, test-retest stability with students, and consistency across subgroups

5. Self-contained, stand-alone variable subscales and item clusters that enable administrators to utilize part or all sections of the instrument as needed

6. Updated administrative system and media to reach a larger group, including off-site and distributed graduate students

The team also emphasized that, in addition to administering the complete instrument at once, each subscale and section is designed as a potential stand-alone section. With this design it is feasible for an institution or unit to remove sections that address issues of less immediate priority, or to administer sections at different times. If the user intended to compare responses from the various sections, administering them at the same time would control for some order and administration effects. Future directions for this project include extension via longitudinal testing with the dependent sample for which the instrument was originally designed. Those data may also provide additional confirmatory insight on the performance of the shorter (revised) version supported by the present data.

18 Discussion

The redesign of assessments goes beyond instrumentation. Rethinking assessments is much more than generating a new set of items, or even user instructions. Effective redesign requires re-examining the full range of features, contexts, and conditions, including timing, technology, tools, reframing of longitudinal instrumentation, and so on, to produce a whole-system redesign. Many institutional assessments are moving to digital administration systems, a shift that is more than simple digitization, involving translation as well as transfer (Bandilla et al. 2003; Hardré et al. 2010a). Administrators need to consider design features (Vincente and Reis 2010) as well as system and context elements that may influence user behaviors and consequent data


outcomes (Hardré et al. 2012). Tools and systems need to be tested in authentic ways with real user participants (Patton 2012), so that test data not only reflect an accurate product but also illuminate issues of process that may need to be adjusted for final implementation. This systematic and systemic approach to assessment design, development, and testing provides the rigor needed to demonstrate accurate assessment and validate data meaningfulness and use.

References

Allum, J. R., Bell, N. E., & Sowell, R. S. (2012). Graduate enrollment and degrees: 2001 to 2011. Washington: Council of Graduate Schools.

Austin, J., Cameron, T., Glass, M., Kosko, K., Marsh, F., Abdelmagid, R., & Burge, P. (2009). First semester experiences of professionals transitioning to full-time doctoral study. College Student Affairs Journal, 27(2), 194–214.

Baker, V. L., & Lattuca, L. R. (2010). Developmental networks and learning: toward an interdisciplinary perspective on identity development during doctoral study. Studies in Higher Education, 35(7), 807–827.

Bandilla, W., Bosnjak, M., & Altdorfer, P. (2003). Survey administration effects? A comparison of web-based and traditional written self-administered surveys using the ISSP environment model. Social Science Computer Review, 21, 235–243.

Belcher, M. J. (1996). A survey of current & potential graduate students. Research report 96–04. Boise: Boise State University.

Benishek, L. A., & Chessler, M. (2005). Facilitating the identity development of counseling graduate students as researchers. Journal of Humanistic Counseling Education and Development, 44(1), 16–31.

Bloom, J. L., Cuevas, A. E. P., Evans, C. V., & Hall, J. W. (2007). Graduate students' perceptions of outstanding graduate advisor characteristics. NACADA Journal, 27(2), 28–35.

Brinkman, S. N., & Hartsell-Gundy, A. A. (2012). Building trust to relieve graduate student research anxiety. Public Services Quarterly, 8(1), 26–39.

Chism, M., Thomas, E. L., Knight, D., Miller, J., Cordell, S., Smith, L., & Richardson, D. (2010). Study of graduate student perceptions at the University of West Alabama. Alabama Counseling Association Journal, 36(1), 49–55.

Cicognani, E., Menezes, I., & Nata, G. (2011). University students' sense of belonging to the home town: the role of residential mobility. Social Indicators Research, 104(1), 33–45.

Cook, D. A., & Beckman, T. J. (2006). Current concepts in validity and reliability for psychometric instruments: theory and application. The American Journal of Medicine, 119, 166.e7–166.e16.

Coulter, F. W., Goin, R. P., & Gerard, J. M. (2004). Assessing graduate students' needs: the role of graduate student organizations. Educational Research Quarterly, 28(1), 15–26.

Council of Graduate Schools. (2012). Findings from the 2012 CGS international graduate admissions survey. Phase III: final offers of admission and enrollment. Washington: Council of Graduate Schools.

Davidson-Shivers, G., Inpornjivit, K., & Sellers, K. (2004). Using alumni and student databases for evaluation and planning. College Student Journal, 38(4), 510–520.

Delaney, A. M. (2004). Ideas to enhance higher education's impact on graduates' lives: alumni recommendations. Tertiary Education and Management, 10(2), 89–105.

Fagen, A. P., & Suedkamp Wells, K. M. (2004). The 2000 national doctoral program survey: an online study of students' voices. In D. H. Wulff, A. E. Austin, & Associates (Eds.), Paths to the professoriate: strategies for enriching the preparation of future faculty (pp. 74–91). San Francisco: Jossey-Bass.

Farley, K., McKee, M., & Brooks, M. (2011). The effects of student involvement on graduate student satisfaction: a pilot study. Alabama Counseling Association Journal, 37(1), 33–38.

Fu, Y. (2012). The effectiveness of traditional admissions criteria in predicting college and graduate success for American and international students. Doctoral dissertation, University of Arizona.

Gansemer-Topf, A. M., Ross, L. E., & Johnson, R. M. (2006). Graduate and professional student development and student affairs. New Directions for Student Services, 2006(115), 19–30.

Gardner, S. K., & Barnes, B. J. (2007). Graduate student involvement: socialization for the professional role. Journal of College Student Development, 48(4), 369–387.

Golde, C. M. (2000). Should I stay or should I go? Student descriptions of the doctoral attrition process. The Review of Higher Education, 23(2), 199–227.

Hardré, P. L. (2012a). Scalable design principles for TA development: lessons from research, theory, testing and experience. In G. Gorsuch (Ed.), Working theories for teaching assistant and international teaching assistant development (pp. 3–38). Stillwater: New Forums.

Hardré, P. L. (2012b). Teaching assistant development through a fresh lens: a self-determination theory framework. In G. Gorsuch (Ed.), Working theories for teaching assistant and international teaching assistant development (pp. 113–136). Stillwater: New Forums.

Hardré, P. L., & Burris, A. (2011). What contributes to TA development: differential responses to key design features. Instructional Science, 40(1), 93–118.

Hardré, P. L., & Chen, C. H. (2005). A case study analysis of the role of instructional design in the development of teaching expertise. Performance Improvement Quarterly, 18(1), 34–58.

Hardré, P. L., & Chen, C. H. (2006). Teaching assistants learning, students responding: process, products, and perspectives on instructional design. Journal of Graduate Teaching Assistant Development, 10(1), 25–51.

Hardré, P. L., Crowson, H. M., & Xie, K. (2010a). Differential effects of web-based and paper-based administration of questionnaire research instruments in authentic contexts-of-use. Journal of Educational Computing Research, 42(1), 103–133.

Hardré, P. L., Nanny, M., Refai, H., Ling, C., & Slater, J. (2010b). Engineering a dynamic science learning environment for K-12 teachers. Teacher Education Quarterly, 37(2), 157–178.

Hardré, P. L., Beesley, A., Miller, R., & Pace, T. (2011). Faculty motivation for research: across disciplines in research-extensive universities. Journal of the Professoriate, 5(2), 35–69.

Hardré, P. L., Crowson, H. M., & Xie, K. (2012). Examining contexts-of-use for online and paper-based questionnaire instruments. Educational and Psychological Measurement, 72(6), 1015–1038.

Hegarty, N. (2011). Adult learners as graduate students: underlying motivation in completing graduate programs. Journal of Continuing Higher Education, 59(3), 146–151.

Hephner LaBanc, B. (2010). Student affairs graduate assistantships: an empirical study of the perceptions of graduate students' competence, learning, and professional development. Doctoral dissertation, Northern Illinois University.

Higher Education Research Institute (HERI) (2012). Faculty satisfaction survey. http://www.heri.ucla.edu/index.php. Accessed 15 June 2013.

Hyun, J., Quinn, B. C., Madon, T., & Lustig, S. (2006). Needs assessment and utilization of counseling services. Journal of College Student Development, 47(3), 247–266.

Kanan, H. M., & Baker, A. M. (2006). Student satisfaction with an educational administration preparation program: a comparative perspective. Journal of Educational Administration, 44(2), 159–169.

Kenner, C., & Weinerman, J. (2011). Adult learning theory: applications to non-traditional college students. Journal of College Reading and Learning, 41(2), 87–96.

Lipschultz, J. H., & Hilt, M. L. (1999). Graduate program assessment of student satisfaction: a method for merging university and department outcomes. Journal of the Association for Communication Administration, 28(2), 78–86.

Lovitts, B. E. (2001). Leaving the ivory tower: the causes and consequences of departure from doctoral study. Lanham: Rowman & Littlefield.

Messick, S. (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14(4), 5–8.

Nesheim, B. E., Guentzel, M. J., Gansemer-Topf, A. M., Ross, L. E., & Turrentine, C. G. (2006). If you want to know, ask: assessing the needs and experiences of graduate students. New Directions for Student Services, 2006(115), 5–17.

Offstein, E. H., Larson, M. B., McNeill, A. L., & Mwale, H. M. (2004). Are we doing enough for today's graduate student? The International Journal of Educational Management, 18(6/7), 396–407.

Patton, M. Q. (2012). Essentials of utilization-focused evaluation. Thousand Oaks: Sage.

Schlossberg, N. K., Waters, E. B., & Goodman, J. (1995). Counseling adults in transition: linking practice with theory (2nd ed.). New York: Springer.

Schram, L. N., & Allendoerfer, M. G. (2012). Graduate student development through the scholarship of teaching and learning. Journal of Scholarship of Teaching and Learning, 12(1), 8–22.

Smallwood, S. (2004). Doctor dropout. Chronicle of Higher Education, 50(19), A10. Retrieved from: http://chronicle.com/article/Doctor-Dropout/33786

Stone, C., van Horn, C., & Zukin, C. (2012). Chasing the American Dream: recent college graduates and the Great Recession. New Brunswick: John J. Heldrich Center for Workforce Development.

US Department of Education, National Center for Education Statistics. (2005). Integrated post-secondary education data system, Fall 2004. Washington: US Department of Education.

Vincente, P., & Reis, E. (2010). Using questionnaire design to fight nonresponse bias in web surveys. Social Science Computer Review, 28(2), 251–267.

Weidman, J. C., Twale, D. J., & Stein, E. L. (2001). Socialization of graduate and professional students in higher education: a perilous passage? San Francisco: Jossey-Bass.

Williams-Tolliver, S. D. (2010). Understanding the experiences of women, graduate student stress, and lack of marital/social support: a mixed method inquiry. Doctoral dissertation, Capella University.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards: a guide for evaluators and evaluation users. Los Angeles: Sage.
