Evaluating technology-enhanced learning: A comprehensive framework

Background: The absence of a standard, comprehensive approach to evaluating technology-enhanced learning (TEL) limits the utility of individual evaluations, and impedes the integration and synthesis of results across studies. Purpose: To outline a comprehensive framework for approaching TEL evaluation in medical education, and to develop instruments for measuring the perceptions of TEL learners and instructors. Methods and results: Using both theoretical constructs of inquiry in education and a synthesis of existing models and instruments, we outlined a general model for evaluation that links utility, principles, and practices. From this we derived a framework for TEL evaluation that identifies seven data collection activities: needs analysis; documentation of processes, decisions, and final product; usability testing; observation of implementation; assessment of participant experience; assessment of learning outcomes; and evaluation of cost, reusability, and sustainability. We then used existing quality standards and approaches to develop instruments for assessing the experiences of learners and instructors using TEL. Conclusions: No single evaluation is likely to collect all of this information, nor would any single audience likely find all information elements equally useful. However, consistent use of a common evaluation framework across different courses and institutions would avoid duplication of effort and allow cross-course comparisons.


Introduction
Although we have both previously tackled the challenges of evaluation in medical education (Ellaway 2006, 2010a; Cook 2010), we have observed ongoing challenges in the review and evaluation of technology-enhanced learning (TEL), both in practice and in the literature. These challenges include a lack of comprehensiveness or a clear focus on the function of the evaluation, unclear relationships between what happened and what was evaluated, and the absence of a sound conceptual grounding for the subject or the methods of the evaluation. Previous authors in medical education have proposed guidelines (Atkins & O'Halloran 1995; Glenn 1996; Olson & Shershneva 2004) and specific instruments (Knight et al. 2004; Alyusuf et al. 2013) for judging the quality of computer-based learning materials. Standards and benchmarks for online instruction have also been proposed for education generally (Merisotis & Phipps 2000; Oliver 2000; Moore 2005; Quality Matters Program 2014). However, none of these instruments or frameworks offers a comprehensive view of the purposes, functions, and specific activities required to fully evaluate a TEL intervention. The absence of a standard, comprehensive evaluation approach limits the utility of individual evaluations, and impedes the integration and synthesis of results across studies.
Our purpose is to outline a comprehensive framework for approaching TEL evaluation in medical education. We describe the development of this framework grounded both in theoretical constructs of inquiry in education and in a synthesis of existing practices and instruments. We further explain how it can be adapted to different contexts and uses in medical education, and illustrate its application in developing novel instruments for measuring the perceptions of learners and instructors as participants in a TEL course. Although we use the term "course" to refer to the unit of instruction being evaluated, this framework could equally apply to brief, focused TEL modules or multi-course programs.
Our primary intent is to support practicing teachers and program managers who need to evaluate TEL courses, although we also hope to support broader academic inquiry by encouraging greater standardization of TEL evaluation within the field of medical education. If investigators employ the same instrument (or minor context-specific variations thereon), it will facilitate meaningful comparisons across institutions, learners, courses, and instructional designs.

A general model for comprehensive evaluation
Evaluation is the process of judging the value of something. Although it has theoretical dimensions, evaluation is a largely practical undertaking. Evaluators need to understand what is being evaluated, how it is expected to work, what assumptions guide their approach, and how the evaluation results will be used (Frye & Hemmer 2012). Figure 1 illustrates a progression from needs (purpose and guiding questions), through principles (conceptual model or approach to inquiry), to practice (specific elements of desired information, and evaluation activities to collect this information) in a comprehensive evaluation. We have used this model in developing and describing our framework.
The importance of evaluating technology-enhanced learning
TEL is a broad field that covers all uses of digital technology to support and mediate educational activities (Goodyear & Retalis 2010). TEL can involve web-based technologies, mobile devices and apps, computers, tablets, and other digital devices, and may include activities that are entirely digitally-mediated or those that integrate technology into hybrid or blended activities. TEL may be collaborative or self-directed, didactic or practice-based, and involve formative or summative assessment. Learners may work from home, a library, or a patient's bedside. Most modern medical education activities involve at least some component of TEL, and as such the evaluation of TEL has become an essential - if underdeveloped - part of medical education practice.
The practice of program evaluation in medical education has been previously described, in particular the role of the evaluator (Goldie 2006) and the importance of using evaluation findings to inform the ongoing development of programs (Wall 2010). Yet, we see a need to focus on the evaluation of TEL in medical education for the following reasons:
(1) There is often great anxiety about new models of medical education. Using new technologies tends to amplify this anxiety, especially given the investments in infrastructure and faculty development required for their successful integration. Responding to these stakeholder concerns necessitates rigorous evaluation of TEL activities.
(2) Aspects of TEL such as usability, accessibility, and technical reliability of materials and learning environment are often overlooked in mainstream evaluation practice but play a much more substantial role in TEL evaluation.
(3) Because interactions between teachers and learners in TEL tend to be different (i.e. mediated by technology), teacher-learner relationships do not yield the same opportunities for informal evaluation and feedback as do traditional modalities. This suggests that TEL requires a more comprehensive evaluation approach than other educational activities.
(4) TEL can generate much more data (and different kinds of data) than traditional educational approaches. The development of educational analytics and "big data" analysis techniques affords new approaches to supporting evaluation, both procedurally and conceptually (Ellaway et al. 2014b).
(5) Emerging discourses in TEL theory and practice distinguish between evaluating an educational technology in its own right and evaluating it in different contexts of use (Ellaway et al. 2014a). A greater precision in what exactly is being evaluated (and how) leads to different lines of inquiry. There are clear differences between evaluating TEL as a technology (is it reliable, safe, aligned to local environment, and sustainable?), evaluating its educational content (is it accurate, current, and in adherence to principles of effective learning?), and evaluating it as part of an overarching educational activity (did it achieve overall objectives and outcomes?) (Ellaway 2014).

Needs, uses, and guiding questions
The first step in planning any evaluation activity is to ask: "Who is the intended audience?" and "What is the audience likely to do with the information the evaluation provides?" (Cook 2010). These questions clarify the evaluation's purpose, and guide subsequent decisions about data collection, analysis, and presentation. Box 1 summarizes several of the evaluation frameworks noted below.

Uses
An evaluation might serve one audience and use, or many. Identifying specific uses and designing the evaluation to meet these from the outset will focus the evaluation, clarifying the information required and analyses to be conducted. For example, an evaluation might require different data collection, analysis, and presentation if intended for medical students (e.g. to inform their choice of school or selection of elective courses), instructors (e.g. to help them improve a course for the next offering), or a funding agency (e.g. to determine the course's impact). Clarifying the intended use early on will help to save time and trouble later in the process. The purposes of an evaluation can be broadly classified as summative or formative. Summative evaluation occurs at or after the end of a course, and renders a final judgment such as, "How well did it work?" Formative evaluation typically occurs at various points before, during, and after a course, and aims to collect information to improve a course as it is running and in future iterations.
It is also helpful to consider how an educational intervention influences outcomes for, or is experienced by, stakeholders at the individual participant level, the group or class level, and the institutional, professional or societal level. At the level of the individual learner, cognitive events might be the primary consideration: what did they experience or learn? At the level of the group, observable activity and aggregate performance might be the foci (e.g. what went on amongst the learners and between learners, instructors, technologies, and other materials; and how well did they perform as a whole?). At the level of the institution or society we might ask about the context that gave rise to the intervention, and the value or impact on people and systems outside the classroom (e.g. the contribution to the mission of a school or to the goals of a national initiative).

Questions
Asking good questions to guide the process of inquiry is a core feature of good evaluation. Without a focused question it is difficult to judge whether an evaluation has accomplished its goals.
We can classify questions in terms of description ("What was done?"), justification ("Did it work and at what cost?"), and clarification ("How or why did it work, and how can it be improved?") (Cook et al. 2008). Clarification questions can be further classified as those that seek explanation ("How or why did it work, or how was it experienced?") and those that suggest experimentation ("How can it be improved?"). We can also ask normative questions such as "What should have happened?", which we might label "judgment". This suggests a five-domain model defining key questions for planning an evaluation (i.e. description, justification, clarification-explanation, clarification-experimentation, and judgment).

General principles of evaluation
Cook (2010) described three broad approaches or orientations for evaluation.
Objectives-oriented approaches focus on how well a priori objectives were met. Course objectives and corresponding data might include enrollment and completion numbers, learning outcomes (knowledge, skills, attitudes), and net income and expenditures. The objectives-oriented approach is relatively straightforward in implementation, but tends to be poorly suited to capturing unexpected developments (good or bad) in the course. For example, we might discover that learning outcomes were poor, or that completion rates were unexpectedly high, and yet be unable to determine why this occurred based solely on objectives-related data.
Participant-oriented approaches typically use qualitative, and occasionally quantitative, methods to collect data from multiple sources and inductively explore not only what happened, but why it happened and how it was experienced. This approach is responsive to unexpected events, and flexible enough to capture the complexity of large programs and local contexts. Disadvantages include the time, resources, and expertise required, and the subjective and context-specific nature of interpretations.
Process-oriented evaluations consider the entire lifecycle of a course or program, from the inception of the idea, through implementation and delivery, to summative judgments about quality and decisions about future iterations. This would ideally include an evaluation of the need for the course or course update, course planning and implementation, resources required, and course outcomes. The process-oriented approach typically includes elements of both objectives-oriented and participant-oriented evaluation, and can be very comprehensive. However, this approach is also relatively resource-intensive, and must be initiated early in the sequence of course development.
The choice of approach depends in large part upon the anticipated uses and corresponding guiding questions. Combining approaches is often useful. For example, the evaluator might use test scores to determine the achievement of objectives, but also use a participant-oriented approach to identify unplanned events and understand the root cause of unexpected outcomes. The Context-Inputs-Processes-Products (CIPP) model (Stufflebeam 2003) is a common evaluation approach that emphasizes the process (and objectives) orientation, and also accommodates a participant orientation if desired. Context addresses the needs, assets, and opportunities that prompted the change and the desired goals. Inputs consider alternative approaches, feasibility, and cost-effectiveness. Processes focus on the actual development and implementation. Products focus on both short- and long-term outcomes as well as the sustainability and transportability (i.e. practicability of adoption elsewhere) of the intervention.

Box 1. Glossary of evaluation frameworks cited in this article.
CIPP framework: a model for program evaluation organized around the themes of Context (needs, assets, and opportunities that prompted the change and the desired goals), Inputs (alternative approaches, feasibility, and cost-effectiveness), Processes (activities and issues during development and implementation), and Products (short- and long-term outcomes, sustainability, and transportability to new settings) (Stufflebeam 2003).
Description-Justification-Clarification framework: a framework for classifying the purpose of education research, first described by Henk Schmidt at the 2005 meeting of the Association for Medical Education in Europe, and then further developed and formalized by Cook et al. (2008). Description focuses on what was done; justification focuses on whether it worked; and clarification focuses on why or how it worked (or failed to work) and how it can be improved.
Evaluation Cookbook: a "practical guide to evaluation methods for lecturers" that suggests specific evaluation "recipes" (methods for collecting, analyzing, and reporting information) for various situations in education; see www.icbl.hw.ac.uk/ltdi/cookbook/.
Kirkpatrick framework: an evaluation framework for training programs organized around four levels: level 1 reactions, level 2 learning, level 3 behaviors, and level 4 results (Kirkpatrick 1996).
Objectives-oriented evaluation: an evaluation approach that focuses on how well a priori objectives were met. The objectives-oriented approach is relatively straightforward in implementation, but tends to be poorly suited to capturing unexpected developments (good or bad) in the course.
Participant-oriented evaluation: an evaluation approach that inductively explores not only what happened, but why it happened and how it was experienced; this approach typically uses qualitative and occasionally quantitative methods to collect data from multiple sources. This approach is responsive to unexpected events, but interpretations are typically subjective and context-specific.
Process-oriented evaluation: an evaluation approach that considers the entire lifecycle of a course or program, from the inception of the idea, through implementation and delivery, to summative judgments about quality and decisions about future iterations. The CIPP model represents a prototypical example of process-oriented evaluation.
Quality Matters evaluation program: an international organization focused on quality assurance of online learning; see www.qualitymatters.org.
SLOAN Consortium: an international organization focused on quality online learning (now the Online Learning Consortium); see www.onlinelearningconsortium.org.
SWOT: a planning and evaluation framework based on the strengths, weaknesses, opportunities, and threats associated with a particular activity or program; the precise origins and first description of this framework remain obscure.
Many (but not all) educational evaluations explore quantitative outcomes. Such outcomes can be classified using the hierarchy developed by Kirkpatrick (1996). Level 1 outcomes (Reactions) look at participant satisfaction with and perceptions of the course experience and quality. These are often based on post-event self-report surveys. Level 2 (Learning) considers participants' knowledge, skills, or attitudes in a test setting, such as a multiple-choice test of knowledge, or a skill exam using a standardized patient or a virtual reality simulator. Level 3 outcomes (Behaviors) measure performance in actual practice, such as test ordering patterns or direct observation of patient care. Level 4 outcomes (Results) assess the impact on the systems and organizations within which participants work, such as changes in patients' health or the effectiveness or efficiency of healthcare systems. However, Yardley and Dornan (2012) observe that "different levels concern different beneficiaries" and omit others altogether (such as teachers), suggesting that Kirkpatrick-type outcomes alone will be inadequate for many contexts and applications.
Many course outcomes cannot be readily quantified. Non-quantitative outcomes such as narrative feedback about the participant experience (both learner and instructor), unplanned events (both favorable and unfavorable), and required deviations from the original implementation plan can be captured using qualitative research approaches. A holistic approach to evaluation will often employ both quantitative and qualitative measures.

Principles into practice: Specific activities to collect evaluation data
Having considered the high-level organizing principles for evaluating TEL, we turn now to practical approaches to designing evaluations of TEL events. Evaluation requires information - information that can be analyzed, interpreted, and acted upon. In a practical sense, then, the evaluator might ask: "What information should be collected to help answer my guiding questions?" (i.e. description, justification, or clarification), followed by, "How can I collect this information?" To translate the core principles identified in the previous section into practical actions, we mapped these high-level questions onto the process-oriented CIPP model (see Table 1), and used an iterative consensus-building process between the two authors to identify specific elements of information that a TEL evaluation might explore.
After creating this comprehensive map of useful information elements, we again used an iterative process to identify specific evaluation activities that can generate the information required. This included setting out seven broad areas of evaluation activity, summarized in the sections that follow and in Table 2. We emphasize those activities that are relatively unique to TEL, or for which application to TEL requires special considerations.

Conduct needs analysis and environmental scan
Although many evaluations begin at the completion of an instructional program, a comprehensive evaluation (such as the CIPP model) actually begins when a new program or change to an existing program is first conceived. A needs analysis in the formative stages helps to clearly identify the need for change, determine whether educational change (as contrasted with adjustments to other components in the healthcare system) is the correct solution, and confirm that resources can be marshaled to implement this change (Cook & Dupras 2004).
The need for change can be determined by considering organizational/societal needs and capacities (e.g. specific knowledge and skills required by a school or hospital, or emerging regulatory requirements, political trends, and economic policies), occupational needs (the roles and specific capabilities required to meet new or existing organizational needs), and individual needs (the target group's present performance, work environment, and ability to participate in training) (Kern et al. 1998; Training and Executive Development Group 2014). These needs should be contrasted with educational experiences and resources already existing or planned, including formal courses, clinical experiences, online knowledge resources, peers, and faculty; and with available technology resources (infrastructure and human expertise).
This information can be collected in a myriad of ways, including objective documentation of learner performance (e.g. formal tests of knowledge or skill, direct observation of clinical performance, learner self-assessments, or chart audits), surveys, interviews or focus groups, and critical reviews of existing offerings. A group-level "SWOT" analysis can be helpful in identifying organizational Strengths, Weaknesses, Opportunities, and Threats.
Ultimately, a need for change can be justified by identifying a gap in the curriculum (i.e. inadequate depth or breadth in the specific clinical topic) or a gap in the instructional approach (i.e. failure of existing instructional strategies to optimally facilitate learning or reach desired learners).

Document processes, decisions, and final product
The development of a course typically blends formal instructional design models (such as 4C/ID [van Merriënboer et al. 2002], conversational exchange [Laurillard 2001], or significant learning [Fink 2003]) and instructional design processes (such as the Analyze, Design, Develop, Implement, and Evaluate (ADDIE) framework [Morrison et al. 2010]) with an organic response to local needs and culture. A comprehensive evaluation will carefully document each activity and decision in this process. This typically takes the form of a narrative record of the steps taken (drawing on sources such as diaries or meeting notes), the people involved and other resources required, and the nature of the resulting course. The real cost of developing and implementing TEL courses is rarely captured, but should be; see further discussion of this point below.
Specific processes or milestones that might be documented include the development of specific goals and objectives, the selection of specific instructional approaches, and the selection and configuration of specific technologies. It is also useful to document viable alternatives (e.g. objectives and instructional approaches not incorporated into the final model), and the reasons (e.g. theory, evidence, costs) and means for their exclusion.

Test usability
Usability focuses on the educational qualities of all the resources and tools used in a course - those things that Engeström (1987) calls "mediating artifacts" in an educational activity. Usability is likely important in all educational activities, but it is particularly relevant to TEL given its inherent dependence on technological mediating artifacts. Usability evaluation considers the ease with which the user can perform desired activities when the technology functions as intended, and also identifies errors in function or content (Krug 2000; Nielsen 2012). Nielsen (2012) identified five key elements of usability: Learnability refers to how easily users can accomplish basic tasks the first time they encounter the technology. Efficiency looks at how well users can perform tasks once they have learned the steps. Memorability refers to how easily learners can reestablish proficiency when they return to the technology after a period of not using it. Errors - their number, severity, and ease of recovery - constitute a fourth key element. Satisfaction (of users) constitutes the final element. Usability testing may also consider conformance to evidence-based standards for the design of instructional technologies, such as Mayer's multimedia principles (Mayer 2005).
Most usability tests result in changes to the system. These should be documented (see above), and the iterative cycle of testing and improvement should continue until usability has reached an acceptable level. However, aspiring to create a flawless course may be counterproductive (Cook 2014). In many instances, a developer gains much from releasing a TEL intervention relatively early (the "minimum viable product"), carefully and deliberately evaluating the result, and then quickly making evidence-based improvements (Ries 2011).
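By way of illustration, the observations gathered in a usability test can be aggregated into simple summary measures aligned with several of Nielsen's elements. The sketch below is hypothetical: the record fields, metrics, and data are our own illustrative choices, not part of Nielsen's framework or any instrument described in this article.

```python
# Hypothetical sketch: aggregate raw usability-test observations into
# summary measures loosely mapped to Nielsen's elements of usability.
# Each record represents one participant attempting one task.
from statistics import mean

sessions = [
    # success on first attempt (learnability), time on task (efficiency),
    # error count (errors), and self-reported satisfaction (1-5 scale)
    {"task": "open module", "success": True,  "seconds": 40, "errors": 0, "satisfaction": 5},
    {"task": "open module", "success": False, "seconds": 95, "errors": 2, "satisfaction": 3},
    {"task": "submit quiz", "success": True,  "seconds": 60, "errors": 1, "satisfaction": 4},
]

def summarize(records):
    """Reduce raw session records to a small usability report."""
    return {
        "first_attempt_success_rate": mean(1.0 if r["success"] else 0.0 for r in records),
        "mean_time_on_task_s": mean(r["seconds"] for r in records),
        "total_errors": sum(r["errors"] for r in records),
        "mean_satisfaction": mean(r["satisfaction"] for r in records),
    }

report = summarize(sessions)
# e.g. report["total_errors"] == 3, report["mean_time_on_task_s"] == 65
```

Tracking such measures across the iterative testing cycle described above would show whether each revision actually moves usability toward the acceptable level.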

Document key events during implementation and final product
Documentation continues into implementation, now focusing on objective accounts of what actually happens. This may be a normative account that checks for intentional or accidental variances from the intended execution, and the steps taken to resolve such variances, or it may be an unstructured observation of events as they occur. The final product (after any changes) should be documented carefully.
Although this Documentation activity is about objectively observed events (as compared with the subjective experiences of participants described in the next section), those doing the documenting constitute an important part of the process. Different people may have access to different information, notice different things, or interpret the same thing in different ways. A member of the instructor team would usually perform this function, but in some cases (e.g. distance learning) learners may need to be enlisted as reporters of key events.

Assess participant experience and satisfaction
This activity focuses on the subjective experiences of participants in a TEL activity, the value and meaning they attribute to those experiences, and their perception of the quality of the materials or tools they used. Educational evaluations commonly address this activity, which includes Kirkpatrick level 1 (Reaction) outcomes. Although experiences are also documented in other activities (as above), the focus here is on the subjective opinions of participants rather than the objective recording of events or of actual learning outcomes. Although learners are typically the focus of this evaluation activity, participants also include instructors and other stakeholders such as administrators, support staff, and real or standardized patients; these will provide useful information for many evaluations. Most participant experiences are assessed through self-report surveys or interviews/focus groups.

Assess learning outcomes
As previously noted, Kirkpatrick levels 2, 3 and 4 focus on the objective outcomes of training activities (Kirkpatrick 1996). These arguably constitute the most critical element in determining the efficacy and effectiveness of an educational intervention. The link between the training event and the observed outcome becomes more difficult to establish as one progresses from level 2 to 3 to 4 (Cook & West 2013), yet the perceived meaningfulness and value also rises, emphasizing the need for thoughtful balance in selecting outcomes and the associated measurement instruments. Since the development and validation of level 2-4 outcomes have been extensively explored in other sources (Kirkpatrick 1996;Case & Swanson 2001;Cook & Beckman 2006;Kogan et al. 2009;Schuwirth & van der Vleuten 2011;Pangaro & ten Cate 2013), we will not discuss these further.

Estimate cost, reusability, and sustainability
One of the most neglected (Zendejas et al. 2013), yet arguably one of the most important (Sandars 2010), aspects of evaluation focuses on the financial, personnel, facilities, and other resource costs required to develop, implement, and maintain a TEL course (Clune 2002; Hummel-Rossi & Ashdown 2002). Approaches such as Levin's "ingredients" model (Levin 2001) and the "total cost of ownership" (TCO) model (Ellaway 2010b) estimate the true cost of an educational activity by identifying and valuing each component, including costs related to equipment and materials, licensing, personnel, facilities and infrastructure, learner expenses (such as transportation or meals), lost opportunities (e.g. lost clinical revenue), and stopping an existing activity or moving it from one medium to another (for instance when the hosting technology changes). A careful evaluation of costs can inform secondary analyses of cost-effectiveness and return on investment (ROI) (Cook 2014), and also help determine course sustainability.
Repeating and repurposing all or part of a TEL course (offering a course a second time, or reusing part of one course in another course) could potentially reduce the per-learner cost, yet in practice such economies may be less easily achieved than anticipated (Cook & Triola 2014). In addition to exploring the results of actual repeated offerings, an evaluation might seek information that predicts the potential for reuse or that indicates preparation for anticipated maintenance, updates, and integration with other educational endeavors (Glasgow et al. 1999).
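The arithmetic behind this ingredients-style costing, and the way reuse amortizes one-time development costs, can be sketched briefly. All category names and figures below are hypothetical illustrations, not values drawn from the sources cited.

```python
# Illustrative sketch of an ingredients-style cost estimate: itemize and
# value each cost component, then examine how per-learner cost falls when
# a TEL course is offered repeatedly. All figures are hypothetical.

def total_cost(ingredients):
    """Sum the valued cost of each ingredient (category -> cost)."""
    return sum(ingredients.values())

def per_learner_cost(development, delivery_per_offering,
                     learners_per_offering, offerings):
    """Amortize one-time development cost over repeated offerings."""
    total = development + delivery_per_offering * offerings
    return total / (learners_per_offering * offerings)

development = total_cost({
    "personnel (design and authoring)": 40_000,
    "software licensing": 5_000,
    "equipment and infrastructure": 10_000,
})
delivery = total_cost({
    "instructor time per offering": 6_000,
    "hosting and support per offering": 2_000,
})

# One offering of 50 learners versus four offerings of 50 learners each:
first_run = per_learner_cost(development, delivery, 50, 1)   # 1260.0
fourth_run = per_learner_cost(development, delivery, 50, 4)  # 435.0
```

As the caveat above notes, real reuse rarely comes free: maintenance, updates, and adaptation to new contexts would appear as additional ingredients rather than the simple fixed delivery cost assumed here.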

New instruments to assess participant perceptions
To facilitate practical application of this model, we present three generic instruments for capturing the perceptions and experiences of learners and instructors following their participation in a TEL course: the Evaluation of Technology-Enhanced Learning Materials for Learner Perceptions (ETELM-LP) and for Instructor Perceptions (ETELM-IP), and a shortened version of the ETELM-LP for use with very brief courses (ETELM-LP-S); see online supplemental materials. Such assessments comprise an important element of virtually all course evaluations, and as such represent a recurrent need for nearly all TEL educators. Moreover, consistent use of the same instrument across different courses, programs, institutions, and time periods will not only avoid reinventing an instrument for each occasion, but will also permit cross-course comparisons.
We developed the ETELM instruments both deductively and inductively. We initially identified the desired information (salient domains) and then specific items. We first identified two frameworks widely accepted in mainstream education (i.e. Quality Matters [Quality Matters Program 2014] and the SLOAN Consortium [Moore 2005]) that specify key elements of quality in an online course, and used these to deductively identify key domains and tentative items within each domain. We then inductively verified the completeness of these frameworks by identifying existing instruments and frameworks for evaluating TEL, and examining these for new domains or items not reflected in the initial two frameworks. We searched PubMed (using search strings such as "evaluat* AND (online OR Internet OR multimedia) AND (instrument OR valid*) AND medical education") for relevant work in health professions education. We found one instrument that had been subjected to formal validation efforts (Alyusuf et al. 2013), and other articles describing general frameworks or listing specific areas for evaluation (Atkins & O'Halloran 1995; Glenn 1996; Knight et al. 2004). Looking to education beyond medicine, we identified (in a non-systematic Internet search) 10 other instruments developed by academic institutions for local use (see online supplemental materials for listing). For each instrument thus identified we matched the key domains against those in the initial frameworks, and then used a thematic synthesis to identify common domains and reduce redundancy, resulting in a parsimonious list of domains that captured all of the key domains of the original instruments and frameworks. We then referred back to the original frameworks and instruments to identify specific items/questions within each domain. Finally, three non-author experts (two with expertise in online learning and one professional evaluator) reviewed the questionnaires and offered suggestions.
We pilot tested the instrument with 16 students in graduate-level clinical research courses. All agreed the questions were easy to understand, applicable, and non-redundant, and 15/16 agreed that the length was appropriate.
The ETELM instruments exemplify both the principles and the approach set out in this article. They may be adapted to context- and situation-specific needs by adding new items while retaining the generic core. We note that only some of the questions refer directly to technology issues. This reflects our view that, while technology is an essential mediating factor in TEL, evaluation should focus on the whole educational activity.

A simple TEL evaluation plan
We acknowledge that most educators are not trained as evaluators, and may have difficulty planning a comprehensive evaluation (Oliver 2000). One potential solution to this challenge is the use of evaluation "recipes" (Harvey 1998): brief outlines of evaluation plans designed to address a given audience and need. To this end, we propose a basic "recipe" for TEL evaluation (Box 2) that draws upon four of the seven evaluation activities defined previously, including the ETELM-LP and ETELM-IP. Although we believe this simple plan will likely meet the minimal needs of many TEL educators, we also realize that it may be inadequate for many contexts and trust that more experienced evaluators will define their own plans. The Evaluation Cookbook (available at http://www.icbl.hw.ac.uk/ltdi/cookbook/) lists additional evaluation recipes.

Discussion
We have outlined a general model for implementing evaluation, developed a novel framework for TEL evaluation in medical education, described how the framework can be adapted to different contexts and applications in medical education, identified seven specific evaluation activities, and developed instruments for measuring the perceptions of learners and instructors after a TEL course. Inasmuch as participants' perceptions comprise an important element of virtually all course evaluations, these instruments may help fill a need recurrently encountered by nearly all TEL educators.
Moreover, consistent use of the same instrument across different courses, programs, institutions, and time periods not only averts the need to reinvent an instrument on each occasion, but also permits cross-course comparisons.
We grounded our framework upon numerous prior models and instruments (Merisotis & Phipps 2000;Stufflebeam 2003;Knight et al. 2004;Moore 2005;Cook 2010;Ellaway 2010a;Alyusuf et al. 2013) and deliberately balanced the abstractions of academia, which foster rigor and comprehensiveness, with the practical needs of educators. Our primary intent is to guide the evaluation of specific TEL events within schools and agencies, although we also hope to favorably influence the quality of published scholarship in TEL.
In considering the applicability of this work, we note that our general model ( Figure 1) and many of the specific information elements (Table 1) and evaluation activities (Table 2) are not specific to the TEL context. Moreover, the ubiquity of TEL in contemporary medical education suggests that in many instances TEL evaluation is becoming synonymous with educational evaluation in general. As such, we believe this work has application beyond TEL.
We note as a limitation that we did not develop specific tools for conducting all of the proposed evaluation activities. We further acknowledge that the instruments we developed have not yet been substantially tested in practice. Although the rigorous development process provides strong evidence of useful content, it will be important to evaluate other sources of validity evidence for ETELM scores (such as relations with other variables) (Cook & Beckman 2006). Finally, this article focuses on planning the evaluation and collecting data. Analyzing data and preparing the evaluation report are also essential steps (Oliver 2000), but beyond our present scope.

Box 2. A minimal recipe for TEL evaluation.ᵃ

Audience and purpose
1. Who is the intended audience for the evaluation?
2. What will they do with this information?
Primary purpose: improve the course for the next iteration. Secondary purposes: judge course effectiveness (did it work?) and user experience.

Evaluation ingredientsᵇ:
1. Perform usability testing (if not too late). Usability testing does not require sophisticated technology or training (Krug 2000). A single user working at a standard computer, observed by an evaluator with a notepad and pen, can provide a lot of useful information (see http://www.nngroup.com/articles/usability-101-introduction-to-usability).
2. Document key elements of the final product. An archive and/or screenshots of the final course, together with a detailed written description of the course features and content, will prove invaluable down the road. It is impossible to improve a course if the key features of the original course are not clearly known. The course-as-planned may not necessarily match the course-as-delivered; ideally, one would document both.
3. Administer instruments to capture the perceptions of both students and instructors (e.g. the Evaluation of Technology-Enhanced Learning Materials instruments described herein). What the evaluator believes happened may be very different from the beliefs of the students or the instructor (if not the evaluator). Gathering the different perspectives on what happened and what it meant to participants can be very informative.
4. Prepare and administer course-specific assessments of Kirkpatrick Level 2 outcomes (knowledge, skills, attitudes). At a minimum, a post-course assessment of learning is important. A pretest is often (but not always) helpful. Using the same assessment across different iterations of the course will allow evaluation of stability (if the course does not change) or comparison of alternate instructional approaches (if the course changes over time).

ᵃ As elaborated in the text, this simple plan will be inadequate for many contexts (e.g. different audiences and needs). More experienced evaluators will likely wish to define their own plans.
ᵇ Each of the evaluation activities in this recipe would ideally be conducted using a well-developed approach or instrument.
We have deliberately kept our focus fairly broad in order to meet the needs of multiple audiences and uses. It would likely be infeasible for any single evaluation to collect all of the suggested information, nor would any single audience likely find all of this information equally useful. As such, educators should use our comprehensive framework to identify the evaluation activities and specific information elements of greatest interest and utility to their target audience and context.

Glossary
Technology-Enhanced Learning (TEL): instructional events in which "technology plays a significant role in making learning more effective, efficient or enjoyable" (Goodyear & Retalis 2010, p 8).

CIPP framework: a model for program evaluation organized around the themes of context, inputs, processes, and products (Stufflebeam 2003).

Quality Matters Program: an international organization focused on the quality assurance of online learning; see www.qualitymatters.org.