{{Short description|Assessment of particular interventions}}
{{external links|date=June 2017}}
'''Impact evaluation''' assesses the changes that can be attributed to a particular intervention, such as a project, program or policy, covering both the intended and, ideally, the unintended effects.<ref>[http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTPOVERTY/EXTISPMA/0,,menuPK:384336~pagePK:149018~piPK:149093~theSitePK:384329,00.html World Bank Poverty Group on Impact Evaluation], accessed on January 6, 2008</ref> In contrast to outcome monitoring, which examines whether targets have been achieved, impact evaluation is structured to answer the question: how would outcomes such as participants' well-being have changed if the intervention had not been undertaken? This involves counterfactual analysis, that is, "a comparison between what actually happened and what would have happened in the absence of the intervention."
Impact evaluation helps people answer key questions for evidence-based policy making: what works, what doesn't, where, why and for how much? It has received increasing attention in policy making in recent years.
== Counterfactual evaluation designs ==
[[Counterfactual conditional|Counterfactual]] analysis enables evaluators to attribute cause and effect between interventions and outcomes. The 'counterfactual' measures what would have happened to beneficiaries in the absence of the intervention, and impact is estimated by comparing counterfactual outcomes to those observed under the intervention. The key challenge in impact evaluation is that the counterfactual cannot be directly observed and must be approximated with reference to a comparison group. There are a range of accepted approaches to determining an appropriate comparison group for counterfactual analysis, using either prospective (ex ante) or retrospective (ex post) evaluation design. Prospective evaluations begin during the design phase of the intervention, involving collection of baseline and end-line data from intervention beneficiaries (the 'treatment group') and non-beneficiaries (the 'comparison group'); they may involve selection of individuals or communities into treatment and comparison groups. Retrospective evaluations are usually conducted after the implementation phase and may exploit existing survey data, although the best evaluations will collect data as close to baseline as possible, to ensure comparability of intervention and comparison groups.
There are five key principles relating to internal validity (study design) and external validity (generalizability) which rigorous impact evaluations should address: confounding factors, [[selection bias]], spillover effects, contamination, and impact heterogeneity.<ref>{{cite web|url=http://www.3ieimpact.org/media/filer/2012/04/20/principles-for-impact-evaluation.pdf|title=Principles for Impact Evaluation|publisher=International Initiative for Impact Evaluation (3ie)}}</ref>
* '''Confounding''' occurs where certain factors, typically relating to socioeconomic status, are correlated with exposure to the intervention and, independent of exposure, are causally related to the outcome of interest. Confounding factors are therefore alternate explanations for an observed (possibly spurious) relationship between intervention and outcome.
* '''Impact heterogeneity''' refers to differences in impact due to beneficiary type and context. High-quality impact evaluations will assess the extent to which different groups (e.g., the disadvantaged) benefit from an intervention, as well as the potential effect of context on impact. The degree to which results are generalizable will determine the applicability of lessons learned for interventions in other contexts.
Impact evaluation designs are identified by the type of methods used to generate the counterfactual and can be broadly classified into three categories – experimental, quasi-experimental and non-experimental designs – that vary in feasibility, cost, involvement during the design or implementation phase of the intervention, and degree of selection bias.
== Experimental approaches ==
{{further|Experimental design}}
Under experimental evaluations, the treatment and comparison groups are selected randomly and isolated from the intervention, as well as from any other interventions which may affect the outcome of interest. These evaluation designs are referred to as [[Randomized controlled trial|randomized control trials]] (RCTs). In experimental evaluations the comparison group is called a [[control group]]. When randomization is implemented over a sufficiently large sample with no contagion by the intervention, the only difference between treatment and control groups on average is that the latter does not receive the intervention. Random sample surveys, in which the sample for the evaluation is chosen randomly, should not be confused with experimental evaluation designs, which require the random assignment of the treatment.
The experimental approach is often held up as the 'gold standard' of evaluation. It is the only evaluation design which can conclusively account for selection bias in demonstrating a causal relationship between intervention and outcomes. However, randomization and isolation from interventions might not be practicable in the realm of social policy, and may be ethically difficult to defend.<ref name="auto">{{cite journal|last=Ravallion|first=Martin|date=1 January 2009|title=Should the Randomistas Rule?|url=http://ideas.repec.org/a/bpj/evoice/v6y2009i2n6.html}}</ref>
=== Randomised controlled trials (RCTs) ===
RCTs are studies used to measure the effectiveness of a new intervention. They are unlikely to prove causality on their own; however, randomisation reduces bias while providing a tool for examining cause-effect relationships.<ref>{{Cite journal|last1=Hariton|first1=Eduardo|last2=Locascio|first2=Joseph J.|date=December 2018|title=Randomised controlled trials—the gold standard for effectiveness research|journal=BJOG: An International Journal of Obstetrics and Gynaecology|volume=125|issue=13|pages=1716|doi=10.1111/1471-0528.15199|issn=1470-0328|pmc=6235704|pmid=29916205}}</ref> RCTs rely on random assignment, meaning that the evaluation almost always has to be designed ''ex ante'', as it is rare that the natural assignment of a project would be on a random basis.<ref name=":1">{{cite journal|last=White|first=Howard|date=8 March 2013|title=An introduction to the use of randomised control trials to evaluate development interventions|journal=Journal of Development Effectiveness|volume=5|pages=30–49|doi=10.1080/19439342.2013.764652|s2cid=51812043|doi-access=free}}</ref> When designing an RCT, five key questions need to be answered: what treatment is being tested, how many treatment arms there will be, what the unit of assignment will be, how large a sample is needed, and how the test will be randomised.<ref name=":1" /> A well-conducted RCT will yield a credible estimate of the average treatment effect within one specific population or unit of assignment.<ref name=":2">{{Cite web|last1=Deaton|first1=Angus|last2=Cartwright|first2=Nancy|date=2016-11-09|title=The limitations of randomised controlled trials|url=https://voxeu.org/article/limitations-randomised-controlled-trials|access-date=2020-10-26|website=VoxEU.org}}</ref> A drawback of RCTs is the 'transportation problem': what works within one population does not necessarily work within another, meaning that the average treatment effect is not applicable across differing units of assignment.<ref name=":2" />
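The question of how large a sample is needed is typically answered with a power calculation. A minimal sketch using the standard two-arm comparison-of-means approximation (the function name is illustrative; defaults correspond to 5% two-sided significance and 80% power):

```python
import math

def sample_size_per_arm(min_effect, sd, z_alpha=1.96, z_beta=0.84):
    """Approximate subjects per arm to detect a difference of `min_effect`
    between two group means, given outcome standard deviation `sd`.
    Defaults: 5% two-sided significance (z=1.96), 80% power (z=0.84)."""
    n = 2 * ((z_alpha + z_beta) * sd / min_effect) ** 2
    return math.ceil(n)

# Detecting a half-standard-deviation effect needs ~63 subjects per arm
print(sample_size_per_arm(min_effect=0.5, sd=1.0))  # 63
```

Smaller minimum detectable effects require quadratically larger samples, which is why under-powered evaluations are a common design failure.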
=== Natural experiments ===
Natural experiments are used because these methods relax the inherent tension between uncontrolled field and controlled laboratory data-collection approaches.<ref name=":3">{{Cite journal|last1=Roe|first1=Brian E.|last2=Just|first2=David R.|date=December 2009|title=Internal and External Validity in Economics Research: Tradeoffs between Experiments, Field Experiments, Natural Experiments, and Field Data|url=http://dx.doi.org/10.1111/j.1467-8276.2009.01295.x|journal=American Journal of Agricultural Economics|volume=91|issue=5|pages=1266–1271|doi=10.1111/j.1467-8276.2009.01295.x|issn=0002-9092}}</ref> Natural experiments leverage events outside the researchers' and subjects' control to address several threats to internal validity, minimising the chance of confounding, while sacrificing some features of field data, such as more natural ranges of treatment effects and the presence of organically formed context.<ref name=":3" /> A main problem with natural experiments is replicability. Laboratory work, when properly described and repeated, should produce similar results; due to the uniqueness of natural experiments, replication is often limited to analysis of alternate data from a similar event.<ref name=":3" />
== Non-experimental approaches ==
=== Quasi-experimental design ===
[[Quasi-experiment]]al approaches can remove bias arising from selection on observables and, where panel data are available, time invariant unobservables. Quasi-experimental methods include matching, differencing, instrumental variables and the pipeline approach; they are usually carried out by multivariate [[regression analysis]].
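As an illustration of the differencing approach, a difference-in-differences estimate subtracts the comparison group's before-after change from the treatment group's change, removing time-invariant differences between the groups. A minimal sketch (data and function name are hypothetical):

```python
def did_estimate(treat_before, treat_after, comp_before, comp_after):
    """Difference-in-differences: the treatment group's change in mean
    outcome minus the comparison group's change over the same period."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treat_after) - mean(treat_before)) - (
        mean(comp_after) - mean(comp_before))

# Treatment group improves by 5, comparison group by 2 -> estimated impact 3
print(did_estimate([10, 12], [15, 17], [10, 12], [12, 14]))  # 3.0
```

In practice this is usually run as a regression with group, period, and interaction terms, which allows standard errors and covariates; the arithmetic of the estimate is the same.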
[[Instrumental Variable|Instrumental variables]] estimation accounts for selection bias by modelling participation using factors ('instruments') that are correlated with selection but not the outcome, thus isolating the aspects of program participation which can be treated as exogenous.
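With a single binary instrument, the simplest instrumental-variables estimate is the Wald estimator: the difference in mean outcomes across instrument values divided by the difference in participation rates. A sketch with hypothetical data:

```python
def wald_iv(instrument, participation, outcome):
    """Wald IV estimator for a binary instrument: the instrument's effect
    on the outcome scaled by its effect on program participation."""
    def group_mean(values, z):
        sel = [v for v, zi in zip(values, instrument) if zi == z]
        return sum(sel) / len(sel)
    reduced_form = group_mean(outcome, 1) - group_mean(outcome, 0)
    first_stage = group_mean(participation, 1) - group_mean(participation, 0)
    return reduced_form / first_stage

# Instrument raises participation by 0.5 and mean outcomes by 2 -> effect 4.0
print(wald_iv([1, 1, 0, 0], [1, 0, 0, 0], [5, 1, 1, 1]))  # 4.0
```

The estimator is only credible if the instrument affects outcomes solely through participation (the exclusion restriction), which is an untestable assumption.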
The pipeline approach uses beneficiaries already chosen to participate in a project at a later stage as the comparison group.
=== Non-experimental design ===
Non-experimental impact evaluations are so-called because they do not involve a comparison group that does not have access to the intervention. The method used in non-experimental evaluation is to compare intervention groups before and after implementation of the intervention. [[Interrupted time-series]] (ITS) evaluations require multiple data points on treated individuals before and after the intervention, while before versus after (or pre-test post-test) designs simply require a single data point before and after. Post-test analyses include data after the intervention from the intervention group only. Non-experimental designs are the weakest evaluation designs, because to show a causal relationship between intervention and outcomes convincingly, the evaluation must demonstrate that any likely alternate explanations for the outcomes are irrelevant. However, there remain applications to which this design is relevant, for example, in calculating time-savings from an intervention which improves access to amenities. In addition, there may be cases where non-experimental designs are the only feasible impact evaluation design, such as universally implemented programmes or national policy reforms in which no isolated comparison groups are likely to exist.
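For an interrupted time-series design, impact can be sketched by fitting the pre-intervention trend, extrapolating it as the counterfactual, and averaging the post-intervention gaps (ordinary least squares on the pre-period; the data are hypothetical):

```python
def its_impact(pre, post):
    """Fit a linear trend to equally spaced pre-intervention observations,
    extrapolate it past the intervention as the counterfactual, and
    return the average gap between observed and predicted post values."""
    n = len(pre)
    xs = range(n)
    x_bar = sum(xs) / n
    y_bar = sum(pre) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, pre))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    counterfactual = [intercept + slope * (n + i) for i in range(len(post))]
    return sum(o - c for o, c in zip(post, counterfactual)) / len(post)

# Pre-trend rises by 1 per period; post values sit 2 above that trend
print(its_impact([0, 1, 2, 3], [6, 7]))  # 2.0
```

The credibility of the estimate rests entirely on the assumption that the pre-intervention trend would have continued unchanged, which is exactly the secular-trend threat discussed below.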
===Selection bias===
When the assumption of equivalence between groups does not hold, the difference in outcome between the groups that would have occurred regardless creates a form of bias in the estimate of program effects. This is known as selection bias (Rossi et al., 2004). It threatens the validity of the program effect estimate in any impact assessment using a non-equivalent group comparison design, and appears in situations where some process responsible for influences that are not fully known selects which individuals will be in which group, instead of group assignment being determined by pure chance (Rossi et al., 2004). This may be because of participant self-selection, or it may be because of program placement (placement bias).<ref name=":0">{{Cite book|url=https://www.adb.org/sites/default/files/publication/392376/impact-evaluation-development-interventions-guide.pdf|title=Impact Evaluation of Development Interventions: A Practical Guide|publisher=Asian Development Bank}}</ref>
Selection bias can also occur through natural or deliberate processes that cause a loss of outcome data for members of the intervention and control groups that have already been formed. This is known as attrition, and it can come about in two ways (Rossi et al., 2004): targets who drop out of the intervention or control group cannot be reached, or targets refuse to co-operate in outcome measurement. Differential attrition is assumed when attrition occurs as a result of something other than an explicit chance process (Rossi et al., 2004). This means that "those individuals that were from the intervention group whose outcome data are missing cannot be assumed to have the same outcome-relevant characteristics as those from the control group whose outcome data are missing" (Rossi et al., 2004, p. 271). Even random assignment designs are not safe from selection bias induced by attrition (Rossi et al., 2004).
====Secular trends or secular drift====
Secular trends can be defined as relatively long-term trends in the community, region or country. Also termed secular drift, they may produce changes that enhance or mask the apparent effects of an intervention. For example, falling regional unemployment could make a job-training program look effective even if it had no impact of its own.
====Interfering events====
== Estimation methods ==
Estimation methods broadly follow evaluation designs. Different designs require different estimation methods to measure changes in well-being from the counterfactual. In experimental and quasi-experimental evaluation, the estimated impact of the intervention is calculated as the difference in mean outcomes between the treatment group (those receiving the intervention) and the control or comparison group (those who do not).
Impact evaluations which compare average outcomes in the treatment group, irrespective of beneficiary participation (also referred to as 'compliance' or 'adherence'), to outcomes in the comparison group are referred to as intention-to-treat (ITT) analyses. Impact evaluations which compare outcomes only among beneficiaries who comply with or adhere to the intervention in the treatment group to outcomes in the control group are referred to as treatment-on-the-treated (TOT) analyses. ITT therefore provides a lower-bound estimate of impact, but is arguably of greater policy relevance than TOT in the analysis of voluntary programs.<ref>[http://www.eric.ed.gov/PDFS/ED493363.pdf Bloom, H. (2006) The core analytics of randomized experiments for social research. MDRC Working Papers on Research Methodology. MDRC, New York]</ref>
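Under the common assumption that assignment affects outcomes only through actual participation, a TOT estimate can be recovered from ITT by dividing by the compliance rate (the Bloom adjustment). A minimal sketch with hypothetical data:

```python
def itt_and_tot(treat_outcomes, control_outcomes, compliance_rate):
    """ITT: difference in mean outcomes by random assignment.
    TOT: ITT scaled up by the share of the treatment group that actually
    participated (Bloom adjustment; assumes no effect on non-compliers)."""
    mean = lambda xs: sum(xs) / len(xs)
    itt = mean(treat_outcomes) - mean(control_outcomes)
    tot = itt / compliance_rate
    return itt, tot

# Half the treatment group complied: an ITT of 2.0 implies a TOT of 4.0
print(itt_and_tot([2, 4], [1, 1], compliance_rate=0.5))  # (2.0, 4.0)
```

This makes the lower-bound property concrete: whenever compliance is below 100%, ITT is smaller in magnitude than TOT.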
== Debates ==
While there is agreement on the importance of impact evaluation, and a consensus is emerging around the use of counterfactual evaluation methods, there has also been widespread debate in recent years on both the definition of impact evaluation and the use of appropriate methods (see White 2009).
=== Definitions ===
The International Initiative for Impact Evaluation (3ie) defines rigorous impact evaluations as: "analyses that measure the net change in outcomes for a particular group of people that can be attributed to a specific program using the best methodology available, feasible and appropriate to the evaluation question that is being investigated and to the specific context".<ref>{{cite web|url=http://www.3ieimpact.org/media/filer/2012/04/20/principles-for-impact-evaluation.pdf|title=Principles for Impact Evaluation|publisher=International Initiative for Impact Evaluation (3ie)}}</ref>
According to the World Bank's DIME Initiative, "Impact evaluations compare the outcomes of a program against a counterfactual that shows what would have happened to beneficiaries without the program. Unlike other forms of evaluation, they permit the attribution of observed changes in outcomes to the program being evaluated by following experimental and quasi-experimental designs".<ref>[http://siteresources.worldbank.org/INTDEVIMPEVAINI/Resources/DIME_project_document-rev.pdf World Bank (n.d.) The Development IMpact Evaluation (DIME) Initiative, Project Document, World Bank, Washington, D.C.]</ref>
Other authors make a distinction between "impact evaluation" and "impact assessment." "Impact evaluation" uses empirical techniques to estimate the effects of interventions and their statistical significance, whereas "impact assessment" includes a broader set of methods, including structural simulations and other approaches that cannot test for statistical significance.<ref name=":0" />
Common definitions of 'impact' used in evaluation generally refer to the totality of longer-term consequences associated with an intervention on quality-of-life outcomes. For example, the Organization for Economic Cooperation and Development's Development Assistance Committee (OECD-DAC) defines impact as the "positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended".<ref>[http://www.oecd.org/dataoecd/8/43/40501129.pdf OECD-DAC (2002) Glossary of Key Terms in Evaluation and Results-Based Management Proposed Harmonized Terminology, OECD, Paris]</ref> A number of international agencies have also adopted this definition of impact. For example, UNICEF defines impact as "The longer term results of a program – technical, economic, socio-cultural, institutional, environmental or other – whether intended or unintended. The intended impact should correspond to the program goal."<ref>[http://www.unicef.org/evaldatabase/files/UNICEF_Eval_Report_Standards.pdf UNICEF (2004) UNICEF Evaluation Report Standards, Evaluation Office, UNICEF NYHQ, New York]</ref> Similarly, Evaluationwiki.org defines impact evaluation as an evaluation that looks beyond the immediate results of policies, instruction, or services to identify longer-term as well as unintended program effects.<ref>{{cite web|url=http://www.evaluationwiki.org/index.php/Evaluation_Definition:_What_is_Evaluation%3F#Impact_Evaluations|title=Evaluation Definition: What is Evaluation? - EvaluationWiki}}</ref>
Technically, an evaluation could be conducted to assess 'impact' as defined here without reference to a counterfactual. However, much of the existing literature (e.g. the NONIE Guidelines on Impact Evaluation<ref name="worldbank.org1">{{cite web|url=http://www.worldbank.org/ieg/nonie/guidance.html|title=NONIE Guidance on Impact Evaluation}}</ref>) adopts the counterfactual-based definition of impact evaluation.
What is missing from the term 'impact' evaluation is the way 'impact' shows up long-term. Most monitoring and evaluation 'logical framework' plans have inputs, outputs, outcomes and impacts; while the first three appear during the project duration itself, impact takes far longer to materialize. For instance, in a five-year agricultural project, seeds are inputs, farmers trained in using them are outputs, changes in crop yields as a result of the seeds being planted properly are an outcome, and families being more sustainably food secure over time is an impact. Such [http://valuingvoices.com/sustained-impact-post-project-ex-post-little-proof-at-3ie/ post-project impact evaluations], also called ex-post or [http://valuingvoices.com/what-happens-after-the-project-ends-lessons-from-post-project-sustained-impact-evaluations-part-1/ sustained impact evaluations], are very rare. While many documents call for them, donors rarely have the funding flexibility, or interest, to return after project close-out, once resources have been withdrawn, to see how sustained and durable interventions remained. There are many [http://valuingvoices.com/what-happens-after-the-project-ends-lessons-from-post-project-sustained-impact-evaluations-part-1/ lessons to be learned for design, implementation, M&E] and how to foster [http://valuingvoices.com/what-happens-after-the-project-ends-country-national-ownership-lessons-from-post-project-sustained-impact-evaluations-part-2/ country-ownership].
=== Methodological debates ===
There is intensive debate in academic circles around the appropriate methodologies for impact evaluation, between proponents of experimental methods on the one hand and proponents of more general methodologies on the other. William Easterly has referred to this as [http://aidwatchers.com/2009/12/the-civil-war-in-development-economics/ 'The Civil War in Development economics']. Proponents of experimental designs, sometimes referred to as 'randomistas',<ref name="auto"/> argue that randomization is the only means to ensure unobservable selection bias is accounted for, and that building up the currently flimsy experimental evidence base should be a matter of priority.<ref>{{cite web|url=http://www.mdgoals.net/wp-content/uploads/banerjee.pdf|title=Banerjee, A. V. (2007) 'Making Aid Work' Cambridge, Boston Review Book, MIT Press, MA}}</ref>
=== Theory-based impact evaluation ===
While knowledge of effectiveness is vital, it is also important to understand the reasons for effectiveness and the circumstances under which results are likely to be replicated. In contrast with 'black box' impact evaluation approaches, which only report mean differences in outcomes between treatment and comparison groups, theory-based impact evaluation involves mapping out the causal chain from inputs to outcomes and impact and testing the underlying assumptions.<ref name="3ieimpact.org">White, H. (2009) Theory-based impact evaluation: principles and practice, Working Paper 3, International Initiative for Impact Evaluation (3ie), New Delhi</ref>
White (2009b)<ref name="3ieimpact.org"/> advocates more widespread application of a theory-based approach to impact evaluation as a means to improve policy relevance of impact evaluations, outlining six key principles of the theory-based approach:
== Examples ==
While experimental impact evaluation methodologies have been used to assess nutrition and water and sanitation interventions in developing countries since the 1980s, the first, and best known, application of experimental methods to a large-scale development program is the evaluation of the [[Conditional Cash Transfer]] (CCT) program Progresa (now called [[Oportunidades]]) in Mexico, which examined a range of development outcomes, including schooling, immunization rates and child work.<ref>[http://www.ifpri.org/sites/default/files/publications/gertler_health.pdf Gertler, P. (2000) Final Report: The Impact of PROGRESA on Health. International Food Policy Research Institute, Washington, D.C.]</ref><ref>{{cite web|url=http://athena.sas.upenn.edu/~petra/papers/trans18.pdf|title=Untitled Document}}</ref>
More recently, impact evaluation has been applied to a range of interventions across social and productive sectors. 3ie has launched an online [http://www.3ieimpact.org/database_of_impact_evaluations.html database of impact evaluations] covering studies conducted in low- and middle-income countries. Other organisations publishing impact evaluations include [http://poverty-action.org/work/publications Innovations for Poverty Action], the World Bank's [http://www.worldbank.org/dime DIME Initiative] and [http://www.worldbank.org/ieg/nonie/papers.html NONIE]. The [[Independent Evaluation Group|IEG]] of the World Bank has systematically assessed and summarized the experience of ten impact evaluations of development programs in various sectors carried out over the past 20 years.<ref>[http://lnweb18.worldbank.org/oed/oeddoclib.nsf/DocUNIDViewForJavaSearch/35BC420995BF58F8852571E00068C6BD/$file/impact_evaluation.pdf Impact Evaluation: The Experience of the Independent Evaluation Group of the World Bank, 2006]</ref>
== Organizations promoting impact evaluation of development interventions ==
In 2006, the Evaluation Gap Working Group argued that there was a major gap in the evidence on the effects of development interventions and called for greater investment in rigorous impact evaluation; its report contributed to the establishment of the International Initiative for Impact Evaluation (3ie).
Another initiative devoted to the evaluation of impacts is the [https://web.archive.org/web/20101108212501/http://www.sustainablecommodities.org/ Committee on Sustainability Assessment (COSA)].
A number of additional organizations have been established to promote impact evaluation globally, including [http://poverty-action.org/ Innovations for Poverty Action], the [http://www.worldbank.org/en/programs/sief-trust-fund World Bank's Strategic Impact Evaluation Fund (SIEF)], the World Bank's Development Impact Evaluation (DIME) Initiative, the [https://web.archive.org/web/20100716204207/http://www.cgiar-ilac.org/ Institutional Learning and Change (ILAC) Initiative] of the CGIAR, and the [http://www.worldbank.org/ieg/nonie/ Network of Networks on Impact Evaluation (NONIE)].
== Systematic reviews of impact evidence ==
A range of organizations are working to coordinate the production of [[systematic reviews]]. Systematic reviews aim to bridge the research-policy divide by assessing the range of existing evidence on a particular topic and presenting the information in an accessible format. Like rigorous impact evaluations, they are developed from a study protocol which sets out a priori the criteria for study inclusion, search and methods of synthesis. Systematic reviews involve five key steps:
* determination of interventions, populations, outcomes and study designs to be included;
* searches to identify published and unpublished literature, and application of study inclusion criteria (relating to interventions, populations, outcomes and study design), as set out in the study protocol;
* coding of information from studies;
* presentation of quantitative estimates on intervention effectiveness using forest plots and, where interventions are determined to be appropriately homogeneous, calculation of a pooled summary estimate using meta-analysis;
* periodic updating as new evidence emerges.
Systematic reviews may also involve the synthesis of qualitative information, for example relating to the barriers to, or facilitators of, intervention effectiveness.
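The pooled summary estimate mentioned above can be sketched with fixed-effect inverse-variance weighting, in which each study is weighted by the inverse of its squared standard error (the study values here are hypothetical):

```python
def fixed_effect_pool(effects, std_errors):
    """Inverse-variance weighted pooled effect and its standard error,
    as used in a fixed-effect meta-analysis."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se

# A more precise study (se=0.5) pulls the pooled estimate toward its result
print(fixed_effect_pool([1.0, 3.0], [1.0, 0.5]))
```

Weighting by precision is what makes the pooled estimate more informative than a simple average of study results; random-effects models extend this to allow genuine heterogeneity between studies.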
== See also ==
* [http://www.3ieimpact.org International Initiative for Impact Evaluation]
* [http://poverty-action.org/ Innovations for Poverty Action]
* [http://www.sustainablecommodities.org/cosa Committee on Sustainability Assessment (COSA)]
* [http://www.iisd.org International Institute for Sustainable Development (IISD)]
* [http://www.intracen.org UN International Trade Centre (ITC)]
{{DEFAULTSORT:Impact Evaluation}}
[[Category:Impact assessment]]
[[Category:Educational evaluation methods]]
[[Category:Observational study]]
[[Category:Management cybernetics]]