{{Short description|Assessment of particular interventions}}
'''Impact evaluation''' assesses the changes that can be attributed to a particular intervention, such as a project, program or policy, both the intended ones and, ideally, the unintended ones.<ref>[http://web.worldbank.org/WBSITE/EXTERNAL/TOPICS/EXTPOVERTY/EXTISPMA/0,,menuPK:384336~pagePK:149018~piPK:149093~theSitePK:384329,00.html World Bank Poverty Group on Impact Evaluation], accessed on January 6, 2008</ref> In contrast to outcome monitoring, which examines whether targets have been achieved, impact evaluation is structured to answer the question: how would outcomes such as participants’ well-being have changed if the intervention had not been undertaken? This involves counterfactual analysis, that is, “a comparison between what actually happened and what would have happened in the absence of the intervention.”<ref>[http://lnweb90.worldbank.org/oed/oeddoclib.nsf/DocUNIDViewForJavaSearch/35BC420995BF58F8852571E00068C6BD/$file/impact_evaluation.pdf White, H. (2006) Impact Evaluation: The Experience of the Independent Evaluation Group of the World Bank, World Bank, Washington, D.C., p. 3]</ref> Impact evaluations seek to answer cause-and-effect questions. In other words, they look for the changes in outcome that are directly attributable to a program.<ref>[http://publications.worldbank.org/index.php?main_page=product_info&cPath=1&products_id=23915 Gertler, Martinez, Premand, Rawlings and Vermeersch (2011) Impact Evaluation in Practice, Washington, DC:The World Bank]</ref>
{{external links|date=June 2017}}
Impact evaluation helps people answer key questions for evidence-based policy making: what works, what doesn't, where, why and for how much?
== Counterfactual evaluation designs ==
[[Counterfactual conditional|Counterfactual]] analysis enables evaluators to attribute cause and effect between interventions and outcomes. The counterfactual measures what would have happened to beneficiaries in the absence of the intervention, and impact is estimated by comparing counterfactual outcomes to those observed under the intervention.
There are five key principles relating to internal validity (study design) and external validity (generalizability) which rigorous impact evaluations should address: confounding factors, [[selection bias]], spillover effects, contamination, and impact heterogeneity.
* '''Confounding''' occurs where certain factors, typically relating to socioeconomic status, are correlated with exposure to the intervention and, independent of exposure, are causally related to the outcome of interest.
* '''Selection bias''', a special case of confounding, occurs where intervention participants are non-randomly drawn from the beneficiary population, and the criteria determining selection are correlated with outcomes. [[Unobserved heterogeneity|Unobserved factors]], which are associated with access to or participation in the intervention, and are causally related to the outcome of interest, may lead to a spurious relationship between intervention and outcome if unaccounted for. Self-selection occurs where, for example, more able or organized individuals or communities, who are more likely to have better outcomes of interest, are also more likely to participate in the intervention. Endogenous program selection occurs where individuals or communities are chosen to participate because they are seen to be more likely to benefit from the intervention. Ignoring confounding factors can lead to a problem of omitted variable bias. In the special case of selection bias, the endogeneity of the selection variables can cause simultaneity bias.
* '''Spillover''' (referred to as contagion in the case of experimental evaluations) occurs when members of the comparison (control) group are affected by the intervention.
* '''Impact heterogeneity''' refers to differences in impact by beneficiary type and context. High quality impact evaluations will assess the extent to which different groups (e.g., the disadvantaged) benefit from an intervention as well as the potential effect of context on impact. The degree to which results are generalizable will determine the applicability of lessons learned for interventions in other contexts.
Impact evaluation designs are identified by the type of methods used to generate the counterfactual and can be broadly classified into three categories – experimental, quasi-experimental and non-experimental designs – that vary in feasibility, cost, involvement during the design or implementation phase of the intervention, and degree of selection bias.
== Experimental approaches ==
{{further|Experimental design}}
Under experimental evaluations the treatment and comparison groups are selected randomly and isolated both from the intervention, as well as any interventions which may affect the outcome of interest. These evaluation designs are referred to as [[Randomized controlled trial|randomized control trials]] (RCTs). In experimental evaluations the comparison group is called a [[control group]]. When randomization is implemented over a sufficiently large sample with no contagion by the intervention, the only difference between treatment and control groups on average is that the latter does not receive the intervention. Random sample surveys, in which the sample for the evaluation is chosen randomly, should not be confused with experimental evaluation designs, which require the random assignment of the treatment.
The experimental approach is often held up as the 'gold standard' of evaluation, as it is the only evaluation design that can conclusively account for selection bias in demonstrating a causal relationship between intervention and outcomes.
=== Randomised control trials (RCTs) ===
RCTs are studies used to measure the effectiveness of a new intervention. They are unlikely to prove causality on their own; however, randomisation reduces bias while providing a tool for examining cause-effect relationships.<ref>{{Cite journal|last1=Hariton|first1=Eduardo|last2=Locascio|first2=Joseph J.|date=December 2018|title=Randomised controlled trials—the gold standard for effectiveness research|journal=BJOG: An International Journal of Obstetrics and Gynaecology|volume=125|issue=13|pages=1716|doi=10.1111/1471-0528.15199|issn=1470-0328|pmc=6235704|pmid=29916205}}</ref> RCTs rely on random assignment, meaning that the evaluation almost always has to be designed ''ex ante'', as it is rare that the natural assignment of a project would be on a random basis.<ref name=":1">{{cite journal|last=White|first=Howard|date=8 March 2013|title=An introduction to the use of randomised control trials to evaluate development interventions|journal=Journal of Development Effectiveness|volume=5|pages=30–49|doi=10.1080/19439342.2013.764652|s2cid=51812043|doi-access=free}}</ref> When designing an RCT, there are five key questions that need to be asked: What treatment is being tested? How many treatment arms will there be? What will be the unit of assignment? How large a sample is needed? How will the test be randomised?<ref name=":1" /> A well conducted RCT will yield a credible estimate of the average treatment effect within one specific population or unit of assignment.<ref name=":2">{{Cite web|last1=Deaton|first1=Angus|last2=Cartwright|first2=Nancy|date=2016-11-09|title=The limitations of randomised controlled trials|url=https://voxeu.org/article/limitations-randomised-controlled-trials|access-date=2020-10-26|website=VoxEU.org}}</ref> A drawback of RCTs is 'the transportation problem': what works within one population does not necessarily work within another population, meaning that the average treatment effect is not applicable across differing units of assignment.<ref name=":2" />
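The logic of random assignment and the resulting difference-in-means estimate can be sketched in a few lines of Python. All numbers below are invented for illustration, and a real analysis would also compute standard errors:

```python
import random
import statistics

random.seed(0)

# Hypothetical population: baseline outcomes plus a true treatment
# effect of +2.0 (illustrative numbers only).
TRUE_EFFECT = 2.0
baselines = [random.gauss(10.0, 3.0) for _ in range(10_000)]

# Random assignment: each unit independently lands in the treatment
# or control arm with equal probability.
treated, control = [], []
for y0 in baselines:
    if random.random() < 0.5:
        treated.append(y0 + TRUE_EFFECT)  # outcome with the intervention
    else:
        control.append(y0)                # outcome without it

# With a large randomized sample, the difference in mean outcomes is an
# unbiased estimate of the average treatment effect (ATE).
ate_estimate = statistics.mean(treated) - statistics.mean(control)
print(ate_estimate)  # close to 2.0
```

Because assignment is random, any remaining difference between the arms other than the treatment itself is chance variation, which shrinks as the sample grows.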
=== Natural experiments ===
Natural experiments are used because these methods relax the inherent tension between uncontrolled field and controlled laboratory data collection approaches.<ref name=":3">{{Cite journal|last1=Roe|first1=Brian E.|last2=Just|first2=David R.|date=December 2009|title=Internal and External Validity in Economics Research: Tradeoffs between Experiments, Field Experiments, Natural Experiments, and Field Data|url=http://dx.doi.org/10.1111/j.1467-8276.2009.01295.x|journal=American Journal of Agricultural Economics|volume=91|issue=5|pages=1266–1271|doi=10.1111/j.1467-8276.2009.01295.x|issn=0002-9092}}</ref> Natural experiments leverage events outside the researchers' and subjects' control to address several threats to internal validity, minimising the chance of confounding elements, while sacrificing a few of the features of field data, such as more natural ranges of treatment effects and the presence of organically formed context.<ref name=":3" /> A main problem with natural experiments is the issue of replicability. Laboratory work, when properly described and repeated, should be able to produce similar results. Due to the uniqueness of natural experiments, replication is often limited to analysis of alternate data from a similar event.<ref name=":3" />
== Non-experimental approaches ==
=== Quasi-experimental design ===
[[Quasi-experiment]]al approaches can remove bias arising from selection on observables and, where panel data are available, time invariant unobservables. Quasi-experimental methods include matching, differencing, instrumental variables and the pipeline approach; they are usually carried out by multivariate [[regression analysis]].
If selection characteristics are known and observed, they can be controlled for to remove the bias. Matching involves comparing program participants with non-participants based on observed selection characteristics. [[Propensity score matching]] (PSM) uses a statistical model to calculate the probability of participating on the basis of a set of observable characteristics and matches participants and non-participants with similar probability scores. [[Regression discontinuity design]] exploits a decision rule as to who does and does not get the intervention to compare outcomes for those just either side of this cut-off.
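A minimal sketch of the matching step in Python, assuming the propensity scores have already been produced by a separate participation model (e.g. a logistic regression on observed characteristics); all scores and outcomes below are invented for illustration:

```python
# Hypothetical records: (propensity_score, outcome).
participants = [(0.81, 14.0), (0.62, 11.5), (0.54, 10.0), (0.33, 9.0)]
non_participants = [(0.80, 12.1), (0.60, 10.2), (0.50, 9.1),
                    (0.35, 8.4), (0.20, 7.9)]

def nearest_match(score, pool):
    """Nearest-neighbour match on the propensity score, with replacement."""
    return min(pool, key=lambda rec: abs(rec[0] - score))

# Averaging the outcome gap between each participant and its matched
# non-participant estimates the effect of treatment on the treated.
gaps = [outcome - nearest_match(score, non_participants)[1]
        for score, outcome in participants]
att_estimate = sum(gaps) / len(gaps)
print(att_estimate)
```

Real implementations add common-support checks and balance diagnostics; the sketch only shows the core idea that comparisons are restricted to units with similar participation probabilities.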
[[Difference in differences|Difference-in-differences]] or double differences, which use data collected at baseline and end-line for intervention and comparison groups, can be used to account for selection bias under the assumption that unobservable factors determining selection are fixed over time (time invariant).
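The double-difference calculation itself is simple arithmetic; in this Python sketch the group means are hypothetical:

```python
# Mean outcomes measured at baseline and end-line for each group
# (illustrative numbers only).
treatment_before, treatment_after = 20.0, 28.0
comparison_before, comparison_after = 19.0, 23.0

# First difference: change over time within each group.
change_treatment = treatment_after - treatment_before
change_comparison = comparison_after - comparison_before

# Second difference: the comparison group's change proxies for what
# would have happened to the treated group without the intervention,
# under the assumption that unobserved differences are time invariant
# (the "parallel trends" assumption).
did_estimate = change_treatment - change_comparison
print(did_estimate)
```

Differencing out each group's own change removes any fixed, time-invariant differences between the groups; what remains is attributed to the intervention.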
[[Instrumental Variable|Instrumental variables]] estimation accounts for selection bias by modelling participation using factors ('instruments') that are correlated with selection but not with the outcome given selection.
The pipeline approach ([[stepped-wedge trial|stepped-wedge design]]) uses beneficiaries already chosen to participate in a project at a later stage as the comparison group for those receiving the intervention earlier.
=== Non-experimental design ===
Non-experimental impact evaluations are so-called because they do not involve a comparison group that lacks access to the intervention. The method used in non-experimental evaluation is to compare intervention groups before and after implementation of the intervention. [[Interrupted time-series]] (ITS) evaluations require multiple data points on treated individuals before and after the intervention, while before versus after (or pre-test post-test) designs simply require a single data point before and after. Post-test analyses include data after the intervention from the intervention group only. Non-experimental designs are the weakest evaluation design, because to show a causal relationship between intervention and outcomes convincingly, the evaluation must demonstrate that any likely alternate explanations for the outcomes are irrelevant. However, there remain applications to which this design is relevant, for example, in calculating time-savings from an intervention which improves access to amenities. In addition, there may be cases where non-experimental designs are the only feasible impact evaluation design, such as universally implemented programmes or national policy reforms in which no isolated comparison groups are likely to exist.
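The contrast between a simple before-versus-after comparison and an interrupted time-series estimate can be illustrated with hypothetical data; here the ITS estimate projects the pre-intervention trend forward, whereas a real ITS analysis would fit a segmented regression:

```python
# Hypothetical quarterly measurements on the treated group only;
# the intervention takes effect between index 3 and 4.
series = [10.0, 11.0, 12.0, 13.0, 18.0, 19.0, 20.0, 21.0]
pre, post = series[:4], series[4:]

# Before-versus-after: compares single means and ignores the trend.
before_after = sum(post) / len(post) - sum(pre) / len(pre)

# Interrupted time series: project the pre-intervention trend forward
# and measure the level shift against that projection.
trend = pre[-1] - pre[-2]  # simple slope from the last two pre points
projected = [pre[-1] + trend * (i + 1) for i in range(len(post))]
its_shift = sum(p - q for p, q in zip(post, projected)) / len(post)
print(before_after, its_shift)
```

The before-versus-after estimate (8.0 here) conflates the intervention with the underlying upward trend; the ITS estimate (4.0) nets the trend out, which is why multiple pre-intervention data points strengthen the design.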
== Biases in estimating programme effects ==
Randomized field experiments are the strongest research designs for assessing program impact. This particular research design is generally said to be the design of choice when it is feasible, as it allows for a fair and accurate estimate of the program's actual effects (Rossi et al., 2004).
That said, randomized field experiments are not always feasible, and in these situations evaluators have alternative research designs at their disposal. The main problem is that, however well thought through or well implemented a design is, each is subject to yielding biased estimates of program effects. Such biases exaggerate or diminish program effects, and the direction a bias may take usually cannot be known in advance (Rossi et al., 2004). These biases affect the interests of the stakeholders. Program participants are disadvantaged if a bias makes an ineffective or harmful program seem effective. Conversely, a bias can make an effective program seem ineffective or even harmful, making the accomplishments of a program seem small or insignificant and potentially causing the program to lose support or funding.
If an inadequate design yields bias, the stakeholders who are largely responsible for the funding of the program will be the ones most concerned; the results of the evaluation help the stakeholders decide whether or not to continue funding the program, because the final decision lies with the funders and the sponsors. Those taking part in the program, or those the program is intended to positively affect, will also be affected by the design chosen and the outcome it renders. Therefore, the evaluator should choose the strongest design that is feasible in order to minimize bias.
Biases are normally visible in two situations: when the measurement of the outcome with program exposure, or the estimate of what the outcome would have been without the program exposure, is higher or lower than the corresponding 'true' value (Rossi et al., 2004).
The most common form of impact assessment design compares outcomes for an intervention group with those of a non-equivalent comparison group that did not receive the program (Rossi et al., 2004).
===Selection bias===
When there is an absence of the assumption of equivalence, the difference in outcome between the groups that would have occurred regardless creates a form of bias in the estimate of program effects. This is known as selection bias (Rossi et al., 2004). It creates a threat to the validity of the program effect estimate in any impact assessment using a non-equivalent group comparison design and appears in situations where some process responsible for influences that are not fully known selects which individuals will be in which group instead of the assignment to groups being determined by pure chance (Rossi et al., 2004). This may be because of participant self-selection, or it may be because of program placement (placement bias).<ref name=":0">{{Cite book|url=https://www.adb.org/sites/default/files/publication/392376/impact-evaluation-development-interventions-guide.pdf|title=Impact Evaluation of Development Interventions: A Practical Guide|last1=White|first1=Howard|last2=Raitzer|first2=David|publisher=Asian Development Bank|year=2017|isbn=978-92-9261-059-3|location=Manila}}</ref>
Selection bias can occur through natural or deliberate processes that cause a loss of outcome data for members of the intervention and control groups that have already been formed. This is known as attrition and it can come about in two ways (Rossi et al., 2004): targets drop out of the intervention or control group and cannot be reached, or targets refuse to co-operate in outcome measurement. Differential attrition is assumed when attrition occurs as a result of something other than an explicit chance process (Rossi et al., 2004). This means that those lost to follow-up differ systematically from those who remain, undermining the equivalence of the intervention and control groups.
===Other forms of bias===
====Secular trends or secular drift====
Secular trends can be defined as relatively long-term trends in the community, region or country. Also termed secular drift, they may produce changes that enhance or mask the apparent effects of an intervention (Rossi et al., 2004).
====Interfering events====
Impact evaluation needs to accommodate the fact that natural maturational and developmental processes can produce considerable change independently of the program. Including these changes in the estimates of program effects would result in biased estimates. An example of this form of bias is a program to improve preventive health practices among adults, which may seem ineffective because health generally declines with age (Rossi et al., 2004, p. 273).
== Estimation methods ==
Estimation methods broadly follow evaluation designs. Different designs require different estimation methods to measure changes in well-being from the counterfactual. In experimental and quasi-experimental evaluation, the estimated impact of the intervention is calculated as the difference in mean outcomes between the treatment group (those receiving the intervention) and the control or comparison group (those who do not). The single difference estimator compares mean outcomes at end-line and is valid where treatment and control groups have the same outcome values at baseline. The difference-in-difference (or double difference) estimator calculates the difference in the change in the outcome over time for treatment and comparison groups.
Impact evaluations which compare average outcomes in the treatment group, irrespective of beneficiary participation (also referred to as ''intention to treat'' analysis, or ITT), with average outcomes in the comparison group estimate the average treatment effect. Evaluations which compare outcomes only among beneficiaries who actually take up the intervention estimate the effect of ''treatment on the treated'' (TOT).
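Under the common assumption that no one in the comparison group receives the intervention, the treatment-on-the-treated effect can be recovered from the ITT estimate by dividing by the take-up rate (the Bloom adjustment). The numbers in this Python sketch are illustrative:

```python
from statistics import mean

# Hypothetical outcomes by *assignment*: everyone assigned to the
# treatment arm, whether or not they took up the intervention,
# versus the comparison arm.
assigned = [12.0, 13.0, 9.0, 10.0, 14.0, 9.5]
comparison = [9.0, 10.0, 9.5, 9.5, 9.0, 10.0]
take_up_rate = 0.5  # half the treatment arm actually participated

# Intention-to-treat effect: difference in means by assignment.
itt = mean(assigned) - mean(comparison)

# Bloom adjustment: with no take-up in the comparison arm,
# treatment-on-the-treated = ITT / take-up rate.
tot = itt / take_up_rate
print(itt, tot)
```

Because non-participants in the treatment arm dilute the measured effect, the ITT estimate is smaller than the TOT estimate whenever take-up is incomplete.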
== Debates ==
While there is agreement on the importance of impact evaluation, there has been widespread debate both on its definition and on the appropriate methods for carrying it out.
=== Definitions ===
The International Initiative for Impact Evaluation (3ie) defines rigorous impact evaluations as analyses that measure the net change in outcomes for a particular group of people that can be attributed to a specific program, using the best methodology available, feasible and appropriate to the evaluation question being investigated and to the specific context.
According to the World Bank's DIME Initiative, impact evaluations compare the outcomes of a program against a counterfactual that shows what would have happened to beneficiaries without the program. Unlike other forms of evaluation, they permit the attribution of observed changes in outcomes to the program being evaluated.
Similarly, according to the US [[United States Environmental Protection Agency|Environmental Protection Agency]] impact evaluation is a form of evaluation that assesses the net effect of a program by comparing program outcomes with an estimate of what would have happened in the absence of a program.<ref>[http://www.epa.gov/evaluate/glossary/i-esd.htm US Environmental Protection Agency Program Evaluation Glossary], accessed on January 6, 2008</ref>
According to the World Bank's [[Independent Evaluation Group]] (IEG), impact evaluation is the systematic identification of the effects, positive or negative, intended or not, on individual households, institutions, and the environment caused by a given development activity such as a program or project.<ref>[http://www.worldbank.org/ieg/ie/ World Bank Independent Evaluation Group], accessed on January 6, 2008</ref>
Other interpretations of impact evaluation include:
*An evaluation which looks at the impact of an intervention on final welfare outcomes, rather than only at project outputs, or a process evaluation which focuses on implementation;
*An evaluation carried out some time (five to ten years) after the intervention has been completed so as to allow time for impact to appear; and
*An evaluation considering all interventions within a given sector or geographical area.
Other authors make a distinction between "impact evaluation" and "impact assessment." "Impact evaluation" uses empirical techniques to estimate the effects of interventions and their statistical significance, whereas "impact assessment" includes a broader set of methods, including structural simulations and other approaches that cannot test for statistical significance.<ref name=":0" />
Common definitions of 'impact' used in evaluation generally refer to the totality of longer-term programme effects, such as the OECD-DAC definition: "positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended".
Technically, an evaluation could be conducted to assess 'impact' in this broader sense without making any reference to a counterfactual.
What is missing from the term 'impact' evaluation is the way 'impact' shows up long-term. Most monitoring and evaluation 'logical framework' plans distinguish inputs, outputs, outcomes and impacts. While the first three appear during the project itself, impact takes far longer to materialize. For instance, in a five-year agricultural project, seeds are inputs, farmers trained in using them are outputs, changes in crop yields resulting from the seeds being planted properly are an outcome, and families being more sustainably food secure over time is an impact. Such [http://valuingvoices.com/sustained-impact-post-project-ex-post-little-proof-at-3ie/ post-project impact evaluations], also called ex-post or [http://valuingvoices.com/what-happens-after-the-project-ends-lessons-from-post-project-sustained-impact-evaluations-part-1/ sustained impact evaluations], are very rare. Although many documents call for them, donors rarely have the funding flexibility, or the interest, to return after project close-out, once resources have been withdrawn, to see how durable interventions have remained. There are many [http://valuingvoices.com/what-happens-after-the-project-ends-lessons-from-post-project-sustained-impact-evaluations-part-1/ lessons to be learned for design, implementation, M&E] and for fostering [http://valuingvoices.com/what-happens-after-the-project-ends-country-national-ownership-lessons-from-post-project-sustained-impact-evaluations-part-2/ country-ownership].
=== Methodological debates ===
There is intensive debate in academic circles around the appropriate methodologies for impact evaluation, between proponents of experimental methods on the one hand and proponents of more general methodologies on the other.
=== Theory-based impact evaluation ===
While knowledge of effectiveness is vital, it is also important to understand the reasons for effectiveness and the circumstances under which results are likely to be replicated. In contrast with 'black box' impact evaluation approaches, which report only average treatment effects, theory-based impact evaluation maps out the causal chain from inputs to outcomes and impact and tests the underlying assumptions.
White (2009b) advocates more widespread application of a theory-based approach as a means of improving the policy relevance of impact evaluations.
== Examples ==
While experimental impact evaluation methodologies have been used to assess interventions in developing countries since the 1980s, the first and best known application of experimental methods to a large-scale development program is the evaluation of the conditional cash transfer (CCT) program PROGRESA in Mexico.
More recently, impact evaluation has been applied to a much wider range of interventions, across sectors and countries.
== Organizations promoting impact evaluation ==
In 2006, the Evaluation Gap Working Group argued that there was a major gap in the evidence on the effects of development interventions, and called for an independent body to be set up to fund and promote rigorous impact evaluation in low- and middle-income countries. The International Initiative for Impact Evaluation (3ie) was established in response to this report.
Another initiative devoted to the evaluation of impacts is the [https://web.archive.org/web/20101108212501/http://www.sustainablecommodities.org/cosa Committee on Sustainability Assessment (COSA)].
A number of additional organizations have been established to promote impact evaluation globally, including [http://poverty-action.org/ Innovations for Poverty Action], the [http://www.worldbank.org/en/programs/sief-trust-fund World Bank's Strategic Impact Evaluation Fund (SIEF)], and the Abdul Latif Jameel Poverty Action Lab (J-PAL).
== Systematic reviews of impact evidence ==
A range of organizations are working to coordinate the production of [[systematic reviews]]. Systematic reviews aim to bridge the research-policy divide by assessing the range of existing evidence on a particular topic, and presenting the information in an accessible format. Like rigorous impact evaluations, they are developed from a study protocol which sets out ''a priori'' the criteria for study inclusion, search and methods of synthesis.
== See also ==
* [[Econometrics]]
* [[Impact assessment]]
* [[Outcomes theory]]
* [[Participatory Impact Pathways Analysis]]
* [[Policy analysis]]
* [[Policy studies]]
* [[Program evaluation]]

== References ==
<references/>
== Sources and external links ==
* [http://www.3ieimpact.org International Initiative for Impact Evaluation]
* [http://poverty-action.org/ Innovations for Poverty Action]
* [http://www.sustainablecommodities.org/cosa Committee on Sustainability Assessment (COSA)]
* [http://www.iisd.org International Institute for Sustainable Development (IISD)]
* [http://www.intracen.org UN International Trade Centre (ITC)]
{{DEFAULTSORT:Impact Evaluation}}
[[Category:Educational evaluation methods]]
[[Category:Observational study]]
[[Category:Management cybernetics]]