Evaluation Policy

The Bill & Melinda Gates Foundation collaborates with partners to promote long-term solutions that help people live healthy, productive lives. Achieving our ambitious goals requires rigorous evaluation so we and our partners can continually improve how we carry out our work.

Evaluation is the systematic, objective assessment of an ongoing or completed intervention, project, policy, program, or partnership. Evaluation is best used to answer questions about what actions work best to achieve outcomes, how and why they are or are not achieved, what the unintended consequences have been, and what needs to be adjusted to improve execution. When done well, evaluation is a powerful tool to inform foundation and partner decision making about how to optimize scarce resources for maximum impact. It is distinct from other forms of measurement that focus only on observing whether change has occurred, not why or how that change occurred.

Our current evaluation practice varies widely, and in the absence of a policy, decisions about evaluation are left to individual program teams and program officers. Because the foundation supports a diverse range of partners and projects, we need a clear organizational understanding of how evaluation should vary to best inform decision making across each of these areas.

Purpose of the Policy

Our evaluation policy is intended to help foundation staff and our partners align their expectations in determining why, when, and how to use evaluation. More specifically, the policy encourages foundation teams to be more transparent, strategic, and systematic in deciding what and how to evaluate. Our aim is to integrate evaluation into the fabric of our work, achieve early alignment with our partners about what we are evaluating and why, and generate evidence that is useful to us and our partners as we move forward.

Organizational Context

Our evaluation policy is rooted in our business model, which involves working with partners to achieve the greatest impact. Early in the grant proposal process, we work with prospective partners to define and agree on measurable outcomes and indicators of progress and success. This enables our partners to learn as they carry out their work rather than be distracted by requirements to measure and report at every step along the way.

This approach reinforces the role of evaluation in testing innovation, making improvements, and understanding what works and why, so that we can learn quickly from failure and replicate success.

The policy is also rooted in the foundation’s core values: collaboration, rigor, innovation, and optimism. More specifically:

Our Strategies and Evaluation

The foundation organizes its resources by strategies, each in a specific area or sector. Each strategy has its own goals and priorities, partners and grantees, and allocation of foundation resources. Strategy teams execute their strategies by making investments (grants, contracts, and program-related investments), as well as through advocacy work.

Foundation teams measure the progress of their strategies and investigate what works best to achieve priority outcomes using many different types of evidence. A combination of evaluation findings, partner monitoring data, grantee reports, modeling, population-level statistics, and other secondary data offers a more cost-effective and accurate alternative to large summative evaluations. We draw on all of these sources, including evaluation where relevant, along with expert opinion and judgment, to refine foundation strategies on a regular basis.

Evaluation is particularly warranted in the following instances:

Evaluation is a high priority when program outcomes are difficult to observe and knowledge is lacking about how best to achieve results. This is the case, for example, when we collaborate with partners who are working to improve service delivery or effect behavioral change; identify, replicate, or scale innovative models; or catalyze change in systems, policies, or institutions.

Evaluation is a low priority when the results of our efforts are easily observable. It is also a low priority when our partners are conducting basic scientific research, developing but not distributing products or tools, or creating new data sets or analyses. In such cases, our partners’ self-reported progress data and existing protocols (such as for clinical trials) provide sufficient feedback for decision making and improvement.

Program teams are not expected to use evaluation to sum up the results of foundation strategies. This would not be the best use of scarce measurement and evaluation resources for two reasons: 1) the impact of our investments cannot easily be differentiated from that of our partners’ investments and efforts; and 2) foundation leaders are more interested in learning how our teams can make the best use of resources and partnerships and how to strengthen program execution.

Evaluation Design and Methods

Evaluation is a contested discipline. We are aware of the ongoing and healthy debate about what types of evidence are appropriate to inform policy and practice in U.S. education and in international public health and development. However, the diversity of our partners and areas of focus precludes us from promoting only certain types of evaluation evidence as acceptable for decision making.

We avoid a one-size-fits-all approach to evaluation because we want our evaluation efforts to be designed for a specific purpose and for specific intended users. This approach to evaluation design, which we call fit to purpose, has three elements:

The following three designs represent the vast majority of the evaluations we support.

Evaluations to understand and strengthen program effectiveness

Evaluations that help our partners strengthen the execution of projects are among the most relevant for the foundation because they provide feedback about what is and isn’t working within a specific location or across locations.

We use this type of evaluation in the following scenarios:

Such evaluations should be designed with the following considerations in mind:

Evaluations may include impact estimates if those are needed to inform important decisions—about scaling up an initiative, for example, or about the level of penetration needed to ensure a certain level of impact. Impact estimates should not be used as proof of macro-level impact, however.

Because the assumptions used to construct impact estimates can lead to large error margins, a robust baseline of key coverage indicators is essential, along with data on how these indicators have changed over time. Population-level impact can then usually be determined through modeling or use of secondary data.
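Purely as an illustrative sketch (the notation below is ours, not a prescribed foundation model), a modeled estimate of population-level impact often takes a coverage-times-effectiveness form:

\[
\widehat{\Delta Y} \;\approx\; \sum_{i} \left( C_{i,t} - C_{i,0} \right) \times E_i \times N_i
\]

where \(C_{i,0}\) and \(C_{i,t}\) are the baseline and current coverage of intervention \(i\), \(E_i\) is its assumed effectiveness, and \(N_i\) is the population in need. Because errors in the assumed effectiveness and in the baseline coverage propagate directly into the estimate, the quality of the baseline and trend data largely determines how much weight such an estimate can bear.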

In select cases, it may be necessary to determine a causal relationship between the change in coverage and the desired population-level impact. If so, the design should include a plausible counterfactual, usually obtained through modeling or comparison with national or sub-national trends.
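One generic way to construct such a counterfactual, offered here only as an illustration rather than a required design, is a simple difference-in-differences comparison against a national or sub-national trend:

\[
\widehat{\text{effect}} \;=\; \left( \bar{Y}^{\text{program}}_{t} - \bar{Y}^{\text{program}}_{0} \right) \;-\; \left( \bar{Y}^{\text{comparison}}_{t} - \bar{Y}^{\text{comparison}}_{0} \right)
\]

where \(\bar{Y}\) is the outcome of interest measured before (0) and after (t) the change in coverage, and the comparison series supplies the expected trend in the absence of the program.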

Evaluations to test the causal effects of pilot projects, innovations, or delivery models

Evaluations that produce causal evidence can be used to decide whether to scale up or replicate pilots, innovations, or delivery models. They can also provide essential knowledge to the foundation, our partners, policymakers, and practitioners.

We use this type of evaluation in the following scenarios:

Evaluations of causal relationships should be designed with the following considerations in mind:

Evaluations of causal relationships should not be used when existing proxies of effectiveness and outcomes are sufficient. They are also not appropriate for evaluating whole packages of interventions with multiple cause-and-effect pathways.

Evaluations to improve the performance of institutions or operating models

Evaluations that provide a neutral assessment of the effectiveness of an organization or operating model can inform foundation and partner decision making about how best to use financial or technical resources, resolve challenges, and support ongoing progress.

We use this type of evaluation selectively, in the following scenarios:

Evaluations of institutional effectiveness and operating models should be designed with the following considerations:

Such evaluations are largely qualitative and should not seek to assess the causal relationship between a partner organization or operating model and program outcomes.

Evaluation Roles and Responsibilities

Our evaluation policy is a starting point for strengthening how we use evaluation within the foundation and with our partners. We complement it with resources and designated roles within the foundation that enable clear decision making about when and how to use evaluation and facilitate consistent management of evaluations and use of findings. These resources and roles are detailed in the following sections.

Evaluation plans

Program teams in our U.S. and global divisions that work with partners each have an evaluation plan, which they share openly with partners to promote collaboration, joint evaluation, and learning within and outside the foundation. The plan identifies existing evidence and the critical gaps that we and our partners need to fill to inform decision making and build knowledge.

Program officers consult the team plan before making decisions about specific evaluations, to ensure that evaluation investments fit into an overall strategic framework. They also consult with the foundation’s central Strategy, Measurement & Evaluation team, which works with all program teams at the foundation to find opportunities to invest in and share evaluations that have cross-program relevance and to advance innovation in evaluation methods.

During the grant development process, our program officers and partners discuss and decide whether an evaluation will be needed, to ensure alignment on expectations and sufficient resources to produce useful evaluations. Key factors include the following:

All foundation-funded evaluations—whether conducted by independent parties or integrated into our partners’ work—are recorded in a foundation evaluation registry. This helps us track evaluation spending and findings, and ensure continuity and consistency regardless of any foundation or partner staff turnover.

Roles and responsibilities

Responsibility for evaluation is shared across many levels of the foundation:

Whenever possible, foundation teams look for opportunities to build on grantee monitoring and evaluation rather than create parallel systems, and to invest in national evaluation capacity to support our global programs. We recognize that this may entail concurrent investments in building our partners’ evaluation capabilities. This support is provided directly by program teams and their embedded measurement, learning, and evaluation staff.

Conclusion

The Bill & Melinda Gates Foundation made a clear commitment to actionable measurement as a guiding philosophy in 2008. The philosophy spells out a clear need for purpose-driven evaluation rather than adherence to any one particular method or design. Where relevant and matched to the type of work we do, evaluation can help depersonalize decision making and provide objective data that can inform action.

This policy document outlines the foundation’s position on why, when, and how we use evaluation to create useful evidence for decision makers, improve program execution, inform our evidence-based advocacy, and develop stronger relationships with our grantees and partners. We view it as the starting point of a larger effort to make high-quality evaluation an integral part of how we operate and carry out our work.