11. Monitoring and Evaluation
 
"Whether or not expanded educational opportunities will translate into meaningful development - for an individual or for society - depends ultimately on whether people actually learn as a result of those opportunities, i.e. whether they incorporate useful knowledge, reasoning/ability, skill and values. The focus of basic education must therefore be on actual learning acquisition and outcome, rather than exclusively upon enrolment, combined participation in organised programmes and completion of certification requirements. Active and participatory approaches are particularly valuable in assuring learner acquisition and allowing learners to reach their full potential. It is therefore necessary to define acceptable levels of learning acquisition for educational programmes and to improve and apply systems of assessing learning achievement" (Declaration: 4).
 
11.1 The Jomtien Agenda
 

The emphasis on 'actual learning acquisition and outcome' was a substantial shift from the earlier World Conferences (e.g. Addis Ababa and Cairo), where the emphasis had been on the attainment of universal primary schooling for all those of school-going age. The shift from measuring enrolment, retention and completion rates to measuring learning outcomes had been presaged in the North in the OECD's work on Social Indicators (OECD, 1974) and Educational Indicators (OECD 1976, 1972), and, of course, in the work of the International Association for the Evaluation of Educational Achievement (IEA) from 1973. The use of educational indicators fell out of favour during the late 1970s and early 1980s. However, the rise of 'managerialism' in government, that is to say management by objectives or by results, the publication in the US of the 'A Nation at Risk' report, and a perceived need to be competitive in a global market led to renewed interest in educational performance indicators in the late 1980s and early 1990s.

However, it has to be emphasised that for many communities and parents in the South attendance at school is itself an outcome. In this sense, the focus on outcomes is an imported, some would say imposed, concern; indeed it is noticeable that one of the two interventions cited in the section on Focus on Effective Learning is from the Commonwealth Secretariat, cautioning against a view too strongly focused on achievement criteria; and whilst the other, from the New York HQ of UNICEF, promoted minimum levels of learning, later documents from UNICEF have emphasised the importance of the pedagogical process rather than necessarily the outcomes (see below).

Nevertheless, the clear implication of Article 4 of the Declaration was that governments should introduce systematic monitoring of children's achievement, based on some to-be-specified criteria of learning, within an overall Education Management Information System (EMIS), and should assess the effectiveness of delivery mechanisms (Framework: 57). The reality, of course, in many countries is that the EMIS was and remains weak.

Several agency-sponsored programmes to improve partner country capacity were, in fact, introduced during the 1980s and 1990s to strengthen educational administration and, in particular, the collection of accurate and reliable routine data about enrolment, drop-out and retention. These are considered in the next section. During the 1980s, these were seen as part of general support to the sector, but within the context of a SWAp they become of central importance. The bottom line now, therefore, is the extent to which routine data systems (the system-level indicators) are in fact seen by agencies as sufficiently reliable to use as a basis for monitoring the effectiveness, if not the impact, of their own interventions.

In fact, however successful these programmes have been, they do not appear to have had sufficient impact upon partner country monitoring capacity for agencies to rely on the routine statistics compiled by Ministries of Education: instead, agency-supported programmes of aid to primary schooling typically include the requirement for a one-off baseline survey. The next section is therefore rather brief. Instead we concentrate on how agencies have themselves approached monitoring and evaluation. The emphasis on outcomes, and the shift towards co-operating with other agencies within a Sector Wide Approach (of whatever kind), raise expectations of change both in the agencies' own mechanisms for monitoring and evaluation (for accountability purposes) and in the kind of monitoring and evaluation carried out of the projects and programmes they support in developing countries. The organisation of evaluation within the agencies, whether on a central or a country basis, is considered in the third section. The fourth looks at the shift in the methodology intended to be used in recipient countries for evaluation, including agency views on the appropriateness of different kinds of evaluation, the kinds of procedures now used for monitoring and evaluation, and how these differ from previous practice. Section 11.5 deals with what tends to be measured: whether agencies assess quality (and, if so, how it is defined and measured) or concentrate only on outputs, or even simply on inputs. The final section (11.6) reflects on whether and how approaches have changed, the development of in-country capacity, and the extent of improvements in quality.

 
11.2 Agency Programmes to Improve Educational Administration and Data Collection for Monitoring and Evaluation
 

Several programmes have been introduced or piloted, including COFORPA and programmes supported by Sida and UNESCO. According to the reports of these programmes, the bulk of the activity involves training senior personnel in the techniques of data collection for monitoring and evaluation outside their work situation; very few have focused on the practical problems of data collection and recording for a teacher without paper or pencil. There are exceptions, of course: Operation Blackboard in some of the Indian States included the recording of attendance by pupils themselves, but it is difficult to see how this could be systematised.

Clearly, however, efforts to strengthen educational administration and data collection through training small groups of senior teachers or inspectors, for instance, are only useful if they are complemented by across-the-board improvements in the administration of schools in the field and in the quality of the data collected about the functioning of the system.

The basic problem, therefore, is to design protocols for monitoring which can be implemented across a widely dispersed system. There are no easy answers. Experience suggests that the protocols have to be based on the everyday reality of the people who are being relied upon to collect the data, and that local personnel (teachers and head-teachers) have to be convinced of the value of such data collection (Carron and Ngoc Chau, 1995). A sketch of what such a school-level return might look like is given below. Much of the explanation for the failure to build capacity in government, whether generally or specifically in relation to statistics, lies in the focus of such efforts on treating symptoms at the individual departmental level, rather than tackling the causes at the macro level. For instance, training for statisticians will not improve capacity in the long run unless incentives are geared towards the retention of trained staff.
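
As a purely illustrative sketch (not any agency's or ministry's actual protocol), the following Python fragment shows the kind of minimal school-level return, and district-level aggregation, that such a protocol might involve; all field names and figures are invented.

```python
from dataclasses import dataclass

@dataclass
class SchoolReturn:
    """One school's termly return: the sort of minimal record a head-teacher
    could keep on paper and copy onto a district form (fields are illustrative)."""
    school_id: str
    enrolled_boys: int
    enrolled_girls: int
    attending_boys: int   # head-count on an agreed census day
    attending_girls: int

def district_summary(returns):
    """Aggregate school returns into the district-level figures an EMIS needs."""
    enrolled = sum(r.enrolled_boys + r.enrolled_girls for r in returns)
    attending = sum(r.attending_boys + r.attending_girls for r in returns)
    return {
        "schools_reporting": len(returns),
        "enrolment": enrolled,
        "attendance_rate": attending / enrolled if enrolled else None,
    }

# Two schools reporting (invented figures)
returns = [
    SchoolReturn("SCH-001", 120, 95, 101, 78),
    SchoolReturn("SCH-002", 80, 88, 70, 80),
]
print(district_summary(returns))
```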

 
11.3 Evaluation within the Agencies
 
Before discussing the status of evaluation within the agencies, there is an important preliminary issue of agency accountability.
 
11.3.1 Agency Accountability
 

There have been external (and sometimes internal) pressures on aid-giving countries to increase the overall volume of aid, and particularly the overall volume of aid to education, as well as the proportion of that aid devoted to basic human needs (including basic education). The statistical trends were presented in Chapter 5 and targeting was discussed in Chapter 6. Many governments have made political commitments to a particular target, whether in terms of overall volumes (e.g. the UN target of 0.7% of GDP) or distribution (e.g. the 20/20 criterion of Copenhagen).

The extent to which these political promises are met is, in principle, reasonably easy to monitor and, of course, UNICEF, along with many NGOs, regularly publicises the relatively low proportion of aid targeted towards basic human needs or meeting the 20/20 criterion of Copenhagen. These questions were dealt with in Chapter 6 and will not be considered further here.
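
For readers unfamiliar with it, the 20/20 criterion calls, broadly, for around 20% of aid and around 20% of the recipient government's budget to go to basic social services, so monitoring it is a matter of two simple ratios. The sketch below (Python, with invented figures) only illustrates that arithmetic; it is not any agency's reporting tool.

```python
def meets_20_20(aid_to_bss, total_aid, govt_spend_on_bss, total_govt_budget):
    """Check the two halves of the 20/20 criterion: roughly 20% of aid and
    20% of the recipient government's budget going to basic social services (BSS)."""
    donor_share = aid_to_bss / total_aid
    govt_share = govt_spend_on_bss / total_govt_budget
    return donor_share >= 0.20 and govt_share >= 0.20, donor_share, govt_share

# Invented figures (millions of dollars)
ok, donor_share, govt_share = meets_20_20(45, 300, 410, 1900)
print(f"donor share {donor_share:.0%}, government share {govt_share:.0%}, meets 20/20: {ok}")
```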

However, whether or not those commitments satisfy their own electorates is a different matter. Indeed, the question of how agencies have set about satisfying their home, taxpaying communities in terms of value for money is central here. Some countries have carried out surveys (e.g. Germany, Sweden), although the reliability of answers to such general questions is rather doubtful. One part of the answer for agencies involves explaining to their own taxpayers, through a form of development education, the complexities of demonstrating a measurable impact on development. It is, clearly, a live issue.

11.3.2 Status of Evaluation in the Agency

Several agencies now have a Central Evaluation and/or Research Unit (see Table 11.1) which is mandated to carry out (or organise) a number of evaluations per year of projects involving significant or substantial agency commitment, and sometimes thematic reviews, e.g. Norway, Sida/EIAD, UNICEF/EPP.

 
Table 11.1: Programme Evaluation is Responsibility of:
 
Central (Metropolitan) Evaluation Unit: ADB, Australia, Austria, Canada, Denmark, Ireland, Japan, New Zealand, Norway, Switzerland, UK, USA, WB
Education Division in Central Office: Sida, Germany (GTZ), IDB
Country Office: Germany (GTZ), UNICEF, UNDP

Abbreviations: EPP/UNICEF = Division of Education Policy and Planning, UNICEF; ED/UK = DFID Evaluation Department; OED/WB = Operations Evaluation Department; OEU/NORAD = Operations and Evaluation Unit
 
However, there are only a relatively small number of such evaluations of the education sector, let alone of the sub-sector of basic education. In part, this simply reflects the size of the aid-to-basic-education budget relative to the overall budget and, despite the apparent importance attached to evaluation in many statements, the limited budget allowed for evaluations by bilaterals. For example, the Netherlands central unit conducts about 10 evaluations a year, and the UK between 10 and 20; and these are the numbers of evaluations for all sectors. The multilaterals appear to take this more seriously, with UNICEF HQ carrying out about 25 a year and field offices allowing 3%-5% of the country budget for evaluations, while the World Bank says that 50%-60% of its budget is evaluated, which sounds rather more than the bilaterals. This suggests that there are opportunities to realise economies of scale.
 
11.4 Agency Approaches to Monitoring and Evaluation
 
11.4.1 Definition of Terms
 

"An evaluation is an assessment, as systematic and objective as possible, of on-going or completed aid activities, their design, implementation and results. The aim is to determine the relevance and fulfilment of objectives, developmental efficiency, effectiveness, impact and sustainability". (Source: OECD DAC recommended definition adopted by all major agencies)

Terms are used differently by the various agencies in individual countries, and the extent to which evaluations are made public also differs. For Denmark and Norway, 'reviews' refer to periodic assessments which remain internal documents. Sweden includes 'reviews', 'sector studies' and 'evaluations' under the generic term evaluation, whilst most other countries draw a sharp distinction between evaluations and reviews of sector analyses or studies. The Swiss focus on the distinction between external evaluation and self-evaluation.

"[Outside] Evaluators cannot be aware of the whole truth but their opinion should encourage a dialogue between participants should make complex situations clearer, facilitate decision-taking, and provide pointers to new solutions" External Evaluation in Development Co-operation. (SCD 1991)

11.4.2 Developing the Art of Evaluation
 
The story of DANIDA's evaluation unit (Box 11.1) is probably typical of changes in many agencies over the last ten years.
 
Box 11.1
 

The DANIDA Evaluation Unit was established in 1982 and renamed the Evaluation Secretariat in 1997. Prior to 1982, most evaluations were mid-term or phase evaluations of projects. Whilst this remained the case for about five years after 1982, the use of evaluations became more systematic, in that the choice of what to evaluate was guided by an annual programme designed to ensure that, taken together, the projects or programmes evaluated were representative of Danish bilateral aid. During 1982-97, the number of individual project evaluations was reduced and the number of thematic or sectoral evaluations increased.

The trend in DANIDA has been to shift the focus from the evaluation of project aid towards evaluations of "more complex modes of development assistance [which] have proved to be increasingly cost effective" and to have more impact on policy decisions.

Source: Danida Evaluation Guidelines

 

The evaluation of a project follows how inputs are transformed (or not) into outputs, and then asks whether or not the original objectives have been achieved and what the impact could be in principle, assuming there are no constraints, etc. The process can be described as a single cause-effect chain of identifiable elements, which makes it easier to focus the evaluation. The context is probably limited to one geographical area, and therefore to one set of socio-demographic circumstances; other sectoral inputs and the education system provide the backcloth for the intervention. Project evaluations do sometimes attempt to evaluate the achievement of wider objectives but, because other factors intervene and the project itself has a range of effects, perhaps involving other sectors as well as areas outside the project focus, this is usually too ambitious.

The evaluation of a sector programme has a broader focus. There is not a single set of consistent objectives, and there are often several agencies involved. Moreover, the context is far more complicated, including not only the education system but other social sectors. Often the focus is on institutional development rather than a set of learning outcomes (although see Section 11.5 below). Organising an evaluation in this situation is difficult:

"the shift towards strengthening primary education is a notable and welcome development but that there are significant difficulties attached to the SWAps. In particular evaluation is not sufficiently integrated into the design of projects" (Colclough, Bennell and Al-Samarrai, 1999).

11.4.3 More Sophisticated?
 
Perhaps because of the selective choice of material sent to us, there is a clear impression of a more professional and systematic approach to evaluation than there was 10 years ago. Then, evaluations appear to have been ad hoc; now they appear to be pre-planned and designed at the inception of the project or programme, and there are clear distinctions between different types of evaluation (for an example, again from Denmark, see Table 11.2). There is no doubt that agency sophistication in terms of methodology has improved over the last ten years. Whether or not agency practice has improved over the same period is less clear.
 
Table 11.2 Types of Evaluations
 
 
Project/Component: Evaluation of an individually planned undertaking designed to achieve specific objectives within a given budget and time period.
Sector: Evaluation of support to a sector, usually applied across countries.
Sector Programme Support: Evaluation of a coherent set of activities in terms of policies, institutions or finances.
Country: Evaluation of the combined cross-sectoral Danish support to a partner country, notably one of Danida's programme countries.

Thematic: Evaluation of selected aspects of different types of development aid (poverty, environment, choice of technology, gender aspects, sustainability, etc.).
Joint: Evaluation of large parts of development activities where Danish aid represents only a share of the total. Conducted jointly with other agencies and the partner.
Source: Danida Evaluation Guidelines
 

A number of evaluations from several agencies have been examined. The majority of the evaluations that we have received were conducted (perhaps according to the same criteria) by teams constructed for the purpose. This can have several advantages (for example, the team knows about the subject).

They vary enormously from in-house reviews of progress (or not) in a sector to evaluations of a particular project to pure research projects. It is therefore difficult to discern any overall trend, although one could cite the increasing popularity of base-line surveys. In principle, these are to be conducted at the inception of a project or programme; the practice is less consistent.

The increasingly wide scope of aid support was seen by most agencies as necessitating a shift from monitoring specific project outputs and/or inputs to monitoring system-level indicators (see below). Where in-country routine systems are weak, this has involved agencies in organising special "baseline" surveys to collect the data they might (reasonably) have expected to be already available. Inevitably, these broader surveys have led to a loss of attention to detail. Hence the call for follow-up surveys to be supplemented by special studies of specific aspects of the system (or of specific aspects of agency interventions).

However, some agencies, e.g. UNICEF, have moved away from the book and the classroom towards the learning environment of the child and psychosocial tools, arguing that estimations of outcomes and achievement are not adequate for capturing the learning environment of the child.

 
11.5 What Tends to be Measured
 

Given the emphasis in Jomtien on outcomes, and the agreement to five International Development Targets, it is perhaps not surprising that there has been an increasing emphasis on Indicators for Measuring Results and System Level Indicators.

Nowadays, most evaluations have to pay at least lip-service to some form of logical framework, or equivalent management tool, prepared at the beginning of the project or programme (e.g. UNICEF), with specific indicators taken as good proxy measures for the attainment of one objective rather than another. An example, given in Table 11.3 and taken from a USAID document, shows clearly how different indicators are seen as relevant to the attainment of different specific objectives.
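
A minimal sketch of the underlying idea, with invented objectives, indicators and values rather than any agency's actual logframe, might look like this in Python: each specific objective is tied to a proxy indicator with a baseline and a target, against which observed values can be compared.

```python
# Invented objectives, indicators, baselines and targets (not any agency's logframe)
logframe = {
    "Increase primary enrolment of girls": {
        "indicator": "Net enrolment rate, girls (%)",
        "baseline": 62.0,
        "target": 75.0,
    },
    "Improve learning achievement in early grades": {
        "indicator": "Share of grade-4 pupils reaching minimum mastery (%)",
        "baseline": 38.0,
        "target": 55.0,
    },
}

def progress(objective, observed):
    """Observed value expressed as a share of the distance from baseline to target."""
    row = logframe[objective]
    span = row["target"] - row["baseline"]
    return (observed - row["baseline"]) / span if span else None

print(progress("Increase primary enrolment of girls", 68.0))  # about 0.46 of the way
```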

The agencies most concerned with 'performance indicators' are, unsurprisingly, the World Bank and USAID, with the EU not far behind. The World Bank rates projects on five criteria (see Box 11.2A); USAID proposes very broad-brush indicators (Box 11.2B); and the EC also appears to prefer straightforward indicators (see Box 11.2C). There is apparently no awareness of the difficulty of collecting reliable data for these indicators.

 
Table 11.3 (not available)
Box 11.2

(A) World Bank

Evaluations are carried out by the Operations Evaluation Department. All projects are rated according to common criteria: three results-oriented criteria (outcomes, sustainability, institutional development) and two process-oriented criteria (Bank performance, borrower performance).

(B) USAID

Education's share of national budget

Primary education's share of education budget (for recurrent and capital expenditure); and

Share of primary recurrent, non-salary expenditure of primary budget.

As an indicator of effective schools, the use of a fundamental quality and equity level (FQEL) index, which measures the number of schools meeting minimum criteria in services and coverage ... a means of capturing the united elements that go into making an effective school and the idea of "access with quality" (USAID 1998: 41).
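
To illustrate how such broad-brush indicators might be computed, the following Python sketch derives the three budget shares listed above and a simple FQEL-style head-count of schools meeting minimum criteria. The criteria and figures are invented for illustration; USAID's actual FQEL definition may differ.

```python
def budget_share_indicators(national_budget, education_budget,
                            primary_budget, primary_recurrent_non_salary):
    """The three budget-share indicators listed above, as simple ratios."""
    return {
        "education_share_of_national_budget": education_budget / national_budget,
        "primary_share_of_education_budget": primary_budget / education_budget,
        "non_salary_recurrent_share_of_primary_budget":
            primary_recurrent_non_salary / primary_budget,
    }

def fqel_count(schools, minimum_criteria):
    """Count schools meeting every minimum criterion (an FQEL-style head-count;
    the criteria below are invented for illustration)."""
    return sum(all(school.get(key, 0) >= level for key, level in minimum_criteria.items())
               for school in schools)

schools = [
    {"textbooks_per_pupil": 1.2, "trained_teachers_pct": 80, "has_safe_water": 1},
    {"textbooks_per_pupil": 0.4, "trained_teachers_pct": 55, "has_safe_water": 0},
]
minimum_criteria = {"textbooks_per_pupil": 1.0, "trained_teachers_pct": 70, "has_safe_water": 1}

print(budget_share_indicators(1000, 180, 90, 18))
print(fqel_count(schools, minimum_criteria))  # -> 1 school meets all criteria
```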

(C) EC/EU

Education indicators are well known. Some of them, like those now being chosen to support structural adjustment in Burkina Faso, may be used in the context of the new conditionality approach: school-attendance rates (boys/girls); first-year primary attendance rates (boys/girls); success rates in end-of-primary exams (boys/girls); number of books per pupil; level of satisfaction among users; cost and rate of use of informal education by adults. A gender breakdown of indicators is essential here.
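
As an illustration of the gender breakdown called for above, the following Python sketch computes a simple gender parity index (the girls' rate divided by the boys' rate) for each disaggregated indicator; the figures are invented.

```python
def gender_parity_index(rate_girls, rate_boys):
    """Girls' rate divided by boys' rate; 1.0 indicates parity."""
    return rate_girls / rate_boys

# Invented gender-disaggregated rates (%) for two of the indicators listed above
indicators = {
    "first_year_primary_attendance_rate": (41.0, 55.0),   # (girls, boys)
    "end_of_primary_exam_success_rate": (58.0, 63.0),
}
for name, (girls, boys) in indicators.items():
    print(f"{name}: girls {girls}%, boys {boys}%, GPI {gender_parity_index(girls, boys):.2f}")
```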

 
11.6 Changes Over Last 10 Years
 

This final section addresses the issue of how much the agencies have actually changed their practice over the last ten years, in particular:

how much do they co-operate with other agencies so that there are not repeated evaluations of the same programme?

do they involve in-country partners and how?

do they pay any attention to the results?

 
11.6.1 Co-operating with Other Agencies and with the Partner Country
 

There are already many difficulties for agencies in monitoring their own aid practice; these have been considered both in Chapter 5, when discussing the statistical trends, and in Chapter 10, when discussing agency co-operation and co-ordination. In addition, the Netherlands' experience of a multi-agency sector Basic Education programme would suggest that agency co-ordination is still a major issue.

A key advantage of joint, multi-agency evaluations is that they have greater credibility and broader ownership among the wider agency community than would be the case with single-agency evaluations. But the 'jointness' is rather limited. Danida say in their evaluation guidelines: "Common to most of these evaluations is that developing countries have played a minor role, if any, in their planning and execution"; and "A key challenge will be to secure local ownership for such evaluations by focusing on what is relevant for the partner" (Danida Evaluation Guidelines).

Deliberate attempts have been made to involve all stakeholders; for example, the Uganda mid-term review of the country programme (1997). Similarly, the evaluations in several of the Indian States of the District Primary Education Programme (DPEP) are held up as models of good practice, involving representatives of both the Government of India and the State governments, with adequate prior documentation.

There are positive signals about joint appraisals of sector programmes ... but this is not yet allied to a downward trend in bilateral appraisals. Many have signed up to codes of conduct such as Horizon 2000, but few have delivered.

11.6.2 Decentralisation and Development of In-Country Capacity
 

We have already remarked that it is rare for agencies to rely on in-country capacity for monitoring, but there has been an increasing tendency to call for the building of in-country capacity for evaluation; indeed, for some agencies this is one of the justifications for investing in higher education and training as part of their support to basic education (especially when sector programmes are considered). But the move towards decentralisation has a series of implications for monitoring and evaluation, including:

in-country decentralisation means that the monitoring mechanisms have to be adopted by the local authorities in each country

agency decentralisation means that the in-country embassies are responsible for organising the evaluation (setting the terms of reference etc)

The first has progressively become an article of faith among development agencies (Carron and Ngoc Chau, 1995); and, in principle, as a consequence of the decentralisation of monitoring and evaluation capacity, and in parallel with the call for government or national ownership of SWAps, one would have expected an increasing reliance on local monitoring and evaluation capacity. Nevertheless, the latest (the eighth) DPEP Joint Review Mission noted that, among the eleven states, an EMIS was fully functional in four states and in the DPEP districts of two other states, but in four of the states it had hardly started.

But the practice is rather different and there are a number of obstacles:

recruitment and payment of local/national consultants on equal terms to the recruitment and payment of international consultants.

evaluations of sector programmes (whether or not SWAPs within the definition used here) are inevitably more difficult and extensive than project evaluations.

The first point is not straightforward. There is evidence that international agencies have in the past contributed towards serious wage market distortion by paying local consultants at international wage rates. While this would seem reasonable in terms of paying individuals on the basis of work done rather than nationality, it has in some cases resulted, ironically, in a lessening of local capacity, in the sense that some of the most capable individuals within government have been attracted out of government to work for international organisations.

 
11.6.3 Improvements in Quality
 
The World Bank provides an auto-critique of much of its earlier practice through the auditing of projects. A selection from these audits is shown in Box 11.3.
 
Box 11.3

Addition of Student Loan Component to Secondary Education Programme "During preparation, little attention was paid to ensuring that the Student Loan Bureau had the capacity to implement the ... project" OED 99155, WB, 11/01/97

Yemen Education Adjustment Projects (Primary School Teachers) "... the Bank should have marshalled resources to ensure the collection of baseline data and the development of a monitoring and evaluation system" OED 99141, WB, 04/01/97.

Indonesia "... the seemingly inexorable way one project has followed another, no doubt reflecting a common interest in maintaining the flow of funds into the sector"

. ... one cannot discern a common intention by which to judge these operations..." OED 99015, WB, 10/01/97

 

Again, perhaps because of the Jomtien statements, there is increasing emphasis on the assessment of quality rather than simply the expansion of the system itself. However, this has tended to take the rather narrow form of assessments of learning outcomes, such as SACMEQ (the Southern Africa Consortium for Monitoring Educational Quality), which is based around achievement tests. Dissenting voices are heard from Norway (their Theme 4 is focused on documenting the impacts of quality interventions) and UNICEF (focusing on the learning environment of the child). There is perhaps no clearer illustration of the extent to which there have been attempts to impose a Northern interpretation of Jomtien on the South.

The basic problem posed for appropriate monitoring and evaluation is that any sector-wide intervention in the primary sub-sector is addressing the problems of very large numbers of geographically dispersed schools; the monitoring and evaluation has to reflect that, and the Jomtien requirement, as interpreted by agencies, "to measure quality" only complicates the issue. Because data systems in developing countries are often so deficient, they sometimes cannot be used even to provide a denominator; and so agencies mount expensive surveys in conditions where it is difficult to obtain a simple head-count.

 
11.7 Conclusions
 
11.7.1 The Problem of Sector Evaluations
 

Moving towards sector programmes also means recognising the interdependence of the education sector with other sectors. Thus, whilst many projects have been concerned with, for example, the prevention of HIV/AIDS, and the rationale for the World Bank's substantial increase in aid to basic education was largely based on the presumed contribution of girls' education to reducing (perinatal and) infant and child mortality, there has been no systematic attempt to integrate these other concerns (e.g. citizenship, employment, global responsibility, health behaviour and eventually health, parenting) into the design of monitoring and evaluation.

Some agencies have, of course, set up units to synthesise evaluations across a wide spectrum, but these also tend to ignore the wider benefits of learning.

Despite increasing sophistication in the deployment of both qualitative and quantitative methodologies, there has been only limited progress in designing baseline surveys before a programme starts and in setting up monitoring systems which can be operated by those in the field (e.g. teachers).

The essential conclusion is that there is a need, recognised by some agencies already, to shift the focus away from assessing the performance of agency-funded operations to the evaluation of government programmes. The need is not to build capacity along the lines of the traditional model, such as enhancing the monitoring and evaluation (M&E) skills of officials in the ministry of planning, but instead to make M&E part of the lifeblood of the sector ministries.

 
11.7.2 Assessing Performance at the Country Programme Level
 

Monitoring for outcomes and impact at the level of agency country assistance strategies is at a rudimentary stage. A number of agencies, including the UK, recognise the importance of doing this on a far more systematic basis. There is a need to match the effort made in preparing country strategies which may include basic education as part of an overall poverty reduction strategy with more work on measuring success against stated objectives. Even though many agencies' country strategies contain logical frameworks, indicators are poorly specified and rarely collected, and little priority is given by senior management to holding country programme managers to account against such indicators. Without such systems, the country programmes risk remaining strong on rhetoric but weak in practice.

Senior management will need to take a more proactive role in developing country-level monitoring systems (working with national governments), and to hold country programme managers to account against their strategy documents.

 
11.7.3 Does Evaluation Make a Difference

It is not only in debates about aid that there is widespread cynicism about the impact of evaluation and research on policy. For many, evaluations are carried out too late in the cycle and take too long for their conclusions and recommendations to be useful in policy deliberations. On the other hand, a number of agencies claim that "operation experiences (feed) into the development of new policies" (WB, cited by Norway). One could argue that the World Bank's evaluations have made a tremendous difference, e.g. Psacharopoulos.

Another important element here is the lack of visibility of many studies. In some agencies, 'reviews' are treated as internal documents which 'die' when the staff member responsible moves on. Hence the Knowledge Bank of the World Bank and UNICEF's Policy/Knowledge Network. Institutions also learn from each other.

 
11.8 Outstanding Issues
 

There is a real need to ensure that efforts by development agencies to improve quality and access to basic education centre on strengthening partner country systems for monitoring and evaluation. The main goal is to develop approaches which contribute to the systematic incorporation of performance data into the policy design and implementation cycle of the education ministry itself - and not the enhancement of data on individual agency-supported interventions.

But reliable monitoring systems can only be developed if data collection is accorded appropriate status within the hierarchy of each country's Ministry of Education. To be sustainable in the long term, this means that there have to be practical incentives, given the attractions of the private sector for trained statisticians; and this will have to be true across all sectors.

Partner country capacity for evaluation of, and research into, education, and especially basic education, also has to be strengthened. Here there is a similar problem, with the added difficulty that even institutions in the North have difficulty in retaining qualified staff, especially given the implications of SWAps in terms of the much wider scope of any evaluation required and of the need to develop appropriate models for cross-sectoral evaluation.

The complexity of sector evaluations has to be recognised. Whilst Joint Review Missions might be excellent vehicles for demonstrating co-ordination, it is unclear that they are the best way of carrying out very complex evaluations. Moreover, the sector analyses to date are suspiciously similar, as if a uniform template had been applied to very different contexts. On one level, there should be an ongoing review (perhaps by the DAC) of the methods and procedures used; on another level, it is vital to include voices from the South in any evaluation.

 