An introduction to archival automation


Michael Cook

INTRODUCTION

This publication is intended as a preliminary guide to those who are considering whether or how they can introduce any of the techniques of automation into the administration of archives and records services. It gives information on the ways in which computer applications have been developed to assist with these processes, and suggests where sources of additional information may be found. It does not itself set out to give information of sufficient technicality or detail to enable a system to be set up, but instead it aims to provide a tool by which archivists and records managers, working in manual or non-automated services, may consider the value and functioning of automated systems, and make use of the experience of their colleagues in more technically advanced services.

Archives and records services form an essential part of the information management services of a country. They deal with information-bearing materials generated within the administrative systems of important organisations (whether governments or private institutions or organisations), while, on the whole, library and documentation services deal with information-bearing materials bought in from outside. To play their part, it is essential that the archives and records services should operate efficiently within the limitations of available resources. This is all the more true as these services come under greater pressure because the increasing volume of records produced coincides with the expansion of demand for access to the information in them. Modern records and archives services have to deal with vast amounts of material, and have to find ways of exploiting their information content rapidly and accurately. They have to do this within the constraints of a budget which, never generous, has probably been subjected in recent years to new restrictions.

In this context, archives and records services cannot ignore the potential of automation. The additional power which computing gives to the management of any information service is likely to be of great, or even vital, significance to the success of that service. Computers can speed up the processes of collection, handling and retrieval of information, and can also extend the range of information supply and use. Introducing computers may solve some of the problems of carrying out processes in the service (especially those which depend on repetitious clerical work), and may help the service to be of more obvious and immediate value to its users.

Additionally, archives and records services cannot ignore the inevitability with which various forms of automation are coming to dominate administrative methods. The advance of new technology may be obscured and delayed in some developing countries, but its eventual coming is clearly charted. Elsewhere in the world, the new ways are being adopted rapidly. Information workers have to face a double challenge: they must themselves learn to use automated systems, because these are becoming standard, displacing more traditional systems; and they must learn to deal with the documents which have been produced by other people using the new technology, because these are the documents of the present age. Archivists must not only look at more automated ways of running their service and producing their finding aids, but they must now think of surveying and managing the records which have been created in machine-readable forms.

The advantages of automation have a cost, both in financial terms and in terms of change in the methods of work and attitudes of the staff. In fact, if the true costs of running a manual operation are calculated (and in the past these costs rarely were), and the true costs of introducing electronic methods are compared with them, the changeover usually proves not to be very expensive. However, it is necessary to invest in new equipment, and this equipment needs infrastructural services and maintenance.

The systems which are to be used must be planned carefully, because if they are not well suited to the jobs which are to be done, they will not succeed. When computerised systems are in use, staff members will have to learn to do their jobs in ways hitherto unfamiliar. It may well be that in the long run automation will reduce the amount of routine work that has to be got through by all grades of staff. However, it is quite clear that it does not of itself reduce the need for numbers of staff, especially professional staff: what it does is increase the productivity of the staff, and so make it more likely that the service will be viable.

This manual supplements and to a degree replaces the earlier introduction to archival ADP, by Arad and Olsen, which was published by the Automation Committee of the International Council on Archives in 1981. A new manual was needed because of the rapid development of automated functions and services, and especially because of the following developments.

1. The wide distribution of cheap computers, particularly personal microcomputers and integrated office systems.
2. The enormous expansion of storage capacity available to electronic machines. This has changed their basic function from being machines which process data held on paper to being machines on which information is originated, transmitted and then stored. A modern electronic system has in principle no need to use paper at all as a method of retaining information or putting information before its users.
3. The ready availability of software packages. Computers cannot be used unless they are suitably programmed. In the past, this programming had to be done by hiring programmers to devise a system for each service. Today it is easier to acquire a ready-made set of programmes, which carry out the standard functions. There are many examples of archives and records services which have used these packages, with or without adaptation. Information processing packages are frequently available without much financial investment.
4. The development of information retrieval methodology: this has meant that users in many countries are now accustomed to getting their information by accessing computer systems online. In other cases automatically constructed indexes or keyword retrieval have changed the way in which users are guided towards relevant material. This has changed user expectations, as well as giving new technical tools to information managers. In the most advanced countries, networks carrying bibliographical and documentary information (including archival information) are already in daily use, and descriptive formats are being developed to allow more information to be fed into these systems.
5. Other factors which have changed the climate of opinion and the way in which the information professions think and work have included the recent development of new data storage media, such as optical data disks, and sophisticated systems for displaying graphical information, and for transmitting documents in facsimile. Finally, we may note the growing movements for the harmonisation of training between the information professions.

In writing this manual, an effort has been made to use the simplest possible language. Where technical terms have been used, they have been drawn from the Automation Committee's Elementary terms in archival automation (1984).

References in brackets in the text are to publications listed in the bibliography.

GUIDELINES

Electronic systems can be used to carry out all or most of the work which is done on data, that is upon items of information. Much of the work which is done within an archives or records service can be described as the processing of data. The data involved can belong to one of two kinds:

(a) data which is needed to deal with the archival or record material itself, considered as a mass of physical objects;

(b) information derived from the archival documents themselves.

An example of data about the physical material would be information needed to control the processes within the service. An accessions register contains information about consignments of archives received, and a work control system records when work is done on the materials (fumigation, boxing, repair, production for users, etc.). This area of an archivist's work has been termed 'administrative control'.

An example of information derived from archival material would be a finding aid consisting of descriptions of groups or series of archives, or of individual documents. Other examples would be indexes or guides to the material. In some cases there is a need for calendars, or full-text transcription, of the wording of important documents. This area is referred to as 'intellectual control'.
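
The two kinds of data can be pictured as two kinds of record. The sketch below is illustrative only: the class names, field names and sample values are assumptions made for this example and are not drawn from any particular archival system.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AccessionEntry:
    """Administrative control: data about the material as physical objects."""
    accession_no: str                 # hypothetical numbering scheme
    received: date
    depositor: str
    extent_boxes: int
    work_done: list[str] = field(default_factory=list)  # e.g. fumigation, boxing


@dataclass
class Description:
    """Intellectual control: information derived from the documents."""
    reference_code: str
    level: str                        # "group", "series" or "item"
    title: str
    dates: str
    index_terms: list[str] = field(default_factory=list)


# One accessions-register entry, with a work-control note added later.
entry = AccessionEntry("1986/12", date(1986, 3, 4), "Town Clerk's Department", 14)
entry.work_done.append("boxing")

# A very small finding aid at series level.
finding_aid = [
    Description("TC/1", "series", "Council minutes", "1890-1950", ["council", "minutes"]),
    Description("TC/2", "series", "Rate books", "1875-1930", ["rates", "taxation"]),
]

# A minimal index: each term points to the reference codes that carry it.
index: dict[str, list[str]] = {}
for d in finding_aid:
    for term in d.index_terms:
        index.setdefault(term, []).append(d.reference_code)

print(index["rates"])
```

Keeping the two record types separate mirrors the distinction drawn above: the accession entry changes as work is done on the material, while the description and its index are what users consult.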

Both kinds of data have to be generated or collected, and then processed, by the staff of the service. In this respect, archives and records services are not different from other kinds of organisation; they can therefore consider using some of the techniques for processing data which have been developed elsewhere. In fact, this observation is true as regards manual methods as much as electronic methods; but it is the appearance of computer systems which has stimulated archivists to look closely at the methods which have become popular in other types of institution.

Studying other people's methods means experiment. Fortunately this is no longer so far out of reach of the less well funded organisations and individuals as formerly. The many cheap and self-sufficient personal computers, with their software packages, which are now available, have spread a knowledge not only of computer operation but also of what can be done to handle data. Where small computers are available cheaply, the public experience can also be valuable to those considering a new system at work.

Some problems and their solution

Problems centring on hardware:

Computer systems need not only computing power but also an array of devices for inputting data and getting reports on processed data. These are termed peripherals, or input/output devices. Computing power can be provided either by a large central computer (a mainframe), by a single small computer (mini or micro), or by a linked network of small computers (microcomputers; now increasingly becoming known as personal computers). Peripherals would include a device for processing input data, and a device for reading the data held in the computer's memory. Both these jobs can be done by a terminal set up in a convenient place. This terminal may itself be a microcomputer. Finally, access to a printer is required.

It is not possible to advocate a single "best solution", because so much depends on the relationship between the archives or records service and its employing authority. If the service has an active records management (RM) programme, it will wish to be linked as closely as possible to the central administration. The records manager should be one of the participants in the central data base management system, or should be involved in the office communication system. The nature of this participation will no doubt determine the kind of equipment chosen. On the other hand, if the archives service functions mainly as a quasi-independent research institute, linked more closely with other such institutes than with its employing agency, it may make more sense to develop an independent computing facility.

A general principle might be that where the service is very closely integrated with a larger organisation, it may be more convenient to use the computing and data processing facilities offered by the central computing unit. Where there is no such close integration, modern microcomputers, with their large storage capacity, can serve as good and cheap data processing units. Probably the direction in which new technological developments are going suggests that independent computing (ultimately developing networks of compatible independent computers) is likely to carry important advantages. The overriding consideration in making a choice between the two main alternatives must be the service's own requirements and objectives. Forcing these into the conditions of a not entirely suitable computing service will inevitably lead to distortions.

Supporting and maintenance services are important. If small defects or problems of a technical kind appear, archivists should be able to get expert advice or technical remedies without delay. These are usually obtained through the central computing service, so that a good relationship with this department is highly desirable. A maintenance contract with an efficient firm can be equally valuable.

The question of obsolescence should be considered. No computer has as yet been used to the point of breakdown through old age. Consequently no-one knows how many years a given machine will last. Obsolescence has been a more important factor than wear. It is certain that anyone who operates a computer, small or large, will begin to think about replacing it after about 4-5 years, if not before. It will then be necessary to transfer the data and software to a new machine with the minimum of trouble. There is generally no overwhelming technical reason why this transfer should not be made, if the necessary questions are asked when the new equipment is being negotiated. But the existence of this problem does suggest that equipment made by a large manufacturer has an advantage.

Transferring data files from one machine to another is another situation where standard formats are useful. ISO 8211 lays down norms for such a format, and therefore could be valuable in ensuring the continued life of a database.

Compatibility is always a problem. As time goes on, other services (departments of the employing agency, other information services, or parallel archives services) acquire computers, or new computers. The new machines may not be suitable for the development of networks or linkages. Since it is most likely that future communications systems will depend on these linkages, it is important that the question of compatibility be borne in mind when choosing hardware, and that the archives service is consulted when other related units get new equipment. For the same reason it is desirable for professional associations to examine the possibilities for future planning of automated systems in their areas, and to make recommendations.

Problems centring on software:

Software problems are likely to be more difficult than hardware ones, since with the latter simple questions of finance and availability are often dominant. Software can certainly be expensive, but it is of its nature much more portable, and questions of suitability to the needs of the service are vitally important. The software capability of the system must suit the objectives and structure of the archives service it is brought into, or there will be a failure.

Some software choices are determined by the make and kind of hardware selected. Mainframe manufacturers usually offer a range of packages when they make a sale. Archivists who have access to central computing services could therefore begin by making enquiries as to whether suitable packages are already held by that service. Administrative or financial computers may turn out to have information retrieval or bibliographic software installed which is not needed by the principal users. This situation has often been encountered by, for example, local archive services in Britain (Patch, 1979). In the same way, it is often possible to get software packages included in the sale of microcomputers. Examples already quoted mention microcomputer users who have adopted packages such as dBase3, Delta or Cardbox Plus. These packages are very widely available off the shelf and can be operated immediately (on appropriate machines) without any preparatory programming.

More usually, archivists seeking to automate some of their processes will look about for packages which might be suitable. Many of those available have already been mentioned in section 4 above. They seem to fall into two groups: bibliographical or information retrieval packages and database management systems. Each has advantages and disadvantages. The evaluation of software systems was discussed above in Section III.

Bibliographical systems have the advantage that they are often cheaply and easily available, and, since they are widely used by library and documentation services, it is likely that they could be used for co-operative projects. On the other hand, there is the central problem that they are designed for item-by-item listing. It is important to be certain that the field structure used by the package is adaptable to the needs of archival description and would not impose arbitrary restrictions. The other important question is the range of formats provided for output information.

The bibliographical package FAMULUS is used by several archives and museum services (Bartle & Cook, 1983). Developed for academic use, it is often available free to educational or research bodies and it was recently (1985) redesigned and updated. It allows a range of data structures, which would be suitable for many archival functions. Each record can be divided into up to 25 variable-length fields, and each field can contain up to almost 5000 characters without adaptation. This would mean that free-text narrative administrative histories would probably not present a problem. In the output area, FAMULUS is not so flexible. Output in printout form is provided in only three alternative formats: as a numbered list of items, as an index grouped under the data in selected fields, and as a formal list set against field names. The package is essentially aimed at producing this kind of output, but it does also provide for online searches of selected fields. FAMULUS was used to generate the specimen index displays in Fig.12.
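
The field limits and the three printout formats described above can be imitated in a few lines. This is a minimal sketch in Python, not FAMULUS itself: the field names (`TITLE`, `DATE`, `PLACE`), the function names and the sample records are invented for illustration, and the exact character limit is assumed to be 5000 because the text says only "almost 5000".

```python
MAX_FIELDS = 25          # limit quoted in the text
MAX_FIELD_CHARS = 5000   # "almost 5000" in the text; exact figure assumed


def check_record(record: dict[str, str]) -> None:
    """Raise ValueError if a record exceeds the limits described above."""
    if len(record) > MAX_FIELDS:
        raise ValueError(f"too many fields: {len(record)}")
    for name, value in record.items():
        if len(value) > MAX_FIELD_CHARS:
            raise ValueError(f"field {name!r} is too long: {len(value)} characters")


def numbered_list(records, field_name):
    """Format 1: a numbered list of items."""
    return [f"{n}. {r.get(field_name, '')}" for n, r in enumerate(records, start=1)]


def grouped_index(records, field_name):
    """Format 2: item numbers grouped under the data in a selected field."""
    groups = {}
    for n, r in enumerate(records, start=1):
        groups.setdefault(r.get(field_name, ""), []).append(n)
    return dict(sorted(groups.items()))


def field_listing(record):
    """Format 3: a formal list of data set against field names."""
    return [f"{name}: {value}" for name, value in record.items()]


# Hypothetical sample records.
records = [
    {"TITLE": "Council minutes", "DATE": "1890-1950", "PLACE": "Grimsby"},
    {"TITLE": "Rate books", "DATE": "1875-1930", "PLACE": "Scunthorpe"},
    {"TITLE": "Harbour dues ledgers", "DATE": "1860-1912", "PLACE": "Grimsby"},
]
for r in records:
    check_record(r)

print(numbered_list(records, "TITLE"))
print(grouped_index(records, "PLACE"))
print(field_listing(records[0]))
```

When testing a real package, the same kind of check can be run by hand: enter a record with a long narrative field, then ask for each output format in turn.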

Database management systems (DBMS) are even more widely available, since they are being promoted for the administration of small firms. DBMS are packages which allow different kinds of data to be stored in differently structured files, so that it can be retrieved on demand and displayed in various combinations. The most interesting variety of these packages, Relational Database Management Systems, allow data in one file to be displayed with data in others and used to show significant factors when these relationships are brought into play.

Like bibliographical packages they are not necessarily adaptable to archival use. When testing a package for its suitability, it would be necessary to be certain that it will allow lengthy textual entries (some are limited to numerical data, or very short textual entries), and that it will permit searches of these entries. The ability of these systems to calculate from figures held in numerical files may be an unnecessary feature for archivists. The DBMS known as SQL/DS (Structured Query Language/Data System), available on IBM computers, is used for records management at Liverpool University. It allows the user to compile and structure as many different files as necessary; each file can contain a very large number of records, and these may have as many fields as are needed. Data held in different files can be combined and there is a very powerful search capability which operates by selecting data items, combining them, and outputting them in a format which can be determined on screen. The corresponding disadvantage is that, since it is not designed for holding lengthy text, it is necessary to give specific instructions each time such text is retrieved. Specimens of file structure in this system are given in Figs. 5-7.
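
The idea of combining data held in different files can be shown with a small relational example. The sketch below uses SQLite through Python's standard `sqlite3` module, not SQL/DS, and the table names, column names and sample rows are hypothetical; they do not reproduce the Liverpool University file structures.

```python
import sqlite3

# An in-memory database with two "files" (tables): series and boxes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE series (
        series_id   TEXT PRIMARY KEY,
        title       TEXT NOT NULL,
        department  TEXT NOT NULL
    );
    CREATE TABLE box (
        box_no      INTEGER PRIMARY KEY,
        series_id   TEXT NOT NULL REFERENCES series(series_id),
        location    TEXT NOT NULL,
        review_year INTEGER,
        notes       TEXT
    );
""")
conn.executemany("INSERT INTO series VALUES (?, ?, ?)", [
    ("FIN/1", "Purchase ledgers", "Finance"),
    ("REG/4", "Student files", "Registry"),
])
conn.executemany("INSERT INTO box VALUES (?, ?, ?, ?, ?)", [
    (101, "FIN/1", "Bay 3, shelf 2", 1990, "Ledgers for sessions 1978-80"),
    (102, "REG/4", "Bay 7, shelf 1", 2010, "Files of students graduating 1981"),
    (103, "FIN/1", "Bay 3, shelf 3", 1992, "Ledgers for sessions 1980-82"),
])

# Combine the two tables, select by department, and search the text notes.
rows = conn.execute("""
    SELECT box.box_no, series.title, box.location
    FROM box JOIN series ON series.series_id = box.series_id
    WHERE series.department = ? AND box.notes LIKE ?
    ORDER BY box.box_no
""", ("Finance", "%Ledgers%")).fetchall()
print(rows)
```

The query illustrates the two tests suggested above for any candidate package: whether textual entries can be stored and searched, and whether data from separate files can be brought together in one result.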

Archivists must think carefully about the methods to be used for inputting data to a system. The STAIRS information retrieval package, again widely used and available on many makes of computer (Cook, 1986, p.109), has the disadvantage that data entry is relatively difficult; it needs some extra programming, or the addition of other software, to make it user-friendly in this respect. In fact it is generally necessary to add local facilities for data entry and user convenience when installing software packages at a local site. Consequently it is important to consider what local resources for this are available, and to what extent local systems can be maintained.

Obsolescence is also a problem with software. Packages are constantly being updated: it is now normal for a version number to be given with a package when it is bought. New versions will incorporate improvements or extensions to new hardware, but will not necessarily be directly compatible with the older versions. Local computing services introduce their own adaptations and improvements to software, which may make their private versions of the package impossible to use elsewhere. Although in principle software incompatibility can be overcome by further software work, this is often too expensive or not possible locally. Making a choice of software therefore implies a judgment on the future compatibility of the system chosen.

The standards proposed by ISO 8211 can be used as a partial insurance against problems in software transition.

Staffing Problems

There are two aspects to the staffing problem: attitudes and training.

(i) Attitudes:

It is most important that the introduction of automation should not be undertaken against the feeling of the staff in post, especially the professional staff. It is in fact most unlikely that bringing in computers will result in the reduction of professional staffing levels, so that fears of redundancy may be allayed after explanation. However, it is certain that automation will change habits and methods of working, and the staff have it in their power to prevent the full achievement of the objectives of a new system, if they wish to withdraw their active co-operation.

The usual way to avoid this kind of misunderstanding is to undertake changes only after consultation with the professional staff. The new project may then emerge as the result of that consultation, and in the form of an agreed plan. Additionally, there will probably be scope for those members of staff who are particularly interested, to take on new responsibilities in carrying out the initial analysis, and in supervising the introduction of new methods.

(ii) Training:

Questions of curriculum and level were discussed in the first part of this study.

The Automation Committee's international survey in 1985 enquired into the provision of training for staff involved in automation projects. Most replies indicated that archivists got their training by self-education: by reading books like this one, discussing technical questions with experts and doing what practical exercises they could find.

Another common form of training was by attending courses provided by computer manufacturers or computer marketing firms. Such courses are widely available, but they have the disadvantage that they are (naturally) geared to explaining the powers of one particular manufacturer or system. Such courses may be very valuable (and when it is a question of training operatives for running a system which has been installed, indispensable), but they do not take the place of a broad training which will develop the students' power to examine systems critically and in comparison with each other.

In a few other cases, there were courses in computing available in institutions of technical or higher education, which staff members could attend. This too is a valuable resource, but such courses are often too much directed to the needs of scientific research, or of business administration, to be directly relevant to information workers.

Some archives services began their automation by recruiting new staff, with a requirement that candidates should have appropriate experience.

It is clear that in general the question of staff training has to be considered as one of the important matters involved in bringing in automation, and resources should be allocated to deal with it.

Costs

Introducing automation will involve capital and recurrent costs, but these may be set against savings on manual systems. The process, therefore, begins with an analysis of the costs of current manual systems. Unfortunately, it is not easy to estimate these accurately and there appear to be few or no published studies (Cook, 1986, pp.48-52).

Capital costs cover the purchase and installation of hardware and the initial purchase of software. Capital expenditure is minimised where there is maximum use of central services. Where there is strong support from these, expenditure in the archives service itself could be as low as the equivalent of US $4000 to cover the installation of a terminal, printer and modem to operate in connection with a mainframe computer already available. Alternatively, an independent microcomputer system would not necessarily have a higher cost than this.

Recurrent expenditure is largely on maintenance of both hardware and software and on consumables such as stationery. There may be a charge for computer time in cases where the repository is using a central service. None of these costs is likely to be notably higher than corresponding costs in a manual service.

It is common for computer systems to be installed as a result of a specially funded experimental project. If a supporting agency can be found, this is an excellent way to undertake automation, for it allows for buying in expert analysis and supervision without taking on long-term commitments. There should, however, be a plan for maintaining the system after the end of the project.

Problems connected with systems planning

(i) Objectives:

The essential difficulty is that without a clear view of the objectives to be met by the system, archivists cannot proceed to plan or acquire any equipment; yet without an idea of what equipment is available, it is difficult to come to a decision on the ultimate objectives.

a) Type of output or access envisaged: printout of inventories; printout of search results, or of specialised handlists; online search by staff or by users; remote use?
b) Relationship with central services and with other information services: co-operation or networking?
c) Data and sources to be included, and method of data capture and processing.

(ii) Technical Questions:

a) The bulk of data to be processed and stored.
b) The length of time automated data is to be held. Long term storage will involve setting up some form of data archive, and/or provision for transferring files to future new systems.
c) System security: making sure that there is no distortion or loss of data by unauthorised access or by improper processing.
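
One common way of detecting distortion or loss in stored data, point c) above, is to record a checksum for each file and compare it later. Checksums are not discussed in this manual; the sketch below is offered only as an illustration, using Python's standard `hashlib` module, with invented function names and sample data. It detects that data has changed; it does not by itself prevent unauthorised access.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded: str) -> bool:
    """True if the data still matches the checksum recorded earlier."""
    return checksum(data) == recorded


# Hypothetical catalogue line, stored with its checksum when first written.
original = b"TC/1|Council minutes|1890-1950\n"
recorded = checksum(original)

# Later, the same file is read back; here one character has been altered.
damaged = b"TC/1|Council minutes|1890-1951\n"

print(verify(original, recorded))
print(verify(damaged, recorded))
```

The same comparison is useful when files are transferred to a new system, as envisaged under point b): a matching checksum before and after the transfer shows that nothing was altered on the way.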

(iii) Planning for the Future:

Eventually the existing installation will become out of date and both hardware and software will have to be replaced. The new system should be an enhancement of the old, but should be able to use the databases compiled over the years.

Enhancement of the system will certainly involve questions of co-operation with one or more of the following:

a) The central administration of the employing agency: the introduction of electronic communication systems will involve records and archives management.
b) Information services operating in the locality: the likelihood of a Local Area Network (LAN) being developed.
c) Information services operating nationally: national networks and registers.
d) User groups, especially those organised in institutions of scientific or academic research.

Automation in small, poorly financed archives services and in developing countries

Provided that the technical infrastructure is present (as described in section 1), there is a very strong case for introducing automated methods into small and underfinanced archives and records services. The case would be strongest where the service has only one professional archivist or records manager, with minimal supporting staff. In this case it is important that the professional's output should be maximized, and that his or her control over all the processes of the service should be maintained. Automated methods are the way to ensure this, provided that time can be found for the initial planning. As indicated in Section V, the investment in financial terms is not by most standards a major one.

A good example of self-help and the intelligent use of local resources is that of the South Humberside Area Record Office in England (Bartle & Cook, 1983, p.35). This system is based upon a popular Commodore 64K microcomputer with a cheap Centronics printer and double disk drives. This means that the initial capital expenditure was probably less than US $500. The programs were written by the archivist in charge, using a variant of BASIC, supplied with the machine, together with user manuals, at the time of purchase. Initially the data input consisted of descriptions at series level, structured into 7 fields (reference code, title, dates, provenance, accession date, location and notes). Later item-level descriptions were added and a simple search facility brought into use. Since data was held on 162K diskettes, initial sorting of these had to be done manually, but a change to hard disks would avoid this need. Remote access was planned at the administrative headquarters some miles away. The objective was primarily to improve the productivity of the small team, who were being overwhelmed with new accessions of material which could not be listed fast enough to keep pace.
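
A seven-field series description with a simple search facility, of the kind described above, needs very little code. The sketch below is written in Python rather than BASIC and is not the South Humberside program: the seven field names follow the text, while the class name, the function name, the choice of fields searched and the sample entries are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SeriesRecord:
    """A series-level description with the seven fields listed in the text."""
    reference_code: str
    title: str
    dates: str
    provenance: str
    accession_date: str
    location: str
    notes: str = ""


def search(records, term, fields=("title", "provenance", "notes")):
    """Return records in which `term` occurs in any chosen field, ignoring case."""
    needle = term.lower()
    return [
        r for r in records
        if any(needle in getattr(r, f).lower() for f in fields)
    ]


# Sample data invented for illustration.
catalogue = [
    SeriesRecord("SB/3", "School log books", "1870-1950", "Barton school board",
                 "1983-01-17", "Room 1, bay 9", "Some volumes water-damaged"),
    SeriesRecord("PC/1", "Parish council minutes", "1894-1974", "Barton parish council",
                 "1982-05-10", "Room 2, bay 4"),
]

# Sorting by reference code is done in memory here.
catalogue.sort(key=lambda r: r.reference_code)

for hit in search(catalogue, "school"):
    print(hit.reference_code, hit.title, hit.location)
```

A structure this small is enough to produce a sorted list and to answer simple enquiries, which was the practical aim of the system described.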

The same principle operates in developing countries. Archives and records services here are likely to have the advantage that they are attached directly to central government and so are close to sources of decision in information, finance and technical services. The pioneer was probably the Archives Nationales of the Cote d'Ivoire, which undertook an automated system for retrieving information from the archive of official bulletins. The system was introduced in 1975 and ran for some years. A terminal linked to the central government computer was used, employing the information management package MISTRAL, which was available on the machine.

The National Archives of Malaysia has a pilot project now operational, using a Canon AS-100 microcomputer and a standard software package, dBase2. The pilot project is to compile a database of national pension records; that is, important administrative records which have to be managed and retrieved reliably. It is proposed to extend the system next to group/collection level descriptions. An active programme for training specialists from among the professional staff is in progress.

Automation is under active consideration in several areas of the southern hemisphere, notably in Latin America (for which the ICA Automation Committee ran a training workshop in 1985), South-east Asia and the Pacific countries. Professional colleagues in these regions recognise, in broad terms, the benefits which well designed automation projects will bring, but find that infrastructure and training are constraints. Comparative measurements of productivity using automated methods and manual methods will usually show that cost should not be an important factor.

Computers have provided the first important new tool for archives and records management since the invention of the typewriter a century ago. Some archives initially resisted the new methods because they felt that they were inappropriate; some because of the cost of reinvestment. We are now at a point in history when it is possible to see that the new methods are very relevant and appropriate to all forms of information management and that the cost of running them is likely to be comparable to the costs of manual methods. The new ways do indeed demand that archivists should rethink their aims and strategies; but this is good in itself and leads to a new vigour and enthusiasm among a professional staff given new goals and a new stimulus to achievement.

