2. Finding aids and computerisation
Word processing
Databases
Guides to individual repositories (Cook, Paris 1986 pp. 23-24)
Some examples of inter-repository guides
The purpose of this first main chapter is not to list all the various kinds of archival finding aid that have been computerised, although some indication of the likely range of any such list will emerge. It is rather to present an overview of the ways in which the computer is compelling archivists to broaden their horizons in the management and exploitation of information about archives, and in particular to reconsider what they mean by the term 'finding aid' in the context of computerisation.
Whilst some of the advantages and problems of computerisation are mentioned here in passing, more detailed treatment of those matters is reserved for chapter 3. Discussion of broader issues such as standardisation and authority control, which loom large in any assessment of computerisation in the field, is mainly postponed until chapter 4 together with aspects of current research and development including the creation of computer networks.
This division of the report's subject matter is designed to take account of the terms of reference set out in the Introduction. But it will be apparent that the issues are all closely related, that some of the examples adduced in one chapter might have been equally appropriate in another and, moreover, that very many additional examples for which there was no space might have been presented in evidence.
Computers could be regarded either as tools, in succession to the quill pen and the typewriter, for compiling archival finding aids of the now traditional kind, or as presenting altogether more radical propositions for the archivist and a whole new concept of finding aids.
Both views may be appropriate, and the former cannot be lightly dismissed. But it may be the ability or opportunity to see things from the second point of view that marks the real turning point for computer applications in archives.
On the whole, archivists were slow to respond to the challenge of computers. They lacked the resources to become deeply involved at an early date (when virtually all computer applications were on expensive mainframes), as well as the training to enable them to see the potential, alongside the problems, of computers. Their first steps were tentative and sceptical, often based on the premises either that the new technology was likely to have little bearing on their discipline (because in the very nature of archives it could not be used, as it was by librarians, for common cataloguing of identical items) or that any impact it might have would be a matter for the next generation (Fishbein 1981; O'Neill 1986).
Computer technology, however, has advanced at a pace beyond all early expectations and can no longer be ignored. For many archives the advent of affordable word processors (text processors), mainly during the 1980s, provided the first indications of how computers might transform traditional finding aids.
Here were machines that could quickly and painlessly accomplish feats of correction, substitution, spelling, layout and presentation beyond the capacity of the typewriter. The software was not of course initially designed with archival finding aids in mind, and in some cases archivists reported very basic problems, for example over clumsy word-wrap at the ends of lines, and inability to achieve a layout in multiple columns, or distortions caused by the machine's predetermined proportional spacing. But with persistence and experiment these snags were generally overcome, and it is needless to record in detail here the kinds of finding aid now produced by this means, for they are as numerous as those formerly committed to the typewriter.
The more serious problem of absorbing word-processed text into computer databases for analysis remains, and is considered in the next chapter.
Word processing continues to play an essential part in the production of detailed finding aids at lower levels of description (such as inventories or descriptive lists of items within a series/class) even in those archives with access to wider computer applications.
For smaller or less well resourced archives, word processing may for the moment remain the only available form of computerisation but it is not on that account to be spurned. It can result in significant economies in the production of finding aids, as well as a better quality of presentation. With the help of even simple office copiers, multiple copies can if required be economically run off from word-processed output, and if still larger-scale publication or dissemination is required the output may serve as camera-ready copy for a printer.
A further stage has been reached in the publications of many archives where, either by the insertion of codes (tags) at key points in the text, or by format recognition, the output of the word processor, generally on floppy disk, can be transferred automatically to computer typesetting machinery which will print a text that is typographically superior to the word processor's output (and more varied if required). This technology has been widely used in other disciplines also, and strenuous (though not yet entirely successful) efforts are being made, as in the USA with the Text Encoding Initiative sponsored by the Association for Computers and the Humanities, the Association for Computational Linguistics and the Association for Literary and Linguistic Computing, to agree and standardise conventions and mark-up languages to facilitate this process. Despite the fact that many machines remain mutually incompatible, numerous archival texts and transcripts, guides and handbooks, catalogues and inventories, have already been produced by this means.
When the British Library decided to computerise its Summary catalogues of manuscripts one of its primary considerations was that users familiar with the typography and conventions of the earlier volumes should not be confused by a new approach. The computer was therefore required to generate an end product looking as nearly like the previous volumes as possible. This has been successfully achieved using a commercial word-processing package, but with carefully controlled use of symbols including punctuation and carriage returns to enable the typesetter to recognise the format automatically and select the correct style and layout.
Although there continue to be 'dedicated' word processors (machines which can only perform this function), word processing is now more commonly just one operation undertaken on computers of all sizes, which have much wider potential. It is the ability to fathom and use that potential which marks the transition from the first to the second way of looking at computers in relation to archival finding aids.
Even with a simple microcomputer it is relatively straightforward to set up databases to control information about archives. With the greater capacity of minicomputers and mainframes and through networking, a number of wider options may be available for databases, some of prodigious scope.
But some words of caution are necessary. As for any other computer application, it is advisable first to identify the requirements and only then to seek the software and hardware that will fulfil them. The term 'database' covers a variety of different methods of storing and manipulating information. Considerable frustration may arise if a chosen system or package fails to match expectations, for example because it lacks the ability or the power to search or output the data in the form required or to cumulate it for wider use.
Archivists taking the first steps along this path should therefore be sure that either they or their technical advisers are aware of the range of 'hierarchical', 'network' and 'relational' databases on the one hand and 'free text' (inverted file) systems on the other - and indeed more recently of systems which combine these features - and that the chosen system or package will match their requirements.
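The 'free text' (inverted file) approach can be illustrated by a minimal sketch. It assumes a handful of invented descriptions and hypothetical function names (build_inverted_index, search); it is not drawn from any of the systems named in this report, and a real package would add stop-words, stemming and storage on disk.

```python
import re
from collections import defaultdict

# Hypothetical sample descriptions, invented for illustration only.
DESCRIPTIONS = {
    1: "Land tax assessment rolls",
    2: "Register of land grants",
    3: "Building plans and surveys",
}

def build_inverted_index(descriptions):
    """Map each lower-cased word to the set of record numbers containing it."""
    index = defaultdict(set)
    for record_id, text in descriptions.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(record_id)
    return index

def search(index, *words):
    """Return the sorted record numbers that contain every given word."""
    sets = [index.get(word.lower(), set()) for word in words]
    return sorted(set.intersection(*sets)) if sets else []

INDEX = build_inverted_index(DESCRIPTIONS)
```

Because the index is keyed on words rather than on predefined fields, any word in a description becomes a point of access, which is the essential difference from a field-structured database.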
Commonly, a database is a means of storing and manipulating by computer such information as can be presented in a structured form, i.e. where the various elements or 'fields' of information can be readily distinguished. Many of the elements of traditional archival finding aids precisely match this requirement (call number, date, description, originating entity, series title, quantity, etc.) and can easily be made the subject of databases. Moreover, where the fields are already clearly distinguished in the layout of existing finding aids, or are capable of being so distinguished by the insertion into the text of symbols or 'tags', pre-existent information can be incorporated into databases and not merely new information assembled after the advent of the computer. This may be done by rekeying the data into the computer field by field, by encoding the word-processed file, or by automated means such as optical character recognition.
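The conversion of tagged word-processed text into database fields can be sketched as follows. The tag convention used here (a '#' followed by a field name, with a blank line ending each record) is an assumption made purely for illustration, as are the sample entries and the function name parse_tagged.

```python
# Hypothetical tag convention, assumed for illustration: each field starts
# with '#' and a field name; a blank line ends the record.
TAGGED_TEXT = """\
#REF A/1/1
#DATE 1842
#DESC Minute book of the parish vestry

#REF A/1/2
#DATE 1850-1861
#DESC Churchwardens' accounts
"""

def parse_tagged(text):
    """Split tagged word-processor text into a list of field dictionaries."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:                      # a blank line closes the record
            if current:
                records.append(current)
                current = {}
            continue
        if line.startswith("#"):
            tag, _, value = line[1:].partition(" ")
            current[tag.lower()] = value.strip()
    if current:                           # final record with no trailing blank
        records.append(current)
    return records

RECORDS = parse_tagged(TAGGED_TEXT)
```

Each resulting dictionary corresponds to one entry of the original list, with the reference, date and description held as separate fields ready for loading into a database.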
Of the database applications reported in response to the questionnaire, many cover those archival sources that are richest in topographical or genealogical information, which is everywhere in high demand from the users of archives.
They have been created, for example, for census records (Public Record Office, UK), land tax (Ontario), land grants 1800-1948 (Sri Lanka), land surveys (Italy), registers of appointments (Malaysia), notarial records and correspondence (a number of places including Italy, Utrecht), building plans before 1900 (Singapore), etc.
In Switzerland the first steps have been taken towards the creation of a database which will contain information on some 2 million individuals with from 10 to 30 attributes for each.
Some more academically orientated projects covering a wide range of fonds were also reported, such as the database of records for the history of Hungary up to 1526 AD, and the large-scale collaborative project between the archives of Italy and Spain to identify source materials for the history of each country in the archives of the other.
Catalogues of 'non-text' media, where information commonly needs to be retrieved under a number of different headings such as originator, date or topic, are also proving popular for database applications. Among those specifically noted by the survey were catalogues of maps (Canada, UK), photographs (Malaysia, Singapore, UK, USA), audio visual archives (Hungary, USA, Zimbabwe), microfilmed records (Sweden) and electronic records (Belgium).
New concepts of archival information
Databases are entirely different from traditional finding aids. The information in them need not be printed out but can be interrogated at the computer screen by means of commands or queries. Depending on the software and hardware used, the computer may be able to output the entire content of the database, in printed form or on microfiche. It will normally also be capable of presenting that information in a number of different ways, for example re-ordering it chronologically, alphabetically or by subject matter. It may be capable of selective searches by field or key-word, and of indexing. These and other advantages are further discussed in the next chapter.
In other words a database can be seen as a single, integrated finding aid in its own right or as the potential source of a number of different kinds of finding aid (which need only be printed out, in whole or in part, if there is a demand), all stemming from the same input information.
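The idea that several finding aids can stem from the same input information can be sketched in a few lines. The entries and field names below are invented for illustration, and the functions are hypothetical; the point is only that a chronological list, a name index and a selective search are all derived from one set of records without rekeying.

```python
from collections import defaultdict

# Invented sample entries; the field names are assumptions for illustration.
ENTRIES = [
    {"ref": "B/3", "year": 1790, "title": "Lease of mill", "names": ["Smith", "Jones"]},
    {"ref": "B/1", "year": 1655, "title": "Grant of land", "names": ["Jones"]},
    {"ref": "B/2", "year": 1720, "title": "Survey of estate", "names": ["Brown"]},
]

def chronological_list(entries):
    """Return one line per entry, ordered by year."""
    ordered = sorted(entries, key=lambda e: e["year"])
    return [f'{e["year"]}  {e["ref"]}  {e["title"]}' for e in ordered]

def name_index(entries):
    """Return an alphabetical index of names, each with its sorted references."""
    index = defaultdict(list)
    for e in entries:
        for name in e["names"]:
            index[name].append(e["ref"])
    return {name: sorted(refs) for name, refs in sorted(index.items())}

def search_field(entries, field, value):
    """Return the references of entries whose field equals the given value."""
    return [e["ref"] for e in entries if e[field] == value]
```

Any of these outputs need only be produced when there is a demand for it, and all of them change automatically when the underlying entries are amended.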
A simple database application may have the short-term and finite purpose of manipulating information about, for example, a single series (class) of records in order to produce one or more finding aids for public or administrative use. A number of packages have been developed with this specific end in view, including MAIS in the Netherlands which produces preliminary and detailed inventories, indexes of names and subjects, and concordances (Archival Informatics Newsletter 1/2 (1987) p.20). In such a case the data may then be static, in the sense of never needing to be changed or aggregated with other information.
But computerisation may offer the wider vision of information which is dynamic, which can be stored and amended or added to on a regular basis and need never be the same on two consecutive days. In these circumstances perhaps it would be more appropriate to pose the question: when (rather than what) is a finding aid?
Database applications are as widespread among the specific kinds of finding aid required for modern records management and repository control as they are among those designed for public use. Many national and local as well as business and specialist archives now use computers to generate, or hold the equivalent of, their transfer lists, accessions registers, location lists, retention and disposal schedules and their tracking records for conservation, reprographics and other aspects of in-house management (Cook, Paris 1986, pp. 17-19, 30-31).
One of the best known systems, because it was widely demonstrated to archivists attending the ICA Congress in Paris, 1988, is the PRIAM 3 application operating at the Centre des Archives Contemporaines, Fontainebleau. Described by its custodians as a référothèque, the system records the existence and location of all records transferred to the Centre and describes and indexes them in summary form, the description being deliberately kept to a minimum to enable computerisation and control of a great bulk of records to proceed according to plan. Selective subject indexing is undertaken, by means of which it has been found possible to draw together information about fonds now dispersed among the records of several transferring agencies, such as the archives of the Suez Canal company.
Computerisation, as we shall see in more detail below, is serving to break down many practical or conceptual barriers within archivistics. It is no longer necessary to regard the description of archives for the benefit of the user as a process wholly distinct from the control of information about the same archives for administrative and management purposes. This in turn is resulting in new kinds of database and new kinds of finding aid.
Integrated systems
A number of local and specialist archives, where the bulk of holdings is smaller than at national level, are already well advanced along the path of integrated control, with systems which can handle information about all their holdings irrespective of the storage medium, and include management functions.
One such system, RAPIDE, in use at the Katholiek Documentatie Centrum, Nijmegen, Netherlands, has been described in ADPA, the journal of ICA's Automation Committee (Socket 1985). Other examples include the GAIA application in the French department of Seine-et-Marne, and North Carolina's MARS application.
The development of the USMARC: AMC format, discussed more fully below, has similarly enabled many archives in the United States to integrate the control and description of their holdings (see pp. 43-45).
The British Manual of archival description and its application in the Portuguese system ARQBASE (see pp. 45-46) offer similar options for integrated control.
A number of the larger national archives, where the bulk of records and information to be handled is very much greater, have similarly begun to implement or explore integrated or overarching computer systems which will tie together some or all of their hitherto separate computer applications and link all aspects of control and information. The proliferation of many separate systems and the initial absence of an overall strategy has proved a significant problem for management.
In Canada the Archival Holdings System will link information previously held in over 30 different computer systems throughout the National Archives.
In the United Kingdom the Public Record Office's Records Information System will bring together repository control and the data for the publicly available Current Guide to holdings, which includes administrative histories of the archive-creating departments and agencies and summary descriptions of holdings at group and class levels.
In the National Archives of the United States the Archival Information System (AIS) (see p. 16) will embrace series- and, where available, folder-level description of federal records and item-level descriptions of special categories of records including motion-picture film and machine-readable data sets. It will also draw upon information in the US Government Manual and the Guide to the National Archives, as well as information from existing inventories (captured by optical character recognition) and databases. It is expected that points of access to the new system will include agency history, biography, government function or specific program, and authorised topic terms.
Sweden has recently begun work on a national archival database to link the information contained in existing national and provincial databases, together with authority files, data on the changing geographical districts, and information gathered from archivists themselves.
Guides to individual repositories (Cook, Paris 1986 pp. 23-24)
The production of guides to the contents of individual archive repositories had been a priority in several countries before computerisation (see for example Van Driel 1984), but the process has been greatly facilitated by the computer. A number of respondents to the questionnaire (Netherlands, Hungary, UK) emphasised the ease with which summary guides could now be produced, and in some countries software has been specifically developed with this in mind, as in the ARCHEION application in the Netherlands (Horsman 1988).
At the same time there has been a considerable reappraisal of what a guide should contain. More emphasis than in the past is being placed on explaining the historical or administrative framework within which the archives were created.
In some countries what might be described as ancillary databases have been established primarily to supply information about the originators of the archives, whether institutions or individuals.
In Brazil, for example, the MAPA database contains, for each agency of government, its name, functions, details of predecessor or successor bodies with the same functions, and information about relevant legislation affecting its operation (Azevedo 1986).
Elsewhere the same kind of information is being fully integrated into computer-based guides (as in the cases of the UK and USA noted above) or is being provided for as a function of a database working towards a future guide of this kind (ARQBASE in Portugal).
RINSE and ANGAM in Australia
In Australia computerisation has been taking place concurrently with the creation of a national archive service under the recent archive legislation, and the opportunity has been taken to develop further that country's distinctive approach to arrangement, classification and authority control (Scott 1966 etc; see also Australia).
In 1983, when under the Australian Archives Act it became mandatory for the Archives to make publicly available information about the archives which were open to inspection and to explain any reason for withholding such access, a new central data bank was established (initially on paper) for official use, known as the Australian National Register of Records. This set out to aggregate information contained in the records themselves, in official government publications, and from the personal knowledge of administrators and archivists about the institutions and agencies and also about the private individuals whose papers are held by Australian Archives.
A review of the processes for handling information throughout the Commonwealth of Australia led to the formation in 1983/84 of an ADP strategic plan. It was concluded that computerisation would offer a cost-effective means of handling such central information and fulfilling the new legal requirements.
Two main applications were established. The first is a Records Information System (RINSE) carrying, for all archives held by Australian Archives, details of the creating agencies, persons, organisations or families respectively and of the related record series. Microfiche print-outs of this data, with paper supplements for data not yet computerised, together constitute part I of the Australian National Guide to Archival Material (ANGAM). The second database, which forms part II of the guide and is therefore known as ANGAM II, gives the number, title, date-range and location for each series of records over 30 years old, together with a note of their availability in accordance with the Archives Act.
Overall access through the ANGAM index (on fiche, with paper supplements) is by title of the creating agency. The user is introduced to the provenance-based approach to the records by means of information leaflets. The database can be searched on-line using the item number or subject keywords.
Here, as elsewhere in national archives, there is no central index to item-level inventories, but any indexes, registers etc transferred by the creating agency are carefully noted. The Physical Control System (PCS) of Australian Archives, however, is not linked to the other computerised systems.
One component of the Australian system which may be unfamiliar elsewhere as a form of finding aid is the registration. This is normally compiled at each of two levels: to describe (a) the record-creating agency/institution/person/family, and (b) the individual record series transferred to the Archives. The nearest equivalents elsewhere would be (a) a fonds/group-level description and (b) a series/class-level description, but the concepts are rather different. The agency registration gives the agency's name, the running number allocated to it by Australian Archives, notes on its history including its operational dates, cross-references to predecessor and successor bodies and a list of the record series generated by that agency. The series registration gives corresponding concise information about the series and its relationship to other transferred records.
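The two levels of registration can be pictured as two linked record types. The sketch below is an illustration only: the class names, field names and sample values are all hypothetical and do not reproduce the actual layout used by Australian Archives. It shows how an agency record might carry its running number, dates, cross-references and list of series, with each series pointing back to its creating agency.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SeriesRegistration:
    number: str
    title: str
    date_range: str
    agency_number: str          # link back to the creating agency

@dataclass
class AgencyRegistration:
    number: str                 # running number allocated by the archives
    name: str
    start_year: int
    end_year: Optional[int] = None
    history: str = ""
    predecessors: List[str] = field(default_factory=list)
    successors: List[str] = field(default_factory=list)
    series: List[SeriesRegistration] = field(default_factory=list)

    def add_series(self, number, title, date_range):
        """Register a series under this agency and return the new record."""
        registration = SeriesRegistration(number, title, date_range, self.number)
        self.series.append(registration)
        return registration

# Invented example values, for illustration only.
agency = AgencyRegistration("AG1", "Department of Works (invented)", 1901, 1932,
                            successors=["AG2"])
agency.add_series("S1", "Correspondence files", "1901-1932")
```

Keeping the agency and the series as separate but linked records is what allows a series to be related to more than one body over time, by way of the predecessor and successor cross-references.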
Inventories are provided, summarising the available series, and at a lower level the items within each series.
Some examples of inter-repository guides
In a number of countries including the United Kingdom, Norway and Sweden, national registers of the papers of private individuals, families and institutions have been computerised.
The National Register of Archives (UK), established in 1945 and maintained by the Royal Commission on Historical Manuscripts, receives unpublished and published finding aids of all kinds from record repositories and from private sources, relating to archives and papers of interest to British history. The finding aids themselves are held in hard copy but summary information about them is now held on-line in a computer database which includes details of the collection or series title, its location, and indexes of persons, companies and other record-creating entities including churches, hospitals, schools, trade unions and families. As with the Swedish National Catalogue of Private Papers, the levels and structure of indexing and description for national purposes are quite independent of those adopted by the originators. The same relational database holds information about Britain's record repositories, their addresses and telephone numbers, the archivist in charge, opening hours, and facilities. Data from this file will be used in the creation of the forthcoming 9th edition of the Commission's directory Record repositories in Great Britain.
The largest computerised inter-repository guide is probably the second edition of the USA's National Historical Publications and Records Commission Directory of archives and manuscript repositories in the United States (Oryx Press, Phoenix, 1988).
This covers over 4,500 institutions and, in addition to the kind of information in the British directory, includes brief details of each institution's holdings. Its production encountered problems, including the lack of an interactive mode for correcting and updating files and the hardware dependency of the SPINDEX application (see the Directory's introduction), but it is still an impressive work, made possible in this form largely by computer technology.
Also in the USA, the National union catalog of manuscript collections maintained by the Library of Congress is another example of an inter-repository guide that is now being greatly facilitated by computerisation. Although to many libraries it is still best known in its printed form, its data is now entered into the on-line computer network RLIN discussed more fully below.
At a more local level the city and county archives of Stockholm, Sweden, has developed a regional catalogue covering all the archives in the county.
Individual scholars and researchers, academic institutions and learned bodies, and even pressure groups, are also widely engaged in the creation of computer databases about the archival sources of special interest to them. These are not 'archival finding aids' in the accepted sense of the term, and indeed may never have been near an archivist at any stage in their preparation. But they are certainly a force to be reckoned with and will be widely used as points of access to archival information.
In the United Kingdom, for example, nationwide databases covering the contents of many archives as well as records in private ownership have been or are about to be created for hospital archives, medical archives, papers of senior military and naval staff of the 19th and 20th centuries, sources of art history, business history, garden history, computer history, and English literary manuscripts from the 18th to the 20th century.
To summarise, then, very few kinds of finding aid from the pre-computer age cannot be reproduced in computerised form, but more importantly the computer has opened up a new range of possibilities for the handling of archival information of all kinds. The benefits as well as the problems are assessed in the next chapter.
Figure. The United States Archival Information System