The archival appraisal of machine-readable records


Harold Naugler

INTRODUCTION

"No other development since the invention of movable type has had as great an effect on the production, dissemination, storage, and use of information as has that of the electronic computer, and this development has only been in process for about thirty-five years. Compared to the other great inventions in information communication (writing, begun about 5,000 years ago; the alphabet, developed some 3,000 years ago; movable type, invented about 700 years ago), the computer is only in its infancy." (1)

What is it that makes the modern electronic computer such a powerful tool in the world today? First of all, electronic computers operate at speeds which are hard to imagine. The time required for the internal operation is measured in nanoseconds (that is, .000 000 001 seconds). A corollary of this speed of work is the volume of work which a computer can do. Examples abound in both industrial and scientific applications where computers are being used to solve problems which would have been insolvable by any practical means because of the sheer volume of calculation involved.

A second characteristic of electronic computers is the consistency with which they carry out their instructions. Machine errors are almost unknown, and because of comprehensive error detection systems they seldom lead to inaccurate results. Most of the errors which are reported with such glee in the press are in fact the result of human error rather than the fault of the machine. If the information fed into the computer is valid and the programs are sound, the machine can be relied upon to produce the results that are required.

A third invaluable characteristic of electronic information processing systems is their great storage capacity. Modern computers can store vast amounts of information in a relatively small space, and in such a way that it can be retrieved and used very rapidly. This great storage capacity is particularly advantageous in applications such as calculation of census data, where very detailed information can be processed in a relatively simple way.

A further advantage of the modern computer is its versatility. For example, the same machine can be programmed to help a company's accountant produce a payroll, help the sales manager analyse a market research report, and assist the company's architects and engineers design a new building.

Another important aspect is the fact that the programming and processing tasks are independent of each other. The machine can be working on any one of a variety of tasks while the computer personnel are preparing a program for yet another piece of work. The machine has the ability to accept detailed instructions and to store these in a high speed internal memory unit. It then has almost immediate access to these instructions, and is not dependent upon an operator feeding in instructions as the work progresses.

At the same time that electronic computers have been developing, the older business systems have been running into difficulties. Some of the problems which these older methods of work have had to face are outlined below.

1) A growing volume of paperwork. As business and government have become more and more complex, more and more records and reports have been needed. Each organisation has needed to keep more detailed information on all aspects of its operations.
2) Increased costs. As the standard of living has risen, so has the cost of employing labour. This has forced all types of organisations to consider automatic methods of processing information.
3) Shortage of personnel. Again as society has changed, specialisation has increased. A better educated population has led to a decline in the number of workers available for routine processing work.
4) Elimination of error. In this increasingly complex age, it becomes more and more important that we do not make errors. On a flight to the moon, for example, it is imperative that the results of all calculations be reliable, and that no human transcription errors be made.
5) The need for rapid decisions. Modern management wants to know what happens as soon as it happens so that the managers can make sensible decisions at the earliest possible moment. In this way, potential business will not be lost because of ignorance on the part of management.

Considering the technical ability of modern computers and the difficulties facing the older methods of information processing, it is not surprising that computers have been introduced into laboratories, business, and government offices throughout most countries. In part these machines are replacing older systems, while in part they are being used to undertake work which could not previously be done.

How have archivists responded to the convergence of computer and communication technology, new legislative and management initiatives, the rapid growth in the use of computers, and the explosive growth in the volume of information in machine-readable form? It was events of this nature which led, at least in part, to consideration of the implications of computers by the Fifth International Congress on Archives in 1964 and a year later at the Ninth Meeting of the International Conference of the Round Table on Archives. However, at that time few ICA members foresaw the possibility of accessioning machine-readable records. Seven years later, in 1971, at the Thirteenth Meeting of the Round Table, data processing applications and their implications in archives were examined. It was as a result of the report that the Ad Hoc Working Party on the Implications of ADP in Archives was established by the ICA in 1972. The Working Party was the predecessor of the existing Automation Committee of the International Council on Archives. "The deliberations of the Working Party and later, of the Committee led to exchanges of views with regard to the use of computers for managing archives and the problems of appraising machine-readable records." (2) It was around this same time that a number of national repositories - in Canada, Sweden, the United Kingdom, and the United States - began preparing for the scheduling of machine-readable records and for the acquisition of those appraised as having long-term value.

However, concern for the preservation and use of machine-readable records was not, and is still not, confined to traditional archivists. In 1973 a new international organisation was established known as the International Association for Social Science Information Service and Technology (IASSIST). Membership in the Association consisted basically of three groups: the creators and disseminators of machine-readable data, data archivists and data librarians, as well as the users, particularly social scientists, of such data. The data archivists and librarians were representatives of social science data archives which were being established at academic institutions throughout many countries. (3) Although data archivists and librarians do not always have the same background and training as their traditional archival counterparts, both share many of the same concerns with respect to the management of machine-readable records. One particular area in which IASSIST members have provided considerable leadership is in the cataloguing and description of machine-readable data files. (4)

It is interesting to note that much of the early literature written concerning machine-readable records dealt with the crucial question of appraisal. (5) Indeed, this continues to be a topic of considerable interest, discussion, and re-evaluation among archivists who have been dealing with machine-readable records for over a decade. It is, therefore, most timely that the Division of the General Information Program of UNESCO and the International Council on Archives have agreed to the joint sponsorship of this particular study.

Archivists who manage machine-readable records on a full-time basis quickly recognise that procedures which are developed one way may require partial or complete revision in two or three years. This is often necessary in order to keep pace with the many and frequent changes in the computer industry itself. Not only is this the case for the accessioning, processing, and preservation of machine-readable records, but it is also true for the appraisal function. For example, as machine-readable records become admissible as evidence in courts of law throughout various countries, the appraisal of machine-readable records from a legal point of view will become far more important than it is at the present time. As more and more textual information becomes digitised or machine readable, it will also be necessary to reassess the evidential value of machine-readable records. In other words, the author does not consider the approaches outlined in this study as in any way definitive. While every attempt has been made to reflect the "current state of the art" with respect to the appraisal of machine-readable records, it must be recognised that developments will occur over the years which will necessitate their reassessment and possible revision. The approach should, therefore, not be interpreted as definitive, but rather as a guideline to archivists who manage machine-readable records.

FOOTNOTES TO INTRODUCTION

1. H Thomas Hickerson, Archives and Manuscripts: An Introduction to Automated Access. Basic Manual Series, Society of American Archivists, Chicago, 1981, page 11.

2. Meyer H Fishbein, Guidelines for Administering Machine-Readable Archives. Committee on Automation, International Council on Archives, Washington, D.C., November 1980, page 7. This particular publication is an excellent example of the work of the Automation Committee over the years, and particularly some of its members, in addressing problems associated with the archival management of machine-readable records. Committee members have also devoted a great deal of time and attention to the use of computer systems in archives. See, for example, A. Arad and M.E. Olsen, An Introduction to Archival Automation. Committee on Automation, International Council on Archives, Koblenz, Federal Republic of Germany, January 1981. The Committee also produces a journal, ADPA, which contains articles, etc. on both automation in archives and the management of machine-readable records.

3. For an explanation of the reasons for the establishment of such archives, particularly in the United States, and the various functions performed in such institutions, see C. Geda, "Social Science Data Archives", The American Archivist, Volume 42, Number 2, April 1979, pages 158-166.

4. See, for example, the manual written by Sue A. Dodd, Cataloguing Machine-Readable Data Files. American Library Association, Chicago, 1982.

5. Meyer H. Fishbein, "Appraising Information in Machine Language Form", The American Archivist, Volume 35, Number 1, January 1972, pages 35-43; L. Bell, The Archival Implications of Machine Readable Records. Washington, D.C.: VIII International Congress on Archives, 1976; Charles M Dollar, "Appraising Machine-Readable Records", The American Archivist, Volume 41, Number 4, October 1978, pages 23-30; C L Geda, C W Austin, and F X Blouin, Jr. (eds.), Proceedings of a Conference on Archival Management of Machine-Readable Records, Held at the Bentley Library, the University of Michigan February 1979. Society of American Archivists, Chicago, 1979.

6. GUIDELINES

6.1 This chapter is intended to provide readers with summary conclusions which are written in the form of recommended policies and practices, or guidelines. The numbers in parentheses refer to specific paragraphs of the study which contain a more detailed explanation of the subject(s) covered.

6.2 Archivists who are to be responsible for machine-readable records must become familiar with the basic terminology associated with data processing as well as with the operations of a computer system (1.2 to 1.41).

6.3 It is also important for archivists to be familiar with the nature of machine-readable records and how information in machine-readable form differs from other kinds of information, such as textual records and microforms (1.41 to 1.45). Machine-readable records have certain unique characteristics which must be known (1.46 to 1.49), as must the sources (1.50 and 1.51) and uses (1.51 to 1.55) of such records.

6.4 It is possible that some archival institutions may be unable to deal with machine-readable records because of limitations imposed by statutory or other regulatory authorities. This can include restrictions on the type or kind of records which the archival institution can acquire, as well as restrictions on the acquisition of recent records. There are a number of ways in which archival administrators can resolve these particular problems (2.3 to 2.9).

6.5 A number of issues arise when appraising machine-readable records with which archivists must be familiar. One is the existence of data in central government agencies which are often the compilation of data created by other government jurisdictions, with no indication as to what governmental level owns the data and controls access to the data (2.11 to 2.14). A second issue is the control of machine-readable records that are created as a result of government contracts or research grants (2.15 to 2.23).

6.6 It is crucial for EDP records management programmes to be established in order for archival repositories to be assured of having a systematic acquisition programme for machine-readable data. In this way archivists can properly identify and appraise the machine-readable records that are created in the particular jurisdiction in which they work. While one of the major rationales used for a traditional records management programme has been the savings that can be achieved by storing voluminous quantities of records used infrequently in low-cost storage sites, cost-benefit analyses for EDP records management are still in the infancy stage (2.26 to 2.36).

6.7 In traditional records management policy and procedures, disposal plays a major role. However, such is not the case in the EDP world for, left on their own, those who control computer systems would automatically delete unwanted or unnecessary information. Because of this, it is imperative that records schedules for machine-readable information be established at the system design or planning stage for new applications or programmes (2.38 to 2.51).

It is also important to remember that the archival limitations for information in machine-readable form may often be different from those for paper records (2.52).

6.8 Archivists will continue to work with records managers, at least to a certain extent, with respect to the scheduling of machine-readable records. However, it is the EDP personnel in the creating institutions with whom the archivists will need to work on a regular basis in order to ensure that machine-readable records are properly identified, inventoried, and scheduled.

6.9 The appraisal of machine-readable records involves the evaluation of the information contained in the records (content analysis) as well as an evaluation of the technical aspects of the records (technical analysis). The content analysis involves the traditional activities of archival appraisal combined with some new considerations particular to machine-readable records. Technical analysis, on the other hand, is a relatively new activity in the appraisal of records, but one which is of the utmost importance in the evaluation of machine-readable records.

6.10 Machine-readable records may have evidential value if they contribute to the policies or decisions adopted by a department or agency, or if they provide documentation of significant operations or procedures. Examples of machine-readable records which may have evidential value are provided in that section of the study which deals with the application of content analysis to individual categories of information (3.18 to 3.66).

6.11 Archivists must also consider the legal value of the machine-readable records which they are appraising. There are at least three factors which could affect the assessment for legal value. The first is whether or not such records are admissible as evidence in a court of law (3.5). The second factor is the association of the records with copyright law, both nationally and eventually internationally. Of particular importance are any special provisions to cover computer programs or software (3.6).

The third factor is the existence of any acts which stipulate that certain kinds of records must be retained for certain periods of time to meet particular legal requirements. This is especially the case when such acts include machine-readable data with their supporting documentation in the definition of "records" which must be retained for certain periods of time (3.7).

6.12 Another legal factor with which archivists must be familiar when appraising machine-readable records is the existence of any legislation which prevents the "export" of machine-readable records, usually containing personal information, from the country in which the records were created. This is how some countries have responded to the impact of electronically communicated transborder data flows.

There are several other sovereignty-related issues associated with the transborder data flow question of which archivists should also be aware (5.19 to 5.23).

6.13 The main appraisal judgment in terms of content analysis is the value of the information the records contain for uses other than those for which they were created. The determination of informational value of machine-readable records is similar to the evaluation of other types of information for potential research value - an evaluation of the significance of the subject content for current and future research. However, there are a number of factors unique to machine-readable records which must be considered in appraising such records for their informational value. One of these is the uniqueness of the information or its format (1.46 to 1.49, and 3.9 to 3.11). A second factor is the potential for record linkage (3.13). Another important factor to consider is the level of aggregation (3.12 and 5.12 to 5.18).

6.14 The content analysis must be performed in consultation with departmental users, data processors, and other individuals connected with the information described in the file. It is important to keep in mind that the content analysis cannot be undertaken without the archivist having first obtained detailed information on the organisation, the information structure of the organisation, the purpose of the machine-readable data file, the methodology used, its use in the specific programme, its relationship to other programmes in the organisation, and even its value in terms of the user's own perception of its worth to both the organisation and potential research communities.

6.15 Before an assessment can be made on the historical or long-term research value of machine-readable records, it must first be determined if the information on the computer tape, punched cards, floppy disks, etc. can be read (4.4 to 4.6). It must also be determined if there is sufficient documentation accompanying the machine-readable records, consisting at least of a record layout and a codebook, to appraise and process the records and sufficient information for a researcher to use the records (4.7 and 4.8). If the data can be read and there is adequate documentation, then the archivist can proceed to the analysis of the contents of the machine-readable records and a more detailed technical analysis of the arrangement of the records and problems which could occur due to long-term storage.
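
The role of the record layout and the codebook can be illustrated with a minimal Python sketch. The layout, field names, and code values below are hypothetical and are not drawn from the study; the sketch only suggests why a fixed-width record is difficult to interpret when either document is missing.

```python
# Hypothetical record layout: field name -> (start, end) character positions.
RECORD_LAYOUT = {
    "region": (0, 2),
    "sex": (2, 3),
    "age": (3, 6),
}

# Hypothetical codebook: meaning of the coded values stored in each field.
CODEBOOK = {
    "region": {"01": "North", "02": "South"},
    "sex": {"1": "Male", "2": "Female"},
}


def decode_record(line):
    """Split a fixed-width record using the layout, then translate its codes."""
    decoded = {}
    for field, (start, end) in RECORD_LAYOUT.items():
        raw = line[start:end]
        # Fields with no codebook entry (such as age) keep their raw value.
        decoded[field] = CODEBOOK.get(field, {}).get(raw, raw)
    return decoded


# Without the layout and codebook, "022047" is an opaque string of digits.
print(decode_record("022047"))
```

The layout says where each field sits in the record, and the codebook says what the stored values mean; an appraisal that finds one but not the other may still be unable to interpret the data.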

6.16 In undertaking the detailed technical analysis, a number of factors must be taken into consideration. One of these is the size of the machine-readable data file. Should the size of the file pose difficulties, then the archivist might have to consider the possibility of obtaining only a sample of the records. In undertaking this, the archivist will have to determine the effect sampling might have on the informational value. It is important to keep in mind that sampling is not a substitute for appraisal. It is merely a very powerful tool at the disposal of the archivist in implementing an appraisal decision (5.24 to 5.35).
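
As a rough illustration of how a sample might be drawn once the appraisal decision has been made, the Python sketch below takes every n-th record from a file. Systematic sampling is assumed here purely for illustration; the study does not prescribe it in this paragraph, and the function name, interval, and record labels are hypothetical.

```python
def systematic_sample(records, interval, start=0):
    """Return every `interval`-th record, beginning at index `start`."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return records[start::interval]


# One hundred hypothetical records, labelled record-001 to record-100.
records = [f"record-{n:03d}" for n in range(1, 101)]
sample = systematic_sample(records, interval=10)
print(len(sample), sample[0], sample[-1])
```

Whether a one-in-ten sample preserves enough informational value depends on the file; the interval is a parameter of the appraisal decision and does not replace it.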

6.17 Another factor which must be addressed when undertaking the detailed technical analysis is the internal arrangement of the data. The arrangement of the individual records on the reel of tape is rarely a major consideration, but the character codes used and the dependence on certain computer programs could have a major impact on the processing of the data (4.11 and 4.12).
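
The effect of character codes can be shown with a short Python sketch. It assumes, for illustration only, that a file was written in EBCDIC (here the `cp037` code page supported by Python's standard codecs) and shows what happens when the same bytes are read with a different code; the study itself does not name a particular character code in this paragraph.

```python
# Five bytes as they might appear on a tape written in EBCDIC (code page 037).
ebcdic_bytes = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])

# Decoding with the correct character code recovers the text.
text = ebcdic_bytes.decode("cp037")

# Decoding the same bytes with another code yields unrelated characters.
misread = ebcdic_bytes.decode("latin-1")

print(text)
print(misread == text)
```

The bytes are unchanged in both cases; only the knowledge of which character code was used makes them legible, which is why that code should be recorded in the documentation.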

6.18 The major consideration when undertaking the technical analysis is the hardware dependency of various storage media and the software dependency of certain formats of information. In both cases archivists must be aware of the costs associated with reformatting the data should this be required (4.13 to 4.16).

6.19 It must also be remembered that any machine-readable records which are acquired must also be preserved. During the technical analysis the archivist should determine, if at all possible, the costs which will be required to preserve the data for a long period of time (4.17).

6.20 Two additional factors should also be considered. The archivist will need to determine who will fill service requests on the data - whether the originating department or the archival repository. Should the data file be software and hardware dependent, it might be decided that the originating institution should handle all service requests. The nature of any restrictions on the data must also be determined. While the same kind of restrictions apply to machine-readable records as to textual records, the manner in which such restrictions are handled is different. If only certain portions of information are restricted, it is possible to extract all other portions of information from the file for research use, thereby creating a public use version of a restricted file. However, it is important for the archivist to consider the impact such measures would have on the informational value, as well as the cost of producing a public use version (4.18 and 4.19, as well as 5.5 to 5.18).
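
A public use version of the kind described above might be produced along the following lines. This is a minimal Python sketch under stated assumptions: the field names and the set of restricted fields are hypothetical, and real files may require further measures, such as aggregation, that are not shown here.

```python
# Hypothetical set of fields that may not be released to researchers.
RESTRICTED_FIELDS = {"name", "address"}


def public_use_version(records, restricted=RESTRICTED_FIELDS):
    """Copy each record, leaving out the restricted fields."""
    return [
        {field: value for field, value in record.items() if field not in restricted}
        for record in records
    ]


# A hypothetical restricted file containing a single record.
restricted_file = [
    {"name": "A. Smith", "address": "1 Main St", "age": 47, "region": "South"},
]
public_file = public_use_version(restricted_file)
print(public_file)
```

The restricted file is left untouched and the public use version is a separate copy, so the archivist can compare the two when judging how much informational value has been lost.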

6.21 The analysis of the technical considerations of a machine-readable data file should lead to a more rational development of an approach to the acquisition, processing, conservation, and servicing of the data. The approach itself should be developed according to the willingness of an archival repository to absorb the costs associated with each of these archival functions. In order to assist in the evaluation of such technical attributes as software, hardware, size, and physical arrangement, and in order to provide a more systematic analysis of the archival functions, archivists may wish to use question-and-answer planning tools which can be developed for different kinds or types of data (4.35, 4.40, and 4.53).

6.22 It is at this stage that the archivist should bring together the results of the content analysis and technical analysis and justify the decision to acquire the records or to reject the records. Should the appraisal decision be favourable, it is suggested that a plan of action be developed, using information contained in the planning tools referred to in paragraph 6.21 above. Such an action plan could cover the acquisition, processing, conservation, and servicing functions (4.58 to 4.66).

6.23 It is possible that, because of the substantial costs of long-term preservation which includes the conversion of the data to current formats so as to prevent technological obsolescence, not all machine-readable data files acquired by an archival repository will be retained forever. In order to determine which data files should be maintained and for how long, archival administrators might wish to consider the establishment of a reappraisal policy. As a practical way to implement such a policy, upon acquisition by an archival repository all machine-readable data files could be issued a review date (5.35 to 5.40).
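
Issuing a review date on acquisition could be as simple as the following Python sketch. The ten-year interval and the function name are assumptions made for illustration; the study does not specify a review period in this paragraph.

```python
from datetime import date


def review_date(acquired, years=10):
    """Return the date on which an acquired data file falls due for reappraisal."""
    try:
        return acquired.replace(year=acquired.year + years)
    except ValueError:
        # Acquired on 29 February and the target year is not a leap year:
        # fall back to 28 February.
        return acquired.replace(year=acquired.year + years, day=28)


print(review_date(date(1984, 3, 1)))
```

Storing the review date alongside the accession record would allow files due for reappraisal to be listed each year.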

