6. Conservation and research use
6.0 Technology will have a major impact on the way records are stored and disseminated. It will change not only the way electronic records are conserved and disseminated but also the way in which other archival records are conserved and disseminated.
6.1 Conservation practices for electronic records
6.1.1 The fragility of the storage medium for electronic records has been a major concern for any archives which has or is planning to have an electronic records program. To date the most acceptable medium has been magnetic tape, as it has proved to be the most stable and the least costly of the magnetic media. The conservation of electronic records nevertheless requires major expenditures of both human and financial resources. Large amounts of data can be stored on magnetic tape and the storage capacity is increasing (a reel of magnetic tape recorded at 6250 bits per inch could contain the equivalent of 112,500,000 characters of information); however, the care and handling of the medium requires that certain procedures be followed. The tapes must be stored in an environmentally controlled site; the tapes must be rewound periodically to guard against such problems as coupling; and, because the life span of a tape is approximately ten years, the data must be recopied onto new tapes periodically. During the recopying process it is necessary to ensure that the data are recopied to current technical specifications. Tape drives able to read data recorded at certain older densities, measured in bits per inch (bpi), are now difficult to find. Whether or not the medium is still in good condition is of little importance if the data were written to technical specifications that can no longer be read. A conservation program for electronic records has therefore been an active process in which staff continually monitor and upgrade the medium. Such practices are quite different from the storage of hardcopy records.
The lack of compatibility between computer systems and the lack of technical standards have created the need to monitor technical change in an active way.
The new optical storage media provide some major breakthroughs for the conservation of electronic records. Not only is the storage capacity of the medium increased dramatically, but optical discs do not require environmentally controlled storage sites. New advances in optical technology have resulted in the creation of a new flexible optical tape. A single twelve inch optical tape reel stores the equivalent of 5,000 magnetic tapes or one terabyte of data. The most important effect is on data transfer rates, which are higher than those of the magnetic media. Such advances make the physical conservation of electronic records more affordable.
Despite the advances made in the storage technology, one major problem still to be overcome is that of technical specification changes which require repositories to recopy records to new technical standards. The microfilm industry has established standards, and although problems can exist with the quality of the film, the technical standards to read the film do not change. The lack of standards in the computer industry continues to present problems for the long term preservation of electronic records.
The recent interest in the development of international standards for data transfer is therefore extremely important from an archival perspective. The issue of standards can be found in every archival function: appraisal, processing, conservation and dissemination. This is why it is of vital importance for archives to be represented as major users in the development of information technology standards. The development and application of standards is a long term project but one that will alleviate the complex problems relating to conservation of electronic records of archival value.
6.1.2 New approaches to conservation
The amount of electronic data is increasing rapidly and extends into all sectors. Regardless of the increased storage capacities of various media and the development of standards, archives may be faced with the prospect of being unable to conserve the data of archival value. The complexity of the information systems being created, particularly in the scientific field, may prove to be too difficult for one institution to handle. A change in the present practices may be necessary in order to resolve the problem.
The topic of new approaches to conservation could be discussed under any of the archival functions for electronic records as the interrelationship of the functions becomes stronger when dealing with electronic records than with other records. Just as there is a convergence of technology to produce the compound document, there is also a convergence of archival functions to conserve the electronic document. The technical considerations and implications must be considered at each step of the process be it appraisal, arrangement or dissemination. The advantage is that all of the technical considerations revolve around the medium regardless of the function. To resolve these issues will take resources and technical expertise which do not presently exist in most repositories. The approach which might be taken is to use the expertise of the archivist in his or her ability to appraise information but use the source of the information as the physical custodian of the records. Such a change in approach would mean that archives use their resources to identify (through appraisal) the electronic records of value; to establish standards for the long term preservation of the records; to monitor the development of new systems; to develop extensive finding aids for the records; and to disseminate information about the records to researchers. The physical acquisition of the records would never occur; the creating agency would be charged with the responsibility for physical preservation of the records.
This approach has certain drawbacks, particularly as there is a loss of physical custody of the records. However, with the development of improved communication systems, and the transfer of data on-line to researchers, it may provide intellectual control without requiring on-site technical expertise for the physical control and conservation of the records, and without the expense of preservation.
Such changes to existing practices are workable not only because of the improved communication networks but more importantly because of the lack of importance attached to the notion of original records. In electronic records the concept of an original record does not exist; in fact, a copy is more likely to be of better quality than the original. The information may be unique, but possession of the original version by an archival repository is not essential. Archives may find it more advantageous to develop and concentrate on appraisal and intellectual control rather than on the physical custody of the records.
6.1.3 Use of electronic records
To date the users of electronic records have not been the traditional users of archives. Most of the collections of electronic records held by those archives with programs have been statistical in nature. As more textual documents, case files and other more traditional archival material are acquired in electronic form, the number of users will increase. The analysis of machine readable records has required techniques which, to date, are found more frequently among social scientists and the scientific community. A new generation of archival researchers, literate in the use of computer technology, will make more demands for material in electronic form.
An interesting phenomenon has been the growth in the numbers of researchers automating existing textual records. Researchers are using archival records as source documents for computerization. Many examples exist but the most frequent are the use of birth, marriage and death records for demographic historical studies; shipping records; and census records. The computerization of such records requires a report in itself, as interesting techniques must be used to codify information which was never intended to be codified. The trend is an interesting one in that it will produce a pool of information which, when used with today's sources, will provide for analysis of change over long periods of time.
The use of optical disc as a storage medium will also affect the use of archival records in that the ability to store, as well as enhance, historical textual documents will allow researchers to access vast amounts of information. From an intellectual point of view, this trend will require archives to develop more standardized and complete finding aids to records. Already, major efforts have been made to convert textual holdings to optical disc. The Library of Congress is in the process of implementing a plan to offer automated public access to thousands of its rare historical manuscripts, photographs and cartoon prints. Technology will have a major impact on the research use of both electronic records and traditional records. It will permit the combination of these records in a single medium: the electronic form. Examples of products which provide new ways of distributing information can be found in the 1986 British Broadcasting Corporation's Domesday Project, undertaken to mark the 900th anniversary of the compilation of William the Conqueror's Domesday Book. The project used advanced interactive video technology to present a detailed contemporary portrait of the United Kingdom. The Domesday Project contains over 54,000 photographs, 24,000 maps, 10,000 datasets and moving images with sound.
A similar project in Canada, the Jean Talon Project, will result in the development of an electronic product which will include images, data, sound and text on topics, issues and themes of importance to the evolution of Canada in the twentieth century. The resulting product will be an electronic interactive library, exhibition and archive. The Jean Talon Project will depend upon cooperation at the federal-provincial level, and the use of new technologies for access to the information.
6.1.4 Access to the records
Closely related to the use of technology in research is the access to records which computer technology will provide. For electronic records the ability to disseminate records directly to the user is already a routine practice. The ease of that dissemination is what is likely to change over the next few years. Until recently most researchers required access to mainframe computers and a certain amount of programming experience to manipulate the information. The increased friendliness of much of the software and the overwhelming increase in the power of the microcomputer will allow far more researchers to use material outside of the archives. Researchers will have the ability to search on-line, request copies of records and manipulate these records without ever having to leave their homes. This will necessitate a change in the way archives provide researcher services.
An issue of concern in the information society is the right of access to information and the privacy rights of individuals. In many countries freedom of information and privacy laws exist to ensure that access to documents is a right but that the privacy of the individual is protected. Electronic records fall under the rulings of access and privacy legislation.
Techniques have been developed to ensure that electronic records containing personal information are not released. The process is referred to as anonymization and requires that the contents of the records be reviewed to determine if personal information exists and if the information is identifiable. Electronic records often contain personal information, usually of three broad types: demographic information such as age and sex; socio-economic information such as income, occupation or education; and attitudes and opinions. The latter, without the previous characteristics attached, is not usually unique. The main focus of anonymization is the demographic and socio-economic information which is unique and can lead to disclosure of personal information. Every data file requires an evaluation of both the contents and the size of the file.
Three major types of identification can be made with the alphanumeric type of data file. The first is a single characteristic such as name, address or social security number. This information usually exists outside the context of the data file and can be verified. The second type is a combination of characteristics which could lead to specific identification of an individual. In this type of situation the size of the total population of the records is important.
If one collected information on doctors in a small city, it is quite possible that one respondent with a specific salary level, educated in a particular university, could be the only one in the total population of doctors. He or she would be identifiable. If the values of certain characteristics (e.g. salary levels) were combined into broader ranges, it is possible that the one doctor would no longer be identifiable, as more cases would be found in that salary range. The third type of identification is similar to the previous one but involves a combination of characteristics leading to the identification of a limited number of qualified cases. Although not as obvious as the single unique identifier, it can prove equally damaging.
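The doctor example can be illustrated by counting how many records share each combination of characteristics. The following is only a minimal sketch in Python, not a procedure described in this report; the field names, the sample records and the salary range width of 10,000 are all hypothetical.

```python
from collections import Counter

# Hypothetical sample: doctors in a small city (all values invented).
doctors = [
    {"university": "A", "salary": 82000},
    {"university": "A", "salary": 84000},
    {"university": "B", "salary": 91000},
    {"university": "B", "salary": 97000},
    {"university": "B", "salary": 99000},
]

def salary_band(salary, width):
    """Replace an exact salary with the lower bound of its range."""
    return (salary // width) * width

def smallest_group(records, key):
    """Size of the smallest group of records sharing the same key."""
    counts = Counter(key(r) for r in records)
    return min(counts.values())

# With exact salaries, at least one combination occurs only once.
exact = smallest_group(doctors, lambda r: (r["university"], r["salary"]))

# With salaries grouped into ranges (assumed width: 10,000), every
# combination in this sample is shared by at least two respondents.
banded = smallest_group(
    doctors, lambda r: (r["university"], salary_band(r["salary"], 10000))
)
```

A smallest group of one means that some respondent is unique on the chosen characteristics. How large the smallest group must be before release is a policy decision which this sketch does not address.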
Two major approaches to anonymization are the deletion of information and the aggregation of information. Both reduce the utility of the data for secondary analysis, either through an absolute loss of information (deletion) or through a reduction in the degree of specificity in the data (aggregation). The effect that each approach will have on the overall utility of the records must therefore be evaluated.
The intellectual process of anonymization requires considerable technical experience; the physical creation of the anonymized version is technically far less time consuming than the review of hardcopy records. Anonymization of electronic records will become an important function of archival activities as the possibilities of linking electronic records are greater than with hardcopy records. The anonymization of the records can be done in-house if the records are physically acquired or by the agency which holds the records. In either case consultation with the creator of the records is required.
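The two approaches might be sketched as follows. This is an illustrative assumption rather than a prescribed technique: the record layout, the function names and the count of distinct values used as a crude indicator of specificity are all hypothetical.

```python
def delete_fields(records, fields):
    """Deletion: drop the named fields from every record."""
    return [{k: v for k, v in r.items() if k not in fields} for r in records]

def aggregate_field(records, field, width):
    """Aggregation: replace a numeric value with the lower bound of its range."""
    return [{**r, field: (r[field] // width) * width} for r in records]

def distinct_values(records, field):
    """Count distinct values of a field; a crude indicator of specificity."""
    return len({r[field] for r in records if field in r})

# Hypothetical records (all values invented).
records = [
    {"name": "respondent 1", "age": 34, "income": 41000},
    {"name": "respondent 2", "age": 37, "income": 43500},
    {"name": "respondent 3", "age": 52, "income": 58000},
]

# Deletion removes the direct identifier entirely.
without_names = delete_fields(records, {"name"})

# Aggregation keeps the income field but at a coarser level (assumed width: 10,000).
banded = aggregate_field(without_names, "income", 10000)
```

Comparing `distinct_values(records, "income")` with `distinct_values(banded, "income")` gives a rough sense of how much specificity aggregation has removed, whereas a deleted field is lost to secondary analysis altogether.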
Matching of the records can be done extremely rapidly through the use of computers, if the right characteristics are there. Archives must focus on this issue and develop expertise in ensuring the non-release of personal information if they are to disseminate electronic records.
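To show why matching is a concern, the sketch below links two small files on shared characteristics. It is a minimal illustration under assumed conditions: both files, their field names and their contents are hypothetical, and real matching would have to cope with inconsistent or missing values.

```python
def link_records(file_a, file_b, keys):
    """Pair records from two files that agree on every key field."""
    # Index the second file by its key values so each lookup is fast.
    index = {}
    for rec in file_b:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    matches = []
    for rec in file_a:
        for other in index.get(tuple(rec[k] for k in keys), []):
            matches.append((rec, other))
    return matches

# Hypothetical survey file with no names (all values invented).
survey = [
    {"age": 34, "occupation": "doctor", "city": "X", "opinion": "agree"},
    {"age": 51, "occupation": "teacher", "city": "X", "opinion": "disagree"},
]

# Hypothetical outside file that does carry names.
register = [
    {"name": "person 1", "age": 34, "occupation": "doctor", "city": "X"},
    {"name": "person 2", "age": 29, "occupation": "clerk", "city": "X"},
]

linked = link_records(survey, register, ["age", "occupation", "city"])
```

In this invented sample one survey record matches a named record, so an opinion that carried no name in the survey file becomes attributable to an individual once the two files are combined.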