Electronic publications are a rapidly growing class of publications that can only be read with the aid of a computer. They may be distributed free of charge or obtained by purchase, and they are supplied in two forms: off-line publications and on-line publications. Some electronic publications are supplied on physical carriers and can be stored on shelves; others are not, and have to be copied into the library's access system and stored on hard disc stacks, tape streamers or other data storage systems. This chapter will, therefore, be looking not at the physical carriers - they have been covered in the preceding chapters - but at the specific problems of acquiring, selecting, storing and accessing this group of documents.
Definition and Typology of Electronic Publications
An off-line publication is an electronic document that is bibliographically identifiable and is stored in machine-readable form on an electronic storage medium. Examples are CD-ROMs, diskettes (floppy discs) and magnetic tapes.
An on-line publication (or resource) is an electronic document that is bibliographically identifiable, is stored in machine-readable form on an electronic storage medium and is available on-line. Examples are an electronic journal, a World Wide Web page or an on-line database.
The producers and publishers of electronic publications can be traditional publishers who expand into new areas of publishing. They can also be newly established content providers that offer only on-line electronic publishing, especially in the case of new publications on the World Wide Web. In addition, some companies specialize in CD-ROM publishing.
Nowadays, most publications are written, edited and formatted using word processors and desktop publishing software. The printed version of the journal or the monograph is derived from the electronic form.
Distinction Between Audiovisual Material and an Electronic Publication
Multimedia publications are now produced which contain a mixture of material, e.g. a biography, a bibliography, stills (photos), animation, video and sound. It sometimes becomes difficult to distinguish between an audiovisual document and a text-related electronic publication. For example, a movie with subtitling is audiovisual, and a Michael Jackson CD that includes a video clip of moving images is still considered to be an audio CD. A CD-ROM which contains a biography, a bibliography, texts of the songs, some sound, video and photos is considered to be a multimedia CD-ROM publication.
In short, an electronic publication must contain a considerable amount of text before a library will take it on deposit. Some libraries, e.g. Die Deutsche Bibliothek in Frankfurt am Main in Germany, also take audiovisual publications on deposit.
Electronic Documents or Virtual Information
The term Electronic Documents or, as they are sometimes called, Virtual Information refers to the modern methods of transmitting documents between individuals, primarily text-based documents - the equivalents of letters and memoranda - by electronic means, i.e. without the use of paper. Many of the actual and potential problems created by electronic documents are similar to those created by electronic publications.
The documents, while stored on a physical carrier somewhere and easily accessible to a small group of people including the author, are nevertheless difficult for an archivist to gain access to and preserve. They include E-Mail messages and computer files held on personal computers. When electronic documents are stored, it is on the same physical carriers that are used for other types of documents. The main factor that differentiates electronic documents from other documents is the method of transmission.
The first, and major, problem in the preservation of electronic documents is to gain access to them and discover what exists. This can only be done with the active support of the institution and its staff. If the institution has a PC network, the problem of access can be eased.
Since many of the E-Mail messages between staff are likely to be trivial and, perhaps, somewhat embarrassing if read by anyone other than the author and the intended recipients, it is essential to ensure that everyone is aware that the archive will periodically review both formal files and messages held on the central file server to select material that is worth preserving.
Once access is gained, the material can be subject to standard selection criteria and the chosen information copied into the archive's data storage system. The long term preservation of the information can then be part of the archive's strategy for documents in general.
What is Involved in Acquiring Electronic Documents and Publications?
Research is being carried out by many archives and libraries into the best methods of giving access to electronic materials in the very long term. Because of the sheer quantity of material being produced, particularly for access via the World Wide Web, selection is essential. Many archives and libraries apply their existing selection criteria for printed materials to electronic materials as well. The content of the document is the relevant factor for selection, not the medium. This means that the physical carrier, the hardware and the software used are not relevant to the selection process. Local policy defines the criteria for selection: in Germany, for example, audiovisual material is included in the national bibliography, while in some other countries it is not.
Acquisition and Registration
Off-line publications can often come to the library in the same way as printed publications. Obviously, when the library starts collecting off-line publications, the publishers have to be notified. In the Netherlands, where deposit is done on a voluntary basis, it is important that the publishers are kept informed about the new selection criteria. In France, the law defines which publications are to be submitted.
On-line publications require a new form of co-operation. The publication has to be transmitted from the host system to the library via the network. Selected documents are either ordered, transferred automatically by the publisher or harvested by the library with a harvester application. For on-line documents, acquisition means the physical migration (via the network) of the document from the host-system to the depository system. The publisher/producer or administrator (for archives) needs to be involved in this process.
It is necessary to register documents when they are received by the library. This requires the exchange of bibliographic information (pre-publishing information) between the depository library and publisher (for archives this will be between the governmental institution and the archive), preferably before acquisition. The registration of incoming documents should be activated on arrival.
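The registration step can be pictured as matching each arriving document against the pre-publication records already exchanged with the publisher. The Python sketch below is illustrative only: the PrePublicationRecord and Register classes, their fields and the identifiers used are hypothetical and are not taken from any existing depository system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class PrePublicationRecord:
    """Bibliographic information sent by the publisher before delivery."""
    identifier: str                      # e.g. an ISBN or ISSN supplied by the publisher
    title: str
    publisher: str
    received_on: Optional[date] = None   # filled in when the document arrives


@dataclass
class Register:
    """Hypothetical register of announced and received documents."""
    announced: Dict[str, PrePublicationRecord] = field(default_factory=dict)
    unannounced: List[str] = field(default_factory=list)

    def announce(self, record: PrePublicationRecord) -> None:
        """Store pre-publication information, keyed by identifier."""
        self.announced[record.identifier] = record

    def register_arrival(self, identifier: str, arrived: date) -> bool:
        """Mark a document as received; return False if it was never announced."""
        record = self.announced.get(identifier)
        if record is None:
            # Keep a note so that staff can ask the publisher for a description.
            self.unannounced.append(identifier)
            return False
        record.received_on = arrived
        return True

    def outstanding(self) -> List[str]:
        """Identifiers that were announced but have not yet arrived."""
        return [i for i, r in self.announced.items() if r.received_on is None]
```

Under these assumptions, calling outstanding() after a delivery gives a list of announced documents that may need to be chased with the publisher.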
It is necessary to install the electronic publication so it can be viewed and described by the librarian. For on-line documents, a connection to the host-system is required; off-line documents have to be physically installed on a workstation.
Description of the Document
Cataloguing systems for electronic documents are still the subject of much debate. Various groups are discussing how to describe an electronic document. The existing book-based systems such as MARC and its variants do not fully describe these new formats. For example, to be able to view an electronic publication it is also necessary to describe the technical features - which computer and operating system was the publication made for? which formats are used? etc. Many fields for the technical description will be made in coded form.
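A technical description held in coded form might look something like the following Python sketch. The field names and the code values (such as "WIN" or "CDR") are invented for illustration; they are not taken from MARC or from any other cataloguing standard.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical code tables; a real catalogue would define its own.
PLATFORM_CODES = {"DOS": "MS-DOS", "WIN": "Windows", "UNX": "UNIX"}
CARRIER_CODES = {"CDR": "CD-ROM", "DSK": "diskette", "TAP": "magnetic tape", "ONL": "on-line"}


@dataclass(frozen=True)
class TechnicalDescription:
    platform: str                    # key of PLATFORM_CODES
    carrier: str                     # key of CARRIER_CODES
    file_formats: Tuple[str, ...]    # e.g. ("SGML", "HTML")

    def __post_init__(self) -> None:
        # Reject codes that are not in the tables, so records stay consistent.
        if self.platform not in PLATFORM_CODES:
            raise ValueError(f"unknown platform code: {self.platform}")
        if self.carrier not in CARRIER_CODES:
            raise ValueError(f"unknown carrier code: {self.carrier}")

    def display(self) -> str:
        """Expand the codes into a line that a cataloguer can read."""
        formats = ", ".join(self.file_formats)
        return (f"{CARRIER_CODES[self.carrier]} for "
                f"{PLATFORM_CODES[self.platform]}; formats: {formats}")
```

Keeping the codes in tables of this kind means that the record itself stays short, while the expanded wording can be changed later without touching the stored records.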
Electronic publications offer an opportunity to automate part of the production of a catalogue. Bibliographic data can be retrieved from the electronic publication itself, e.g. from the table of contents (TOC). A research project of the European Commission, BIBLINK, is studying how data can be exchanged between publisher and library in an automated way. The Dublin Core defines the fields that are necessary to support adequate bibliographic description of a Web page. The Dublin Core has received significant support, particularly from North America and including some publishers. A risk that may ultimately make it unacceptable is that the Dublin Core contains too many features that require definition at the national level or that carry a large maintenance overhead.
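As an illustration of a Dublin Core description embedded in a Web page, the Python sketch below renders a record as HTML meta elements using the common "DC." name prefix. The element names are those of the fifteen-element Dublin Core set, which may be larger than the version the chapter had in mind; the function name and the sample values are hypothetical.

```python
from html import escape
from typing import Dict, List

# The fifteen element names of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_meta(record: Dict[str, str]) -> List[str]:
    """Render a record as HTML <meta> lines, in the order listed above."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return [
        # escape() protects characters such as & and " inside the attribute.
        f'<meta name="DC.{name}" content="{escape(record[name], quote=True)}">'
        for name in DC_ELEMENTS
        if name in record
    ]
```

A harvester could read such lines back from a page to pre-fill a catalogue record, which is one way in which part of the cataloguing work might be automated.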
In the international book trade, the identification numbers ISBN (International Standard Book Number) and ISSN (International Standard Serial Number) are widely used to uniquely identify a certain version of a monograph or serial publication. ISBN and ISSN are also used for CD-ROMs and on-line publications like electronic journals. However, these numbers were not designed for electronic publications and a proposal was, therefore, made for a Digital Object Identifier (DOI). The DOI was designed by the Association of American Publishers and the Corporation for National Research Initiatives.
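Both the ISSN and the ten-digit ISBN end in a check character calculated modulo 11, which allows a registration system to catch most typing errors on arrival. The Python sketch below shows that calculation; the function names are illustrative, and the thirteen-digit ISBN, which uses a different scheme, is not covered.

```python
def _mod11_check(digits: str, first_weight: int) -> str:
    """Return the check character for digits weighted first_weight down to 2."""
    total = sum(int(d) * w for d, w in zip(digits, range(first_weight, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def _all_ascii_digits(text: str) -> bool:
    return all(c in "0123456789" for c in text)


def is_valid_issn(issn: str) -> bool:
    """Validate an ISSN such as 0378-5955 (seven digits plus a check character)."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not _all_ascii_digits(chars[:7]):
        return False
    return _mod11_check(chars[:7], 8) == chars[7]


def is_valid_isbn10(isbn: str) -> bool:
    """Validate a ten-digit ISBN such as 0-306-40615-2."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10 or not _all_ascii_digits(chars[:9]):
        return False
    return _mod11_check(chars[:9], 10) == chars[9]
```

A check of this kind confirms only that a number is well formed; it does not show that the number has actually been assigned to the publication in hand.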
Authenticity and Integrity
Some electronic publications can easily be changed. What guarantee is there that the bibliographic description defines exactly the version which is stored? And will it still do so after the lapse of several hundred years and the migration to other carriers and formats? This is still a very tricky area. Several methods are being considered, e.g. time stamps, encryption and watermarks. But it must be said that the final solution for this issue has yet to be found.
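One way of checking that the stored version is still the one that was described is to record a cryptographic digest of the content, together with a time stamp, at the moment of description. This is an assumption added for illustration and is not a method named in the chapter; the FixityRecord class and the function names in the Python sketch below are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FixityRecord:
    sha256: str        # hex digest of the stored bytes
    recorded_at: str   # ISO 8601 time stamp (UTC) of when the digest was taken


def record_fixity(content: bytes) -> FixityRecord:
    """Take a digest of the content as it is at the time of description."""
    digest = hashlib.sha256(content).hexdigest()
    stamp = datetime.now(timezone.utc).isoformat()
    return FixityRecord(digest, stamp)


def is_unchanged(content: bytes, record: FixityRecord) -> bool:
    """True if the content still produces the digest that was recorded."""
    return hashlib.sha256(content).hexdigest() == record.sha256
```

A digest of this kind detects any change to the stored bytes, but it does not by itself deal with conversion to a new format, where the bytes change legitimately and a new digest has to be recorded.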
After the bibliographical and technical description, the electronic publication must be removed from the hard disk of the computer and any on-line session must be closed. This activity has generated new information which should be included in the descriptive record.
Migration, Storage, Conversion and Emulation
Other factors that have to be considered when collecting electronic documents include the following:
Migration - Migration of the electronic content from the original carrier to the physical storage of the depository system, including migration quality control and duplication for backup (preferably on another medium).
Storage - The physical storage system will probably use different types of media with different access speeds, e.g. hard disc (very fast), magneto-optical (fast), tape (slow). This requires sophisticated software to monitor the use of documents and to shift documents from tape to discs and vice versa.
Pathfinder - This is a store that records the physical locations of all the files in a document and makes the file map available to the search engine.
Conversion and Emulation - Do you have to convert the format of the document to a new format, or do you have to design a system in which the document is stored in the original format? Emulation software enables the document stored in the original format to be viewed using the new hardware and software.
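The migration step listed above, with its quality control and backup copy, can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the function names and directory layout are hypothetical, and comparing SHA-256 digests is only one possible form of quality control.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 digest of a file, read in chunks so that large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def migrate(source: Path, store_dir: Path, backup_dir: Path) -> str:
    """Copy source into the store and the backup, verify both, return the digest."""
    expected = file_digest(source)
    for target_dir in (store_dir, backup_dir):
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / source.name
        shutil.copyfile(source, target)
        # Quality control: the copy must be byte-for-byte identical.
        if file_digest(target) != expected:
            raise OSError(f"copy to {target} does not match the original")
    return expected
```

In practice the backup directory would sit on a different medium from the main store, as the text recommends; here both are ordinary directories for simplicity.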
These techniques are concerned with preservation, and final solutions have not yet been found. The increasing speed of technological innovation, new publishing techniques, the Internet and the present lack of standards are a few examples of the uncertainties in which the manager of a depository system must work. There is no proven solution for these systems. Large vendors have built systems for data-warehousing and data-mining, but these still lack the structured indexing and large-scale preservation features needed by libraries and archives.
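To make the storage and pathfinder functions more concrete, the Python sketch below keeps a file map for each document and moves files between media according to use. It is a toy model, not a proven design: the class names, the three tiers and the threshold of three accesses are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredFile:
    name: str
    tier: str = "tape"    # one of "hard disc", "magneto-optical", "tape"
    accesses: int = 0     # uses since the last rebalance


@dataclass
class Pathfinder:
    """Hypothetical file map: document identifier -> the files that make it up."""
    file_map: Dict[str, List[StoredFile]] = field(default_factory=dict)

    def add(self, doc_id: str, names: List[str]) -> None:
        """Register a document; new files start on the slowest medium."""
        self.file_map[doc_id] = [StoredFile(n) for n in names]

    def locate(self, doc_id: str) -> Dict[str, str]:
        """Return the current tier of each file and count the access."""
        files = self.file_map[doc_id]
        for f in files:
            f.accesses += 1
        return {f.name: f.tier for f in files}

    def rebalance(self, hot: int = 3) -> None:
        """Move much-used files to disc and unused ones to tape, then reset counts."""
        for files in self.file_map.values():
            for f in files:
                if f.accesses >= hot:
                    f.tier = "hard disc"
                elif f.accesses == 0:
                    f.tier = "tape"
                else:
                    f.tier = "magneto-optical"
                f.accesses = 0
```

A search engine would call locate() to find the files of a document, while rebalance() would run periodically in the background.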
Long Term Availability and Access for End Users: Remote or On Site
Indexing - Descriptive information is indexed for use within the search engine of the depository system. This engine can be part of the pathfinder software or can be a separate existing library system's OPAC module, to be defined locally. To find the right compromise between (the user's) indexing requirements and the technical possibilities is very complicated.
Access - Access to electronic publications by end users must be clearly defined. At present, most access is "on-site" but, when agreements are made with the owners of the information, remote access may be possible.
As with the deposit for printed publications, electronic deposit collections should be used as "collections of last resort". Libraries can, however, give access when agreements are reached with publishers and authors.
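The indexing and access functions described above can be illustrated with a very small Python sketch: an inverted index over the descriptive records, and a rule that allows remote access only where an agreement exists. The function names and the sample records are hypothetical, and a real OPAC module would do far more.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(descriptions: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the identifiers whose description contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in descriptions.items():
        for raw in text.lower().split():
            word = raw.strip(".,;:()")
            if word:
                index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Identifiers of the documents that match every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def may_access(on_site: bool, remote_agreement: bool) -> bool:
    """On-site use is allowed; remote use only under an agreement with the owner."""
    return on_site or remote_agreement
```

Only the descriptive information is indexed here, in line with the text; indexing the full content of the documents would be a separate and larger task.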
Copyright Issues, Authors and Publishers
Digital archives and libraries need to discuss restrictions on access and availability with publishers and authors wherever this is appropriate.
Usage of Standards
There are many relevant standards for electronic publications. The European Commission has launched an initiative, OII (Open Information Interchange), as part of the IMPACT2 programme. The aim of the OII initiative is to promote the awareness and use of standards for the exchange of information in electronic form. The target audience are developers and providers of information products and services, as well as end-users. Standards can be purchased from international standard offices and many countries have an organisation which translates and distributes the standards. For more information visit the Commission's Web site where copies of publications on standards can be found (http://www.echo.lu/oii/).
For the preservation of electronic publications a variety of standards are relevant. These include standards on hardware, operating systems (Windows, MS-DOS, UNIX), physical carriers (CD-ROM, WORM, DAT, diskettes, magnetic tapes), application programs like word processors, databases and spreadsheets, and formats like MARC, SGML, HTML, etc.
Availability of Electronic Publications on the Market
Printed publications like monographs and serials are no longer available on the market permanently. After a relatively short time, a specific edition of a monograph can be difficult to find in a book shop. It may be possible to order from a large distributor or even the publisher. With off-line electronic publications it is exactly the same. The publishers are no longer interested in keeping publications available when there is no commercial interest in the products. This may be understandable from the market point of view but is still unfortunate. In addition, publishers often do not have a full archive of their own publications. It is very important, therefore, that as soon as possible after the publication date a document should be selected, described and made available (at least for review on site) by a public body like a national archive or a national library.
Also see the Chapters on Magnetic and Optical Carriers
For detailed information about Digital Object Identifiers (DOI), refer to the World Wide Web (http://www.doi.org/).
For more information about the Dublin Core, refer to the World Wide Web (http://www.oclc.org:5046/oclc/research/conferences/metadata/dublin_core_report.html).