"The year is 2045, and my grandchildren are exploring the attic of my house. They find a letter dated 1995 and a CD-ROM. The letter says the disk contains a document that provides the key to obtaining my fortune. My grandchildren are understandably excited, but they have never seen a CD - except in old movies. Even if they can find a suitable disk drive, how will they run the software necessary to interpret what is on the disk? How can they read my obsolete digital document?"
That quotation is from an article in Scientific American in 1995. We were then living through a total revolution, discovering the internet and email, and everyone was taking bets on how long it would take for paper to disappear. For centuries man had had but a single medium to convey information, which was paper, and all of a sudden, within the space of a few years, a whole set of new technologies invaded the world under the umbrella terminology of digital information. Such a revolution has major consequences in terms of access to information, and in processing and preserving documents, and raises problems that are far beyond technical skills or management strategies.
As the twentieth century draws to a close, an ever-increasing amount of information is created, disseminated and accessed in digital form. This article attempts to present some of the issues surrounding the challenge of digital preservation, and in particular highlights current activity being undertaken by some of the major players in the field. It is clear that some good progress has been made in developing guidelines and best practice for the preservation of digital documents, both nationally and possibly internationally too. However, there is still much anxiety and uncertainty over the best way to proceed in some key areas, and these particular issues are explored here too.
The emergence of digital technologies in the library and archival worlds has changed many practices in the profession, and in recent years many major libraries have been collecting or producing digital documents: even in developing countries, librarians dream of turning digital, leapfrogging other tried and tested technologies such as microfilming. It cannot be disputed that digital technology has accomplished a great step towards better and easier access to information; the same piece of information can be accessed by several readers simultaneously, regardless of where they are in the world, and far more speedily than previously. The Internet of course allows millions of people around the world to receive the same information at the same time. Distance, frontiers and time limits have all vanished: it could be said that the only requirements for access to information now are language and technical equipment or connections.
The opportunity to browse from one subject to another, from one website to another, and to automate the tedious aspects of seeking information has revolutionised research. Thanks to digitisation, a student can now scan a complete collection of Shakespeare’s dramas in a matter of minutes, something which would have taken days before the advent of digitisation when such a search would have involved laborious page by page research. Libraries also appreciate the space-saving advantages offered by digital collections: the Encyclopedia Britannica, on one or two CD-ROMs, is certainly less cumbersome than the print version, and if correctly handled those CD-ROMs will not need repair or restoration like ordinary paper books which are constantly used and whose pages or bindings tend to tear.
Is digital technology, then, a panacea? The answer of course must be no, or at least only partly so. The limitations of digitisation for long-term access to information have already been acknowledged, and it is well known that most of the data generated by NASA thirty years ago when Armstrong first walked on the moon has been lost, unreadable now because so little consideration had been given at the time to its preservation.
The threat of obsolescence to digital information is twofold, since there is a risk of obsolescence to both the hardware and the software. What increases that threat is the speed with which technology is changing. It is almost impossible to retain outdated computers or disk drives compatible with certain outdated diskettes or CD-ROMs, and even if this were achieved, who, in thirty or fifty years’ time, would be able to repair them when they break down? Maintaining the hardware would not be enough if we are no longer capable of using the software, or, worse, if we no longer know what software has been used.
Another danger which threatens digital technology is cost. The preservation of digital material is a continual process, and to the initial cost of digitising the material must be added the cost of migrating data every five or ten years, if not more often. Too many professionals are still unaware of the economic burden of digital preservation in the overall management of their library. That is one reason why IFLA and very many other organisations and institutions are trying to raise awareness of the issues surrounding the preservation of digital materials.
There are other, more intellectual and ethical issues too in the use of computers to generate literary works. As a visit to the manuscript department of any of the great national libraries of the world will testify, the hand-written manuscript can reveal much more about the life and state of mind of the writer than any electronic document can ever do. Marcel Proust’s "paperoles", the small pieces of paper which his servant wrote under dictation because he was too ill to write himself, contain many handwritten corrections in the margins, and are of major importance for all those who study the genesis of Proust’s literary creation. Victor Hugo’s splendid handwriting and the amazing and powerful drawings he used to make in the margins of the pale blue paper he favoured are similarly full of historical significance. How can the successive versions of a novel, for example, or the progression or changes in an author’s thoughts, be studied in future, when the only permanent record may be a diskette containing the final version? No draft, no hesitation, no drawings or doodles. No doubt either that those who will study literary history or the genesis of a book will be at a loss.
The same is true of email. Although it is sometimes difficult to imagine life before the arrival of email, there is cause also to regret the transitory nature of email. A century ago, famous writers may have recorded their movements, thoughts and emotions in letters to friends or family, and these have often been preserved as part of our cultural heritage, helping to set literary works in the context of the writer’s life and thought. In facilitating access to information and in reducing the time for information to pass from one place to another, email has made information transitory and non-essential: in doing so, it contributes to the loss of our cultural memory.
It is widely accepted that traditional printed documents, particularly when they contribute to a nation’s cultural heritage, should be preserved to ensure long-term access and availability for future generations. Best practice in the preservation and conservation of traditional materials - not only literary materials, but photographs, manuscripts and artistic works too - is already well-established, with organisations such as the UK’s National Preservation Office (NPO) playing a strong role in ensuring high standards in this area.
The need to preserve digital documents is of equal importance, and this essential work is now beginning to be taken seriously. Electronic documents are often considered as two distinct groups: digitised copies of original printed or written documents, and works which have no print original, often called born-digital works. The preservation policies concerning the two groups may be different, especially where the original document which has been digitised is also being preserved. On the other hand, born-digital works may also require special preservation measures as they are unique.
The last few years have seen exponential growth in the number of electronic documents of all kinds. In the traditional arena of printed material, it is obvious for the institutions in charge of collecting and preserving the nation’s memory that not everything can be preserved, and that a selection process is necessary and unavoidable. The enormous amount of digital information which exists, and the ease with which it can be created or changed, make selection criteria even more essential, but in a way even more difficult. What should those selection criteria be? Can we be sure that what is selected for preservation now will be what is required in the future? Would this selection activity influence, if not dictate, the main areas of research for future generations? In the case of continually updated documents, for example online or web-based publications, should all versions of the same document be preserved, or only the final version? What about links to other web sites? The exhilaration which grips us when we surf the Net quickly turns to vertigo when we begin to consider the preservation of that information.
One thing is certain: no matter how important ethical issues and selection criteria may be, managerial issues will probably greatly influence the selection. Migration of information is one of the preservation measures currently advocated to preserve electronic publications, but it raises technical challenges, together with problems of staff resources and financial implications.
The life cycle of digital material
The concept of the life-cycle of digital material was developed in a recent key project, and is rapidly becoming accepted as an efficient and useful way in which to explore the challenges associated with its preservation. One of the JISC/NPO Studies on the Preservation of Electronic Material, guided by a specially established committee, the Digital Archiving Working Group, this particular study aimed to develop a strategic policy framework for creating and preserving digital material. The life-cycle which emerges is broken down into data creation; collection management and preservation; acquisition, retention and disposal; data management; and data use. The study presents the view that the life-cycle concept is essential because it makes it clear that different stakeholders have different interests at different stages of the cycle. What is crucial is that the issue of preservation must be taken into account at all stages, and not just towards the end of the cycle, since the preservation process needs to be considered from the beginning. Raising awareness among all stakeholders of the importance of preservation is one of the key messages coming from the study, as is the need for cooperation between all of the major players.
The resulting framework which has been developed provides strategic guidance to stakeholders at all stages in the life cycle. In implementing the framework, stakeholders are recommended to assess the issues as they relate to their particular stage in the cycle, but also to consider how the various stages are interrelated, and to be aware of the effects of the decisions of one group on the other stakeholders.
This article is not concerned primarily with the technology challenges and problems of digital preservation, but it is useful to mention a couple of key reports and developments which have occurred recently. One of the main areas of debate is what exactly should be preserved. Should the aim be to preserve the content of the digital document, or the physical container? If content, then should an attempt be made to retain the same look and feel as the original, or simply to preserve the data with little regard to the physical container?
The summary report on the JISC/NPO Studies on the Preservation of Electronic Material says that "cost management principles would suggest that digital material should preferably be held in archives in a standard format, on standard media, and managed by one of a few standard operating systems. [...] However, prescriptive standards in the electronic information world have so far failed to achieve full recognition. The emphasis is now on ‘permissive standards’". The opinion of those involved in the technical aspects of digital preservation is that a range of guidelines for specific types of material or specific audiences is preferable to prescriptive guidelines which may be too narrow in their application. On the other hand there are proponents of specific technical solutions. Rothenberg, in a report published recently by the European Commission on Preservation and Access (ECPA), suggests that emulation is often the best technical process to guarantee long-term access to digital resources, and even goes as far as to say that this approach "in the author’s view, is the only approach yet suggested to offer a true solution to the problem of digital preservation".
Elsewhere, the CEDARS (CURL Exemplars in Digital Archives) project has a remit to explore issues relating to the preservation of and long-term access to digital resources. As far as technical processes are concerned, the focus of CEDARS is not on the preservation of particular storage media, but rather on long-term access to the intellectual content of the resource.
ICSTI (The International Council for Scientific and Technical Information) has recently focussed on the issues relating to digital electronic archiving of scientific information. A study commissioned by ICSTI looked at policies, models and best practices in the area of digital electronic archiving. The study was concerned with the long-term storage, preservation and access to information that was "born-digital" or for which the digital version is considered to be the primary version. As might be expected, the study was also primarily concerned with scientific or technical material, which is of most interest to ICSTI members, although it was pointed out that the majority of projects relating to digital archiving are concerned with cultural or historic content. For this reason, humanities-related projects were used in a peripheral context in this study to support the central focus of scientific-based content. Four major organisational models were identified by the study, based on differences in the information flow, the management of the life cycle functions of the archive, responsibility and ownership of the data, and the economic model: data centres; institutional archives; third party repositories, and legal depositories. The report concludes that "There is so much activity among various groups that it is difficult to encapsulate the general state of digital electronic archiving". It also emerges that the issue of major concern seems to be that of intellectual property rights, whether this be the commercial concerns of the producers of electronic material, or the concerns over access and fair use in the digital environment voiced by other stakeholders such as libraries and users.
As far as guidelines on digital preservation are concerned, as recently as 1998, Fresko concluded that there were few widely accepted guidelines, and none which cover all the issues surrounding digital preservation. On the subject of preservation metadata he concluded that "we are reluctant to highlight any approach of those [guidelines] reviewed. The field is young, and no approach has a definitive lead". Although research and the development of guidelines have moved on since then, there is still very little in the way of clear international guidelines in this area.
Who is responsible?
Heated debate has been taking place for some time now over which of the many players in digital archiving should have responsibility for long-term preservation of and access to digital collections. Many believe that the creator of the digital object should be responsible: after all, libraries often do not ‘own’ the digital material in the same way as they own printed journals to which they have subscribed, so they do not have the same options for deciding on the long-term ‘storage’ of the material. The job then falls to the publisher - the creator of the digital work - to ensure that electronic journals will still be available in the long term, but publishers have never yet had to undertake the work of preservation, and it is not clear that they would wish to begin to do so. If neither creator (the publisher) nor subscriber (the library), then the job must fall to a third party, such as a digital archive repository. This debate has been at the forefront of recent discussion on the liblicense discussion list, and is likely to remain so for some time. There is some agreement that it is unfair of libraries to expect publishers to begin to take on the role of archiving when they have never done so before, but similarly publishers cannot expect libraries to preserve material which they do not own and do not have long-term access to. There is good reason to expect licensing agreements between publishers and libraries to change in due course to take account of this dilemma.
"A strategy for digital preservation is part and parcel of any national information policy, and it should be integral to any investment in digital libraries and information superhighways". This comment, taken from the JISC/NPO summary report on the preservation studies, makes clear the need for national digital preservation strategies, and it is clear that a great deal of work is being done to work towards this aim, at least in the UK. The National Preservation Office continues to coordinate the development of a national policy for the preservation of digital material, and to promote awareness of issues and strategies in digital archiving, but at present "the UK lacks a strategy for the long-term preservation of digital information on a scale sufficiently large to support future scholarship and research".
The NPO established a Digital Archiving Working Group to take forward the work involved in developing such a strategy. The result was the launch of seven different projects to study various aspects of digital archiving. A further one-year project has now begun (from July 1999) in order to follow up the recommendations from that first series of projects. The Preservation Management of Digital Materials project aims to define best practice and guidelines for digital preservation, outsourcing and collaborative provision of preservation services. The project will investigate the various remote management strategies that are emerging and provide guidance on these different approaches. The work will also include a cost-benefit analysis of different remote management strategies.
Recording the digital collections
Another area of research and great debate is in recording digital collections to facilitate access. Again this is an area where some progress in developing systems, web-based directories or gateways is emerging, but once again there are no widely-used standards for describing digital collections. The NPO has commissioned David Haynes Associates to develop a National Register of Collection Strengths, Retention Intentions, and Preservation Status. The Register would be used to allow decisions to be made on promoting collaborative collection management initiatives, at local, regional and national level. The study uses the model proposed by the UK Office for Library & Information Networking (UKOLN) as the standard for collection level descriptors, and aims to coordinate preservation and retention by encouraging consistency in describing collections. This will in turn allow for comparison of collecting policies by subject area.
UKOLN are currently involved in several activities concerning collection descriptions. A review of existing practice is soon to be published which takes a detailed look at the state of the art for collection description as it currently exists in the library and related communities, and a further study outlines a simple conceptual model of collections and the services that provide access to those collections. The report enumerates a set of 23 core attributes for simple collection description, and discusses a possible approach for categorising different types of collections.
The challenge of recording the existence of digital collections, and making them widely accessible, is one which has no easy answers. Just as the trend for digitising traditional library collections appears unstoppable, so there is a growing number of projects and programmes which aim to record what digitisation activity has taken or is taking place. Some of these aim to identify important collections and to encourage their digitisation, while some simply record existing digital collections. Many are national in their coverage, some aim to be international; some have specific subject coverage or are limited by some other content criteria.
What does not appear to exist is very much coordination between these projects. While the stated aim of many inventories is to reduce duplication of effort when digitising collections, there appears to be no attempt to avoid duplication of effort when creating the inventories themselves, since little regard appears to be paid to what type of directory or inventory exists already. Unless interoperability, or at least cooperation, between different inventories is given high priority, it is difficult to see how duplication of digitisation effort can be reduced.
One such project, the IFLA/UNESCO Survey on Digitisation and Preservation, being carried out jointly by IFLA PAC (Preservation and Conservation) and IFLA UAP (Universal Availability of Publications) in the framework of UNESCO’s "Memory of the World" Programme, aims to register digitised collections of culturally significant heritage material across the world. The project has already undertaken a survey to examine current activity in the area of digitisation worldwide, and has more recently developed a web-based "Directory of Digitised Library Collections". The Directory aims to list major cultural heritage library collections which have been digitised. As part of the "Memory of the World" Programme, the emphasis is on cultural heritage collections and major libraries and other important cultural institutions.
The challenges which the development of this particular Directory has raised are reflective of very many similar inventory-type finding tools. Within IFLA, it was clear that many such projects were being undertaken with broadly similar aims, while cooperation between the various projects was not taking place. The fear that these projects being carried out in isolation were not effective in providing information about what had been digitised led to a meeting of interested groups, which took place during the 1999 IFLA General Conference in Bangkok.
The aims of the meeting were to inform each other about the various inventory projects currently in progress; to identify areas of mutual concern; to consider what benefit there would be in attempting to coordinate the work of the various projects; and to recognise the need for consistency between different inventories and to encourage interoperability.
The meeting recognised that there is a need for some sort of listing of digitised collections: just as bibliographies are essential for recording a nation’s output, or the holdings of a particular library, then so is it necessary to record digital collections in some way. However it is clear that creating an inventory such as the IFLA/UNESCO Directory is fraught with challenges, making it essential to establish the scope of the directory at the very beginning. Basic questions, such as the level at which collections are described, are key to the development of an effective database, but it proves very difficult to set the record creation at the correct level.
Where a national inventory already exists, such as the Canadian National Digital Inventory, it would seem pointless to create a large number of collection-level records in an international database, when one link direct to the Canadian national inventory would offer the same range of information. On the other hand, to offer different levels of searchable records in an international database, depending purely on the existence or otherwise of a national inventory, would create an unbalanced service, where subject searches would reveal large numbers of records for those countries whose collections were recorded individually, and no ‘hits’ at all for countries for which the only entry was a link to the national inventory, hosted elsewhere on the Internet.
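The imbalance described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the `Record` structure, the field names, the countries and the sample collections are all hypothetical and are not taken from the IFLA/UNESCO Directory or from any national inventory.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One entry in a hypothetical international directory."""
    country: str
    title: str
    subjects: tuple = ()             # empty when the entry is only a link
    is_inventory_link: bool = False  # True for a bare link to a national inventory


# Invented sample data: Country A is described collection by collection,
# Country B is represented only by a link to its national inventory.
RECORDS = [
    Record("Country A", "Digitised medieval manuscripts", ("manuscripts", "history")),
    Record("Country A", "Digitised historical maps", ("maps", "history")),
    Record("Country B", "Link to national digital inventory", (), True),
]


def hits_by_country(records, subject):
    """Count, per country, the records that match a subject search."""
    counts = {r.country: 0 for r in records}
    for r in records:
        if subject in r.subjects:
            counts[r.country] += 1
    return counts


print(hits_by_country(RECORDS, "history"))
# prints {'Country A': 2, 'Country B': 0}
```

In this sketch Country B may hold just as much relevant material as Country A, yet a subject search returns nothing for it, because its holdings are described only in the national inventory hosted elsewhere.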
The meeting agreed that interoperability between inventories should be a target, but it was recognised that for those projects which had already begun, this was too late to be considered in detail. The IFLA/UNESCO Directory, for example, was required to remain within the framework of its contract with UNESCO, and could not at this stage embark on developing the database to conform to any international standards. While this was regrettable, lessons could be learnt in this area, and it was generally agreed that no new inventory-type projects should begin without taking into account international guidelines or advice on best practice which existed already, and without relating new inventories to those already in existence.
As the IFLA inventories meeting concluded, perhaps the biggest factor in reaching agreement in areas like digital preservation is cooperation between all of the major players. This has been recognised by, among others, PADI (Preserving Access to Digital Information) in Australia, which has recently established a new discussion list, padiforum-l, for the exchange of news and ideas about digital preservation issues. PADI considers that a collaborative approach to guaranteeing long-term access is essential, and is keen to develop collaborative agreements to achieve this aim. In Australia, guidelines have been developed to select online publications of national significance to which long-term access should be ensured. Priority is given to "authoritative publications with long-term research value", and the guidelines cover the preservation of links between sites and the preservation of the constituent parts of larger sites. The Australian statement of principles includes cooperation, distributed responsibility and the adoption of best practice and standards.
In conclusion, it is clear that a great deal of intense debate is under way concerning all areas of digital preservation. This is as it should be since clearly co-operation and collaboration are key elements in guaranteeing long-term solutions to the thorny issues surrounding this area. The JISC/NPO synthesis of the digital preservation studies produced a list of recommendations which should ensure that work in this area will be full and energetic in the near future. In particular, the two key areas in which further work must be carried out can be seen as co-operation and the development of standards:
- Awareness must continue to be raised in order for the issues to continue to be explored and solutions sought.
- Communication must be encouraged. The newly established padiforum-l discussion list, and discussions such as the meeting of inventory developers held in Bangkok in August 1999, are good examples of how communication is essential to ensure full understanding and cooperation over key issues.
- The development of standards and guidelines is essential to ensure a continued move towards consistency and the establishment of best practice.
IFLA supports the work being carried out by all of the major players in this important field, and contributes itself through the work of the Core Programme for Preservation and Conservation.