Preserving digital heritage data for future generations: CASPAR project
The amount of digital data being produced across disciplines is growing exponentially. But this information might not be around for future generations: the formats used to store it quickly become incompatible with rapidly changing hardware and software, and the data becomes inaccessible.
The large-scale, EU-sponsored Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval project (CASPAR) addressed the growing challenge of preserving digital information: society depends on it more and more, yet it is intrinsically fragile.
UNESCO was part of the CASPAR team and contributed its digital heritage expertise. The team created a framework of tools and infrastructure components to support the end-to-end preservation of all types of digitally encoded information, helping producers, curators and users of digital resources share the burden of preserving them.
Of particular importance was the wide range of users and types of digital information against which CASPAR was tested, including:
- ESA satellite data and a variety of scientific data from the Central Laboratory of the Research Councils (CCLRC), and
- Cultural data from UNESCO heritage sites and from French institutions that fostered the development of electronic music, such as the Musical Research Group of the National Audiovisual Institute (Institut National de l'Audiovisuel, Groupe de Recherches Musicales, INA-GRM) and the Institute for Research and Coordination in Acoustics/Music (Institut de Recherche et Coordination Acoustique/Musique, IRCAM).
UNESCO provided data centred on or associated with World Heritage sites, including legal texts, site descriptions, historic documents, laser scans, 3D models, virtual tours, satellite images and maps.
Partners of the UNESCO-ESA ‘Open initiative on the use of space technologies for the protection of Heritage sites’ helped to provide sample data. For example, ETH Zurich provided information on the Bamiyan Buddhas, carved into the Afghan cliffs around the third century AD and destroyed in 2001. The Via Appia in Rome was also used as a key sample: laser scanner measurements and satellite images were used to model the surrounding cultural landscape and to produce virtual tours and a virtual reconstruction of the site.
Other types of cultural data were provided by INA-GRM, which manages the French public radio and TV archives, and by IRCAM, an institution focused on electronic music. IRCAM preserves parts of scores, pieces of computer code, instructions, and documents describing the authors' motivations, so that each work remains intelligible, or at least understood well enough to be performed in the future.
Preserving satellite data for future generations is essential because it ensures the continuity of datasets. For example, in fifty years' time scientists will be better able to detect and understand trends in global warming if they can still access today's climate data and compare it with the phenomena they observe then.
The volume of data generated in environmental science is projected to increase sharply over the next few years. ESA satellites such as Envisat, ERS-2 and Meteosat Second Generation currently generate around 1,000 gigabytes of data per day, and with the upcoming launch of the new MetOp satellites this daily volume will grow even faster. ESA's mandate is to maintain archives of the data gathered by each satellite for ten years after the end of the mission. At present, ESA uses funds from various ongoing programmes to keep these historical bit streams in accessible archives.
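For a sense of scale, the daily figure above can be converted into an annual volume. This is simple arithmetic that assumes the rate stays constant at about 1,000 GB per day and uses decimal units (1 TB = 1,000 GB); it ignores the growth expected from MetOp.

```latex
\[
1{,}000~\tfrac{\text{GB}}{\text{day}} \times 365~\tfrac{\text{days}}{\text{year}}
  = 365{,}000~\tfrac{\text{GB}}{\text{year}}
  = 365~\tfrac{\text{TB}}{\text{year}}
\]
```

That is roughly a third of a petabyte of new data every year, all of which must stay readable long after the missions that produced it have ended.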
In the long term, sustainable preservation of this information will require the logical integration of many more pieces of data and objects, such as the conditions under which the instruments were operated, the system and software environment used to gather the signal, and the algorithms used to process the acquired bit stream. This information must be captured systematically for every instrument and mission, as part of a dedicated long-term programme.
Within the CASPAR project, selected ESA satellite data streams were the first objects used to demonstrate how the proposed preservation architecture could handle complex digital objects. ESA provided not only the necessary satellite data and associated information, but also its operational experience and demonstration infrastructure.
The Global Ozone Monitoring Experiment (GOME), launched on the ERS-2 satellite in April 1995, was the first candidate. Since 1996, ESA has been delivering GOME global observations of total ozone, nitrogen dioxide and related cloud information to users via CD-ROM and the internet.
The project established an authoritative, foundational methodology for digital preservation activities. Its guiding principle was to apply the Reference Model for an Open Archival Information System (OAIS, ISO 14721).
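To make the OAIS idea more concrete, here is a minimal Python sketch of an Archival Information Package (AIP) under a simplified reading of the model. The data object is paired with Representation Information (format, software environment, processing algorithms, echoing the contextual information listed above for satellite data) and with Preservation Description Information (PDI), whose fixity value is a SHA-256 checksum recorded at ingest. The class names follow OAIS terminology, but the specific fields, the `ingest` helper and the choice of SHA-256 are hypothetical illustrations, not CASPAR's actual components or API; the PDI's access-rights category is omitted for brevity.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RepresentationInformation:
    """What a future user needs to interpret the bits (hypothetical fields)."""
    data_format: str
    software_environment: str
    processing_algorithms: list[str] = field(default_factory=list)


@dataclass
class PreservationDescriptionInformation:
    """Simplified PDI: reference, provenance, context and fixity."""
    reference: str       # persistent identifier for the object
    provenance: str      # origin and history, e.g. instrument operating conditions
    context: str         # relationship to other information
    fixity_sha256: str   # checksum recorded at ingest time


@dataclass
class ArchivalInformationPackage:
    """Content Information (data object + representation info) plus PDI."""
    data_object: bytes
    representation_info: RepresentationInformation
    pdi: PreservationDescriptionInformation

    def verify_fixity(self) -> bool:
        """Recompute the checksum and compare it with the value stored at ingest."""
        return hashlib.sha256(self.data_object).hexdigest() == self.pdi.fixity_sha256


def ingest(data: bytes, rep: RepresentationInformation,
           reference: str, provenance: str, context: str) -> ArchivalInformationPackage:
    """Hypothetical ingest step: build an AIP and record its fixity value."""
    pdi = PreservationDescriptionInformation(
        reference=reference,
        provenance=provenance,
        context=context,
        fixity_sha256=hashlib.sha256(data).hexdigest(),
    )
    return ArchivalInformationPackage(data, rep, pdi)
```

For example, calling `verify_fixity()` on a freshly ingested package returns `True`; if even one byte of `data_object` later changes, it returns `False`. Detecting such silent corruption is only part of preservation: without the Representation Information, intact bits could still be unusable.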
The UNESCO World Heritage test-bed dealt with the preservation of all the data needed to document, visualise and model cultural heritage sites. The preserved data give conservation experts a valuable resource for restoring a site even after its original state has changed or deteriorated.
Through training and dissemination activities, CASPAR also raised awareness of the critical importance of digital preservation among the relevant user communities. This should encourage a wider range of systems and services for preserving digital resources.
The CASPAR infrastructure components and tools proved applicable to essentially all types of digitally encoded information, whether from archival or contemporary sources. This matters because the artefacts created during the preservation process, for example for access control, will themselves need to be preserved, as will CASPAR's key components.
As digital information becomes at once more ubiquitous, more indispensable and more fragile, tools and techniques are needed for the secure, reliable and cost-effective preservation of digitally encoded information for the indefinite future.
Almost every piece of information we access today is stored in digital form somewhere: just think of digital cameras and mobile phones, not to mention our personal and professional information spread across e-mails, electronic documents, spreadsheets, social networking sites and blogs. It is difficult to imagine life without digital data in the information age. But who manages it? And, more importantly, who will preserve it? CASPAR addresses this challenge!