Národní knihovna CR
Albertina icome Praha


Contents:
1. Introduction
2. The development of opinions on the access to digitized documents
3. The digitization in The National Library
4. For whom to digitize?
5. Which information is given by the image of the document?
6. Mediated information is always reduced
7. The special image of the document
8. The properties of the human eye
9. Why to calibrate during photography and digitization?
10. The preparation of the description of manuscripts
11. The storage on CD-R
12. Internet
13. ManuFreT
14. The unlimited data life-span

 

1. Introduction

Let us explain what we do: we replace the information that would be available to users working with the original document with digital information that is more accessible and more durable, though reduced.

The advantages of using digital information are:

  • contribution to preservation of the original document 
  • possibility of easy access
  • unlimited data life-span

The limitations are:

  • the choice of information, which depends on the point of view of the person responsible for digitization
  • the impact that recording and reproduction have on the information offered

These limitations are always present. They need not play an important role for researchers if the data have been well prepared for them. 

The main purpose of our project is to digitize, above all, manuscripts, old prints, posters, maps, and similar documents. We can also work with other documents, including audio and video. 

Our experience has shown how important it is to consider carefully how to process the data and how to ensure a long life for the final products. 

At present we work above all for the researchers who attend the Manuscripts and Old Printed Books Department of the National Library of the Czech Republic. 

These researchers can be divided into two groups:  

  1. The main group consists of researchers who are interested in the information that the author wanted to impart to readers, whether as text or as image. For these researchers it is important to see an image of the best quality, comparable with the original. The results of our work are intended especially for them. Easy and continuous access to information from various sources makes it possible to benefit from the cultural heritage of mankind. Equally important is the possibility of exchanging digitized documents with other institutions.
  2. The second group, smaller and usually professionally specialized, consists of researchers who are interested in the manuscript as a whole, i.e. as a physical object. They can also use the results of our work, but they have to accept the limitations that follow from the character and amount of the information recorded. 

Researchers who cannot do without the manuscript itself are few. They are usually top experts who know how to handle the original and how to avoid damaging it. Once their purpose is achieved, fewer topics for new research remain and the need for this kind of use of the documents decreases. 

A process that aims to capture by digitization more than the eye can see is very expensive. The cost of technology aimed at higher image resolution rises steeply. Even so, the copy cannot replace the original. In view of this, it seems to us most efficient to invest the available money exclusively for the benefit of the first group. 
 

2. The development of opinions on the access to digitized documents

We started our activity in 1992. In response to UNESCO and in co-operation with the National Library, we created, within the framework of the Memory of the World project, the first CD-ROM disc with the rarest manuscripts and old prints deposited there. That disc was followed by others containing entire manuscripts, completed with descriptive data. At the same time we were able to test several digital cameras. 

The first discs we created were in fact experiments through which we tried to find a way to solve the problem of digitization. These experiments allowed us to refine our opinions and discuss them, and they sped up the penetration of digital technology into the conservative sphere that is so characteristic of work with old documents. 

Our results allowed us to become familiar with similar activities all over the world. We respect the work done in this field, but at the same time we feel a certain unease, because every product is different and uses different software. 

The essential and generally valid significance of the digitization process lies in the creation of digital data. All products in which technology and software play the important role age very quickly. 

We therefore aimed to find a way to make the data accessible independently of any program, platform, or producer of hardware or software. 
 

3. The digitization in The National Library

We supplied the National Library with one of the first KODAK DCS 460 cameras with a filter wheel. Our aim was to obtain full-value information. We had to solve the problem of vibration, because our workplace was close to a busy street with tram lines, as well as the problem of filters. On the other hand, this camera enabled us to apply exact methods of optimal adjustment and calibration of the image. The system ensures faithful reproduction independently of changes in illumination. Together with every digitized manuscript we store the corresponding digitized form of the calibration table.

The KODAK DCS 460 camera produces an image of 2000 x 3000 pixels in 38 bits (colour version). We scan at this resolution, and after the first processing of the image we work with 24 bits (18 MB RGB). The maximum resolution we thus achieve is usually between 200 and 350 dpi. The resolution is adapted to the size and type of the manuscript, and it does not change within a manuscript (with the exception of details digitized as separate images). Manuscripts for which the achievable resolution is not sufficient are not digitized for the time being. In well-founded cases we photograph large originals and digitize them from slides. We then convert the images to lower quality levels used for well-defined purposes. 
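The figures above can be checked with a little arithmetic. The following is only an illustrative sketch: the function names are ours, and the 297 mm page height is an assumed example, not a value from our workflow.

```python
def uncompressed_size_bytes(width_px, height_px, bits_per_pixel):
    """Size of an uncompressed image in bytes."""
    return width_px * height_px * bits_per_pixel // 8


def effective_dpi(pixels, original_length_mm):
    """Resolution obtained when `pixels` cover an original of the given length."""
    return pixels / (original_length_mm / 25.4)


# 2000 x 3000 pixels at 24 bits per pixel (RGB)
size = uncompressed_size_bytes(2000, 3000, 24)
print(size)  # 18000000 bytes, i.e. the 18 MB quoted above

# Assumed example: a page 297 mm high filling the 3000-pixel side
dpi = effective_dpi(3000, 297)
print(round(dpi))  # 257, inside the usual 200 to 350 dpi range
```

A larger original spreads the same 3000 pixels over more millimetres, which is why the achievable resolution depends on the size of the manuscript.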

For scanning and conversions we use Adobe Photoshop, for which we have written extensive macros. The conversions can thus run automatically, usually at night. The description of digitized documents is done at a separate workplace. 

The final data are stored on CD-R discs and at the same time in digital archives. They can be viewed with any WWW browser or made accessible over the Internet. The same data can, however, be processed and viewed more comfortably with our program ManuFreT.
 

4. For whom to digitize?

The digitized image is intended for people in the role of readers, who receive a message from a writer across centuries and long distances. The properties of the human eye have not changed, and we may assume (let us hope) that they will remain the same a thousand years from now. We start from a modest but well-founded assumption: the essential aim of our activity is access to the image of documents, i.e. the mediation of the most authentic visual perception. 

On the basis of this assumption we define the group of researchers to whom digitization should give access to rare documents. 

The technology of digitization is developing very quickly, while the prices of the equipment are decreasing. Unless the development of our civilization stops, this trend will not stop either. This does not mean, however, that it is better to wait until quality is even higher and prices lower. There is an analogy with technical development in the field of sound. The development of sound recording was for a long time marked by increasing frequency range and decreasing distortion. The pursuit of recording quality stopped once the limits of the human ear had been exceeded. 

The human eye has its limits too. What matters is whether we look at the manuscript as its original scribe did, or whether we use a magnifying glass or a microscope. Using digitization in place of a microscope is uneconomical and not easy to justify. 

Nevertheless, it is both possible and useful to provide access to any manuscript even today. 

All this reasoning formed the basis of the routine digitization of manuscripts in the National Library of the Czech Republic.

 

5. Which information is given by the image of the document?

From a physical point of view, the image of any manuscript is infinitely condensed and simplified information for our eye (not eyes, because stereoscopic information is absent). 

It is necessary to take the following facts into account: 

  • An image of a three-dimensional object in two dimensions is an infinite reduction of the primary spatial information about that object.
  • The only physical property recorded, the reflectance of the original, does not tell much about its other properties.
  • The recording of reflectance is itself an infinite reduction of the primary information contained in the reflected electromagnetic radiation. The light reflected from the original is described by three numbers, proportional to the energy of the reflected light in narrow wavelength bands (RGB or CMYK). This approach is derived from the properties of the human eye and adapted to them. Visual perception involves heavy compression and omission of the primary information that exists in reality.
  • If information about the properties of the filters and the spectral character of the reflected light is not stored in some form, the relation between the recorded values and the properties of the original is lost. 
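The reduction of a reflected spectrum to three numbers can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the band limits, the simple box-shaped filters, and the two sample spectra (energy in arbitrary units per wavelength in nanometres) are invented and do not describe our camera or filters.

```python
# Hypothetical box filters: (lower, upper) wavelength limits in nm
BANDS = {"R": (600, 700), "G": (500, 600), "B": (400, 500)}


def band_energy(spectrum, band):
    """Sum the energy of all samples whose wavelength falls inside the band."""
    lower, upper = band
    return sum(e for wavelength, e in spectrum.items() if lower <= wavelength < upper)


def to_rgb(spectrum):
    """Reduce a sampled spectrum to three numbers, one per band."""
    return tuple(band_energy(spectrum, BANDS[c]) for c in ("R", "G", "B"))


# Two physically different reflected spectra (invented values)
spectrum_a = {450: 20, 550: 50, 620: 30, 680: 30}
spectrum_b = {450: 20, 550: 50, 620: 60, 680: 0}

print(to_rgb(spectrum_a))  # (60, 50, 20)
print(to_rgb(spectrum_b))  # (60, 50, 20)
```

Both spectra give the same three numbers, so the difference between them cannot be recovered from the recorded image. This is the loss of primary information described in the list above.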

Recording such properties is a matter for the special image of the document.

There is no need to analyse here the effect of the exposure angle in relation to the smoothness of the surface, the effect of the width of the lens and of reflections on its surface (measurable even in the best lenses), the ability of the lens to render colours, the uniformity of exposure, the spectral sensitivity of the recording medium (CCD or film), or the effect of loading, storage, and development of film. 
The conclusion is clear: the digitized image is intended for our eye only, not for the researcher who is interested in more than his eyes can give him. 
 

6. Mediated information is always reduced

When a researcher approaches an object, for example a manuscript, a flow of information arises between him and the object, mediated by the senses, especially by sight. The researcher can change this flow of information by handling the object: he touches the pages, looks at them as he turns the folios, views them against the light, or changes the lighting. He can also devise and use various aids that create new flows of information. In the course of this work, however, he has to be in contact with the manuscript, which he in fact wears out. 

If contact with the original is to be eliminated, such a flow of information can be conserved so that it can be used later without the original. This conservation has to take place during photography, digitization, the making of a facsimile, or the making of a special image, for example in an unusual spectral zone or in transmitted light. 

The image on the monitor, like a copy or a facsimile, creates for our eyes a flow of information similar to the flow from the original under the conditions in which it was recorded. It is, however, modified by the properties of the recording and reproducing equipment. 
 

7. The special image of the document

The aim is to record some specific physical property by applying specific methods, for example photography in unusual spectral zones or in transmitted light. It goes without saying that it is useful to describe the conditions of scanning as objectively as possible. Nevertheless, these images serve the researcher's eyes only; they represent information that has to be trusted and cannot be used for further physical research of the original. Even such an image is the result of research that has already been carried out. 
 

8. The properties of the human eye

Consult the professional literature on how our eye sees and perceives colours. You will discover that it is a very complicated matter, on which views are still developing. 

This research influences the development of reproduction technologies based on printing, where the goal is maximum adaptation to the way our eye perceives. The whole of reproduction technology is adapted to the properties of our eyes. The aim of this technical system is to store and re-evoke the subjective perception we have when looking at the image. Often the goal is to give us an image that we would subjectively consider better, even though it differs from the original. 
 

9. Why to calibrate during photography and digitization?

The digitized image provides relatively accurate information about the energy of light of a certain wavelength that struck the sensor. This energy depends on many conditions, especially the illumination of the original, the exposure time, the properties of the filters, the spectral distribution of the light, etc. To describe and store this information completely we use the following principle: together with the documents we photograph a calibration table. It differs from similar calibration tables in that an exact spectral analysis of reflectivity under the given conditions is carried out for every calibration patch. These properties are recorded as text and added, together with the digitized calibration table, to the data of the digitized documents. This means that at any time in the future it will be possible to create a table with the same properties, or to make corrections according to another table measured by the same method. If the reproduction equipment (printer, monitor) is adjusted to reproduce this calibration table adequately, the image of the manuscript will be reproduced adequately as well. 
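The principle of correcting an image according to a calibration table can be sketched as follows. This is a simplified illustration, not our actual procedure: it treats a single colour channel, assumes that the recorded patch values are all distinct, and uses invented patch values. The function name is hypothetical.

```python
def build_correction(recorded, reference):
    """Return a function that maps a recorded channel value to a corrected one.

    `recorded` holds the values read from the digitized calibration patches,
    `reference` the values those patches should have. Values between patches
    are corrected by linear interpolation; values outside the patch range are
    clamped to the nearest patch. Assumes all recorded values are distinct.
    """
    pairs = sorted(zip(recorded, reference))

    def correct(value):
        if value <= pairs[0][0]:
            return pairs[0][1]
        for (m0, r0), (m1, r1) in zip(pairs, pairs[1:]):
            if value <= m1:
                t = (value - m0) / (m1 - m0)
                return r0 + t * (r1 - r0)
        return pairs[-1][1]

    return correct


# Invented example: three grey patches as recorded and as they should appear
correct_red = build_correction(recorded=[10, 120, 240], reference=[0, 128, 255])

print(correct_red(120))  # 128.0, a patch value maps onto its reference
print(correct_red(180))  # 191.5, halfway between the two upper patches
```

Because the reference values are stored with the data, the same correction can be rebuilt later for any other reproduction equipment.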

We are aware of what "adequately" means. Depending on the character of the illumination (angle, amount), it is possible to obtain various images of the same document. Every document needs its own illumination, and all opinions about which is optimal are subjective. 

Even the calibration we carry out cannot remove this subjectivity. It does, however, make it possible to eliminate the distortion of the image caused by the characteristics of the technology used. 

 

10. The preparation of the description of manuscripts

Naturally, we had to develop a system of access to digitized documents. On the basis of our initial experience and a study of other solutions, we developed a system designed to be as independent as possible of any other system or hardware. All data added to the image are recorded in extended HTML. The reason was the requirement of long-term usability of the data, mentioned above in connection with the purposes of digitization. In practice, however, it is hardly reasonable to ask historians to write HTML documents. We have solved this problem by dividing the preparation of the description of manuscripts into three parts. 

First, on the basis of the known properties of the manuscript, we generate a text file that contains no concrete information yet but has the structure needed for this manuscript. At the same time we assign to every future image the name by which the description of the page will be connected with its image. For this purpose we use the program GENTEMP.

Then experts add to this text the concrete information about the manuscript and, where needed, further information about individual pages. 

Finally, the text is processed by the program GENHTLM. This process produces a group of interconnected HTML files. 
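The three steps can be pictured with the following minimal sketch. It does not reproduce GENTEMP or GENHTLM: the function names, the file-naming scheme, the ".jpg" extension, and the HTML layout are all assumptions chosen only to show how an empty template, expert input, and generated linked pages fit together.

```python
from html import escape


def generate_template(manuscript_id, page_count):
    """Step 1: one empty record per page, each tied to a future image name."""
    return [
        {"image": f"{manuscript_id}_{n:04d}.jpg", "description": ""}
        for n in range(1, page_count + 1)
    ]


def generate_html(manuscript_id, pages):
    """Step 3: one HTML file per page, linked to its neighbours."""
    files = {}
    for i, page in enumerate(pages):
        name = f"{manuscript_id}_{i + 1:04d}.html"
        links = []
        if i > 0:
            links.append(f'<a href="{manuscript_id}_{i:04d}.html">previous</a>')
        if i < len(pages) - 1:
            links.append(f'<a href="{manuscript_id}_{i + 2:04d}.html">next</a>')
        files[name] = (
            "<html><body>"
            + f'<img src="{escape(page["image"])}">'
            + f"<p>{escape(page['description'])}</p>"
            + " ".join(links)
            + "</body></html>"
        )
    return files


pages = generate_template("ms001", 3)
# Step 2: an expert fills in the description of a page (invented text)
pages[1]["description"] = "Initial letter with gold leaf"
site = generate_html("ms001", pages)
print(sorted(site))
```

The point of the division is that the expert only edits plain text in step 2 and never has to write HTML by hand.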
 

11. The storage on CD-R

The digitized images are stored on CD-R discs together with the HTML files. Other information is stored on the disc as well: a medium-dependent identification file and an SGML map of the digital document. The finished disc can be viewed like any WWW page in any Internet browser. The data are independent of software, as intended. 

 

12. Internet

Clearly, CD-R discs with manuscripts can be used immediately for access over the Internet, to the appropriate extent and in the appropriate quality. Internet browsers can thus serve as the means of access to our discs. This means of access has many advantages: it is developing continuously and is widespread all over the world. 

Access to individual manuscripts needs more, however; we have therefore developed a system that registers digitized manuscripts and makes it possible to share this information on the Internet. We are now putting this automated system into operation. 
 

13. ManuFreT

Although browsers are an easily accessible means of viewing our discs, they still lack many functions that researchers would appreciate. We have therefore developed another program, named ManuFreT. This program can read the HTML description of a manuscript as well as index and display it. The manuscript is presented as a virtual book, which can be browsed, marked, and annotated. The image itself can also be manipulated: the program makes it possible to view several pages together, to change brightness and contrast, to use an orientation view while moving around a magnified image, and to measure sizes and distances just as on the original. 
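Measuring on the screen "just as on the original" is possible because the resolution of the image is known. The sketch below shows the conversion only in principle; it is not taken from ManuFreT, and the function name, the coordinates, and the 250 dpi value are assumptions.

```python
import math


def distance_mm(point_a, point_b, dpi):
    """Real distance on the original between two pixel positions in the image."""
    pixels = math.dist(point_a, point_b)
    return pixels / dpi * 25.4


# Two points 500 pixels apart in an image digitized at an assumed 250 dpi
length = distance_mm((0, 0), (300, 400), dpi=250)
print(length)  # about 50.8 mm
```

The conversion is valid only if the resolution stays constant across the image, which is one reason the resolution is kept unchanged within a manuscript.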
 

14. The unlimited data life-span

The digital data in the Memory of the World programme will grow continuously, and they must be usable even in a hundred years or more. For the first time there is a record of information that does not degrade over time and that can cease to exist only by our decision or together with the end of our civilization. The limited service life of recording media is often mentioned. This relatively short service life (100 - 1000 years) is comparable with that of manuscripts, which under favourable conditions can last even longer. There is, however, a misunderstanding here. Presumably, after 100 years the CD-ROM disc will be an anachronism. A CD-ROM is no more than a medium. The same data can be stored on any other one. They can be rewritten and reconstructed, and not a single bit of information will be lost. The data thus have an unlimited service life, and we have to generate them with all responsibility. They must be usable according to the needs of our unchanging senses, not according to technical possibilities, which on the contrary are changing very quickly.
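That data can be moved to a new medium without losing a single bit can be verified mechanically. The following is a minimal sketch of such a check, assuming that a checksum comparison is an acceptable proof of identity; the function names and file names are hypothetical, and the temporary directory merely stands in for an old and a new medium.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Checksum of a file, read in chunks so that large images fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source, target):
    """Copy a file to a new medium and confirm that the copy is bit-identical."""
    shutil.copyfile(source, target)
    checksum = sha256_of(target)
    if sha256_of(source) != checksum:
        raise IOError(f"copy of {source} is not bit-identical")
    return checksum


with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old_medium.bin"
    new = Path(tmp) / "new_medium.bin"
    old.write_bytes(b"digitized page data")
    checksum = migrate(old, new)
    identical = old.read_bytes() == new.read_bytes()

print(identical)  # True
```

Storing the checksum alongside the data would allow the same check to be repeated at every later migration.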