

Section XI: Future technology

11.1 New technology - friend or FOE?
11.2 Long term strategies for electronic documents - report from a Swedish study 11 August 1995

 

11.1 New technology - friend or FOE?

11.1.1 What "new technology" can be recommended?

 

George Boston

1. What Is "New Technology"?

To most librarians and archivists, "New Technology" means the information storage systems that use electronic, chemical or optical information carriers in place of paper. Many of these systems are developments of what has traditionally been regarded as audio-visual media, i.e. systems for the storage of sounds and pictures. Other new systems are developments of the computers used for cataloguing or are hybrid designs using computers to control robots to find and play magnetic tapes or optical discs. All these systems store textual information either as ASCII code as used in computers and word-processors or as an image of the printed page.

For the former, a character recognition program fed by a scanner is required. The advantage is that the texts can be searched directly, by-passing the catalogues. The disadvantage is that the character recognition programs work with a limited range of fonts and are not 100% accurate. Photographs and diagrams have to be separately scanned as images and stored in a linked file. The character recognition programs are, however, steadily improving.

Texts to be stored as an image also have to be scanned. The texts stored using this method cannot be searched directly. The advantage to the researcher is that the page layout, together with any photographs and diagrams, is retained. Storage as an image also permits the preservation of maps, plans, music scores etc. in the same system. These non-text materials do not have to be handled separately.

There are many new formats for storing information available on the market and more are announced each week. The sheer range of competing devices for storing sounds, images and textual information bewilders many people. They are all forms of "New Technology" but not all of them are suitable for mass storage of information by libraries.

It is my view that only digital formats can be classed as "New Technology" for professional use. Why do I make such a sweeping statement, one that, for example, puts all the new analogue video recording systems into the dustbin? The answer lies in the basic differences between analogue and digital recordings.

1.1 Analogue Records

Analogue is what we are used to. Our eyes and ears are analogue devices. In A-V it is the movie film, the LP disc, the photograph; with text, it is the book, the magazine, the micro-film. An analogue recording - which includes images of a printed page - stores the information as a continuously varying parameter. This requires the carrier to be able to support an infinite number of states - for example, every shade of grey between white and black in a photograph - for accurate storage. The variable parameter might be an electrical voltage, a magnetic flux density or a variation of opaqueness and colour on a film strip. It requires perfect fidelity from every part of the storage medium to guarantee a good record. A spot of mildew or fading of colour on a photograph means that information has been lost.

With an analogue recording, the quality and fidelity of the information is degraded every time it is moved from carrier to carrier. The copying process also relies heavily on skilled staff able to assess the quality of the copy to ensure that it is the best that can be achieved. Skilled staff cost money. The process is also time-consuming - yet more expense.

1.2 Digital Records

A digital recording represents the information as a series of binary coded numbers. This is a much more rugged system because only two pieces of information - the binary digits 0 and 1 - are ever recorded. These two numbers are represented by high or low levels of magnetic flux, positive or negative voltage, black or white grain of silver. There are no in-between values.

As we are dealing with numbers, it is possible to construct a copying system that checks the number on the original carrier against the number recorded on the copy. If the numbers match, the copy is an exact facsimile, a clone, of the original. This means that there has been no degradation in quality or fidelity because of the copying process.
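The checking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `verified_copy`, `good_writer` and `faulty_writer` are hypothetical, and the "carrier" is simulated by a function that returns the bytes it was asked to write.

```python
def verified_copy(original: bytes, write) -> bytes:
    """Copy `original` using `write`, then compare the copy with the original number for number."""
    copy = write(original)
    if len(copy) != len(original):
        raise ValueError("copy has a different length from the original")
    for position, (source, target) in enumerate(zip(original, copy)):
        if source != target:
            raise ValueError(f"mismatch at byte {position}")
    return copy


def good_writer(data: bytes) -> bytes:
    # Hypothetical stand-in for recording onto a sound new carrier.
    return bytes(data)


def faulty_writer(data: bytes) -> bytes:
    # Hypothetical stand-in for a faulty recorder: one bit of the first byte is flipped.
    damaged = bytearray(data)
    damaged[0] ^= 0b00000001
    return bytes(damaged)


record = b"Memory of the World"
clone = verified_copy(record, good_writer)  # passes: the clone is an exact facsimile
```

Passing `faulty_writer` instead raises an error, so a degraded copy is never accepted silently.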

One other advantage of binary coding comes from the fact that there are only two digits. This means that if the position of an error can be identified, it can be corrected. If, at the position of the error, the magnetism on the carrier is "High" then the only possible alternative is "Low". If it were a decimal number, finding the position of the error would not help; what would you change the erroneous digit for? All digital systems include some form of error detection and correction process.
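As an illustration of locating an error and then simply inverting the faulty digit, here is a minimal sketch of the textbook Hamming(7,4) code. It is chosen for brevity and is an assumption of this example; real storage systems generally use more elaborate codes. The function names `encode` and `correct` are hypothetical.

```python
def encode(data_bits):
    """Add three parity bits to four data bits, giving a 7-bit code word."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]  # positions 1 to 7


def correct(code_word):
    """Return (corrected word, position of the single error, or 0 if none was found)."""
    c = list(code_word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3
    if position:
        # Binary coding makes the repair trivial: the only alternative to the wrong digit is its inverse.
        c[position - 1] ^= 1
    return c, position


word = encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1  # simulate a single fault at position 5
repaired, where = correct(damaged)
```

The three parity checks together spell out the position of a single faulty digit; once the position is known, the digit is inverted and the original word is recovered. This sketch handles one error per code word only.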

2. Why Consider a Move to "New Technologies"?

Many librarians are considering moving some, if not all, of their collection to "New Technology". There are three main reasons for this. One danger, however, is that, because "New Technology" is rather mysterious and requires a special breed of acolyte called a "Technician" to serve it, "New Technology" comes to be regarded as a magic wand that can be waved to solve problems.

2.1 Decay of Existing Carriers

All types of information carrier are decaying. Some types are decaying faster than others. Historic manuscripts are often in better condition than documents on modern paper. The underlying fact, however, that cannot be escaped is that ALL documents whether on paper, vellum, parchment, papyrus, clay tablets, carved in stone, on magnetic tape, shellac discs or on photographic film are decaying. Some carriers will take a decade to decay; others will take millennia. We can slow the processes but we cannot stop them. We can, however, easily accelerate decay by bad handling practices and bad storage.

There is a feeling that the magic wand of "New Technology" can be waved and the processes of decay can be stopped in their tracks. Well, "Yes" and "No".

The carriers used to hold information in the new, large capacity storage systems are themselves liable to decay. The rate of decay is probably going to mean a safe life of about 50 years. This will mean that the collection may need to be copied twice every century. If the storage were analogue this would be prohibitively expensive. As we are talking about digital storage, the problem is, potentially, taken care of automatically. At least one manufacturer is already marketing an automated storage system that is controlled from, and sends information to, standard computer terminals. The automated library provides the information requested with a delay of about 1 minute. This saves the cost of porters, avoids potential loss of or damage to the carriers and saves the user's time.

In addition, this robotic library is self-checking. Each carrier within the library is regularly and automatically checked for errors in the digital coding and, if the error rate is above a preset level, the carrier is automatically copied and the copy, which is now better than the original because the errors have been corrected, is inserted into the library in place of the faulty original.
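The self-checking routine just described might be sketched as follows. This is an assumption-laden illustration, not any manufacturer's design: the `Carrier` record, the `check_library` function and the threshold value are all hypothetical, and the error rate is taken to be corrected errors per block read on the last check.

```python
from dataclasses import dataclass


@dataclass
class Carrier:
    label: str
    corrected_errors: int  # errors found and corrected during the last check
    blocks_read: int       # assumed to be greater than zero

    @property
    def error_rate(self) -> float:
        return self.corrected_errors / self.blocks_read


def check_library(carriers, threshold):
    """Replace every carrier whose error rate exceeds `threshold`; return the labels refreshed."""
    refreshed = []
    for index, carrier in enumerate(carriers):
        if carrier.error_rate > threshold:
            # The copy is made through the error-correcting reader, so it starts with no known errors.
            carriers[index] = Carrier(carrier.label, corrected_errors=0, blocks_read=carrier.blocks_read)
            refreshed.append(carrier.label)
    return refreshed


library = [
    Carrier("tape-001", corrected_errors=2, blocks_read=10_000),
    Carrier("tape-002", corrected_errors=150, blocks_read=10_000),
]
replaced = check_library(library, threshold=0.01)
```

The essential design point is that the copy is triggered while the errors are still correctable, well before any information is actually lost.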

2.2 Lack of Storage Space

The sheer volume of information that modern society wishes to keep is creating major storage problems for many institutions. Many of the current carriers are bulky and require more and more storage space. The cost of building is high; the cost of building to the standards required by a modern library is very high. The good news is that politicians like grandiose building projects, particularly if the project is finished in time for their photograph to be taken cutting the ribbon on opening day. The bad news is that the politicians do not like paying for the running and maintenance costs of buildings - there are no photo-opportunities. So the shiny new library building also starts a slow process of decay. The hope is that "New Technology" will wave a magic wand and solve this problem by giving politicians more photo-opportunities opening a series of gleaming new toys that do not cost much to run and maintain. Perhaps.

Much publicity material has been issued about the high density of storage that can be achieved by using "New Technology" coupled with various methods of data reduction. Let us be clear about what data reduction means. It means throwing away information. In some cases, particularly with printed materials, it is possible to digitise the image, greatly reduce the number of digital bits of information used to store the document and still have a satisfactory, readable image. Reduction ratios of 400 to 1 are perfectly feasible. If, however, the document to be copied is not purely text but, for example, an illuminated manuscript, data reduction will irreversibly harm the fidelity of the record. In addition, the future copying or migration of data reduced information may not be possible without further damage to the information. Be clear about the reasons for and the dangers of using data reduction when making master or preservation copies of documents.

An alternative to data reduction is data compression. This does not offer such greatly reduced data densities as data reduction - a reduction of 10 to 1 in the number of bits used to store information is excellent - but it has the great advantage of being reversible. When deciding on a data compression system, however, question the process very carefully. The two terms - data reduction and data compression - are frequently mis-used in manufacturers' literature.
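The difference between the two terms can be shown with a small sketch. The assumptions are mine: reversible compression is represented by Python's standard `zlib` module, and data reduction by a hypothetical `reduce_to_bilevel` function that keeps one bit per 8-bit grey pixel. Neither is claimed to be what any particular product uses.

```python
import zlib


def reduce_to_bilevel(pixels: bytes, threshold: int = 128) -> bytes:
    """Data reduction: keep one bit (black or white) per 8-bit grey pixel."""
    packed = bytearray()
    for start in range(0, len(pixels), 8):
        chunk = pixels[start:start + 8]
        byte = 0
        for pixel in chunk:
            byte = (byte << 1) | int(pixel >= threshold)
        byte <<= 8 - len(chunk)  # left-align a short final chunk
        packed.append(byte)
    return bytes(packed)


def expand_bilevel(packed: bytes, count: int) -> bytes:
    """Rebuild `count` pixels from the reduced record; the grey shades cannot be recovered."""
    pixels = bytearray()
    for byte in packed:
        for shift in range(7, -1, -1):
            pixels.append(255 if (byte >> shift) & 1 else 0)
    return bytes(pixels[:count])


page = bytes([250, 245, 30, 20, 128, 127, 200, 10] * 4)  # 32 grey pixels

# Data compression: reversible, the original comes back bit for bit.
restored = zlib.decompress(zlib.compress(page, 9))

# Data reduction: one eighth of the size here, but the original is gone for good.
reduced = reduce_to_bilevel(page)
approximation = expand_bilevel(reduced, len(page))
```

A useful question to put to a supplier follows directly from this: does expanding the stored record give back exactly the bits that were put in? If not, the process is data reduction, whatever the brochure calls it.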

Before taking radical steps to move a collection to a new carrier because of lack of space, it is worth considering the real cost of the space used against the other costs of running the facility. Papers on this subject were presented to the 1988 IASA Annual Conference in Vienna by Dr Dietrich Schüller of the Austrian Academy of Sciences and Cor Doesberg of the Netherlands Broadcasting Service. (See Section X: 10.1 and 10.2.)

2.3 Access

The rows and ranks of shelves in a well run library are impressive. They do not, however, provide easy access for large numbers of people. Many large institutions require researchers to apply for the right to enter the premises at all. Once admitted, the researcher then has to search a card index or data base to make a list of likely items of information. Often, an employee of the library has to then fetch the required information for the researcher - it is not possible to browse the shelves. If the information is at a remote storage facility, it may take several hours, if not days, before it is actually available to the researcher. The hope is that the magic wand of "New Technology" will sweep all these restrictions away. Again, "Yes" and "No".

A digitised collection can be examined from a standard computer terminal. The existing reading desks can be modified by adding a suitable computer. This, however, does not increase the number of reading places available - it may, in fact, reduce the number of places available if larger desks are required.

The answer is to use the modern telecommunications networks to transmit the digital information to places outside the reading room or, even, outside the library. The technology is available to allow researchers remote access to collections whenever they require it. Researchers can sit at their office computer and search the catalogues, call up and view documents over the public telephone systems. Even if they have to subsequently visit the library to study the actual documents, assuming that they still exist, much time can be saved by locating the documents before making an expensive journey.

This vast improvement in access is not a pipe dream. The e-mail networks, bulletin boards and conferences are already a common tool of research and permit the on-line examination of many catalogues. A number of experiments exploring the implications of a move towards remote access to documents are in operation now. A practical system is possible now.

The big problem with offering vastly improved, but non-destructive, access to a collection is copyright. The copyright laws in many countries are very restrictive and would forbid the distribution of copyright information by electronic means. The position is, however, not as black as it appears at first sight. The copyright holders in many cases are commercial organisations who wish to profit from the use of their material. It is possible to set up systems that automatically invoice users when they view copyright information. The potential increase in income to the copyright owners can be a big selling point when negotiations are held with them.

3. Where Can We Look for Enlightenment?

Earlier in this article, it was pointed out that many of the "New Technologies" being considered for the storage of textual information have their roots in the world of audio-visual media. The first audio-visual archive was founded in Vienna in 1899 - less than one hundred years ago. Most have been in existence for less than fifty years. In that time, however, they have experienced the problems of a whole series of "New Technologies". These have ranged from the cylinder Phonograph of Thomas Edison to the Compact Disc; from the Daguerreotype photographs to the modern instant camera; from the early silent, black-and-white movies to the latest surround-sound, colour, three-dimensional blockbuster from Hollywood; from the first attempts to record television on film to the latest marvels of miniaturisation produced in Japan.

How are all these existing, and future, wonders viewed by the A-V archives and libraries? The answer is, with great trepidation. They are encouraged, however, by the experience gained in handling all the earlier versions of "New Technology". If the objectives of any change are clearly understood and the advantages and disadvantages fully examined, then installing "New Technology" is merely part of the process of the spread of information that started with the rock drawings of pre-historic man.

4. What Are the A-V Collections' Concerns?

4.1 Preservation Policy

The fundamental question that faces all archives and libraries is whether the original carrier should be preserved at all costs or whether the policy should be to preserve the information that it carries.

Most of the carriers - the books, newspapers, magazines etc - being stored do not have great intrinsic value. A few items, relatively speaking, do. Usually this is because of their status as a work of art or because of particular historical connections rather than because of the information that they carry. I would, for example, argue that mediaeval illuminated Bibles are works of art first and documents second. Similarly, documents such as the Magna Carta or the Declaration of Independence are, because of their national and international historical importance, of greater value than the text contained on the pages alone.

These categories of carrier must be preserved with the same vigour that we strive to preserve a painting such as the Mona Lisa or a building like the Taj Mahal. To attempt to strive as hard to preserve all the copies of "Le Figaro" and all the other newspapers published around the world is impossible. The world does not possess the resources in trained manpower and finance to even begin this task.

If this is accepted as a reasonable argument, then we have to further accept that only relatively minor efforts will be made to preserve most of the carriers - the books, the audio and video tapes, the magazines, the films - currently on the shelves of our libraries and archives. The effort will be put into preserving the information by copying it to new carriers whenever necessary.

In this context, it should be noted that the new UNESCO programme for the "Memory of the World" has the preservation of information as one of its major objectives. The second major objective is to greatly improve access to documents. The programme is an attempt to find a solution to the increasing loss of documents of all types - the loss of the "Memory of the World".

4.2 Copying Costs

It was said earlier that the cost of copying a collection twice in one century in the analogue domain would be prohibitive. It is also clear that the only question is "When", not "Whether", copying will be necessary. The major expense is the skilled labour required. The cost may be reduced by selecting the material to be preserved and, thus, reducing the size of the collection. Selection, however, will also require considerable effort by skilled staff. These costs must be borne once to make the first digital copy. Subsequent preservation copies can be made more cheaply and, in many cases, the process of migrating the collection may be automated.

5. Training

As the world of libraries comes to rely more and more on technology to supplement or replace the printed word on paper, there will be an increasing need for training. This training will be of several types depending on the level of involvement.

The basic type of training will be to increase the awareness of people about the possibilities offered by "New Technology" and to teach them to be able to make use of the equipment. This is the equivalent level to teaching people how to use the computerised catalogues that have been installed by many libraries. The second type will be that required by the people running the system - the cataloguers, administrators etc. The equivalent here is, for example, the training needed to become fully familiar with a complex word processing or database software package. The third type of training will be for the programmers and the maintenance technicians.

This training requirement must be costed into any proposal for the installation of "New Technology". If the training is neglected, the full benefits of the system will not be realised and the drawbacks will be magnified. Training is not an expendable option to be cut or postponed if the accountants say that the costs of the project must be reduced.

6. What Can be Stored on the Current Generation of New Technology?

The storage capacities available at present permit the serious consideration of robotic stores using magnetic tape cassettes for storing textual information either as ASCII coded information or as scanned images. Photographic stills and sounds can also be considered. The storage capacity available at present does not make the use of these systems for moving images an economic proposition. This will change in the next few years. Storage systems able to hold thousands of feature movies at the high quality required are under development now.

7. Conclusion

The purchase of "New Technology" must be approached rather as one would approach a used-car salesman. Not only must you be clear about your needs, to avoid being persuaded to buy a model with an unsuitable specification or being talked into buying a washing machine to go with it, but you must ensure that the performance is as claimed. A healthy dose of cynicism and an independent expert are sensible assets to have with you when talking with suppliers. If all goes well, "New Technology" can be a great friend; it also has great potential for becoming a foe.

11.1.1 What "new technology" can be recommended?

George Boston

There is no clear cut recommendation that can be made. Each archive and library will have to examine its own needs carefully, examine the systems available with great care and come to its own conclusion. What can be offered is some guidance in making the choice. Factors that will need to be considered include:

1. Capital Costs

The cost of purchasing the new system is obviously a factor to be considered. What is sometimes overlooked is the cost of installation.

2. Running Costs

The cost of storage materials per hour of recording or per page of print is another obvious factor to be considered. Less obvious are factors such as the cost of providing the correct storage environment, the cost of maintenance and spare parts and the level of back-up required for the equipment to guarantee access to the collection.

3. Specialist Staff

Specialist staff requirements are a combination of the above two points. The "capital" element is the amount of training the staff will require to operate and maintain the system. The "running cost" element is the salaries. The more highly trained and specialised the staff, the more likely it is that they will also demand high salaries. If the skill is not required on-call at all times, it may be possible to hire the skill in from the manufacturer or the agent when required. This may be a cheaper option overall but at the risk of longer delays in repairing breakdowns.

4. Storage Capacity

Is there sufficient storage space in the system to house the collection and allow for expansion? Can the storage be added to as required? This will enable the storage to be purchased as it is required.

5. Building Space

If the new system requires large areas of space that are not available, the custodian has to decide whether space should be provided - a major investment in most cases - or whether an alternative format should be chosen. The danger here is that the financial costs may lead the Trustees or controlling board of the collection to choose a highly data reduced format with a consequent loss of fidelity and probable future difficulties.

6. Compatibility of the System with Existing Equipment and Carriers

If the new system can make use of existing machines and equipment, the cost of moving to a new format may be reduced. When examining this factor, all the existing machinery, cable networks and operational systems must be considered.

7. Information Retrieval Rate

This covers both the time that it would take to find and play the carrier to a client and the speed at which information can be extracted from the system. The reason for considering the first of these points is client satisfaction. As the demand for information rises and the use of computers spreads, clients are less and less willing to wait for a document to be collected from a distant storage facility before they can examine it.

The second point concerns the rate at which the system can deliver information. Particularly with high-quality moving images, it is possible to store the recording using many bytes of information and not be able to replay the sounds and images in real-time. The rate at which the information can be supplied from the system is slower than the rate required to reproduce the original sounds and images at the correct speed and quality. In these circumstances, the information would be down-loaded into a buffer store within the client work-station taking, perhaps, 15 minutes to transfer 5 minutes of material. Once in the buffer store, the material may be viewed again and again by the user without delays.
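The arithmetic behind the example above is simple and worth making explicit. In this sketch the function names and the rate figures are hypothetical; the rates can be in any unit (say, megabits per second) provided both use the same one.

```python
def transfer_minutes(material_minutes: float, playback_rate: float, delivery_rate: float) -> float:
    """Time needed to download material that plays at `playback_rate` over a link running at `delivery_rate`."""
    return material_minutes * playback_rate / delivery_rate


def needs_buffer(playback_rate: float, delivery_rate: float) -> bool:
    """Real-time replay is impossible when the system delivers more slowly than the material plays."""
    return delivery_rate < playback_rate


# A system that delivers at one third of the playback rate reproduces the case in the text:
# 5 minutes of material takes 15 minutes to reach the buffer store.
wait = transfer_minutes(5, playback_rate=3.0, delivery_rate=1.0)
```

When comparing systems, ask the supplier for the sustained delivery rate and set it against the data rate of the highest-quality material in the collection.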

8. Ease of Access

The ease of access to information is tied in with client satisfaction. It covers such matters as the use and completeness of the catalogue information and whether the client can directly request information from the system or whether a member of the collection's staff has to be involved. The degree of control the client has over the play-back of the information is also a factor. The desired answers to these questions will vary from collection to collection and will also involve questions about the security of the information.

9. Security of the Information

Many collections contain items that have different levels of access depending upon copyright restrictions and/or the wishes of donors. Free access to all the collection is, therefore, unlikely to be allowed. A series of password levels should be the minimum security provision offered by the system.

The collection may also wish to charge for access on a time or item used basis. This is also tied in with the security system. Certainly, no client station should be able to alter the master copy of the stored material in any way. Access copies made for the client, perhaps recorded to a lower standard, may be altered but not at the client's work station.
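A series of password levels, with the master copy protected from every client station, might look like the following sketch. The level names, the `Item` record and the `authorise` function are all hypothetical and stand for whatever scheme a collection actually adopts.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    PUBLIC = 0
    RESEARCHER = 1
    DONOR_APPROVED = 2


@dataclass(frozen=True)
class Item:
    title: str
    required: Level  # set from copyright restrictions and/or the donor's wishes


def authorise(user_level: Level, item: Item, action: str) -> bool:
    """Decide whether a request from a client station is allowed."""
    if action != "view":
        # No client station may alter the master copy, whatever the password level.
        return False
    return user_level >= item.required


letters = Item("Private correspondence", required=Level.DONOR_APPROVED)
newspaper = Item("Daily newspaper, 1923", required=Level.PUBLIC)

can_read_newspaper = authorise(Level.PUBLIC, newspaper, "view")
can_read_letters = authorise(Level.RESEARCHER, letters, "view")
can_alter_master = authorise(Level.DONOR_APPROVED, letters, "alter_master")
```

The same decision point is a natural place to record each viewing if the collection charges on a time or item used basis.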

10. Ease of Copying

This is a contentious issue and links in with security in many cases. Depending on the copyright and access status of the material, clients should be able to obtain copies of material, with an appropriate fee if necessary, to continue research off-line or elsewhere. Such copies should be easy to make if allowable and difficult if not permitted.

Copying may also be used as part of the collection's policy of improving the access and diffusion of the material to a wider audience. The copies may be used by researchers, by the public, in education or as part of a publicity programme for the collection.

11. Life Expectancy of the System

It might be as well to define "System". The recording of, and access to, information stored in a machine-readable form requires two items: the carrier of the information and the information player - the machine. If either component is missing, the other is worthless. To date, the weakest link has tended to be the equipment. The carriers have outlasted the manufacturer's willingness to make and supply machines and spare parts. With all new formats, an assessment should be made of the likely duration of support from the manufacturer and the likelihood of obtaining spare parts in the years that follow the withdrawal of this support.

Having an assessment of the likely supported life of the equipment does not solve all the problems facing someone choosing a new format but, at least, it gives the collection administrator a time-scale to work within. It enables a programme to be set up for the provision of a future replacement system.

12. Security of the System

The security of the system covers the factor of replacement equipment within the format. For example, a collection that uses magnetic tape may wish to replace some of the machines before considering a complete change of format. It will be easier to guarantee this if there are several manufacturers making suitable equipment. This will mean that there is a large enough market to support several manufacturers, that the availability of spare parts is likely to continue and that the prices are competitive. Single-sourced equipment is much more vulnerable to the manufacturer going out of business or ceasing to make the product.

13. Ease of Migration

The ease of copying or duplicating the collection is a factor to be considered for the future when the next migration of the collection is made. If the process is over-complex, it will create unnecessary and unwanted delays when the migration is being performed. If it has not been considered by the equipment makers, discuss it with them now while you are still a potential customer that they wish to please. Do not leave it for 25 years or more, to the time when you want to migrate the collection again and the company has only one machine left - in their museum.

14. Re-generation of the Information

A modern system should include a self-checking program that routinely checks all the carriers for errors. When the error rate rises above a certain threshold level, the original is copied via the error-correcting program and the copy put in place of the original.

