Contents:
1. Introduction
2. The Purpose of the GENTEMP and GENHTML Programs
3. Distribution of duties between the three work sites
4. How to get started?
5. Conclusion
6. Literature

This article describes a method for setting up a standardized description of a digital copy of a manuscript. The description consists of so-called metadata (see [1]). The article introduces the programs that make creating a standardized description easier and explains the files that form a digital copy of a manuscript on CD-ROM. The method was designed in accordance with the document Proposal of the structure of digital copies of manuscripts and old books, version 2.1, 1997, AiP, NK Praha [1], which is based on the more general work Digitization of Old Books, Manuscripts, and Other Documents: The Format for Storage of Metadata, version 2.1, 1997, AiP, NK Praha [6].
The text [1] sets out the rules for creating standardized descriptions of digital copies of old books and manuscripts that will be backed up on digital media in accordance with UNESCO's Memory of the World programme. Like the previous version, this proposal was written on the basis of recommendations of the Memory of the World Sub-Committee on Technology.
The description of digitized manuscripts in [6] and [1] is designed so that it can be used in any WWW browser, for example NETSCAPE or MSIE, and digital copies can be made available on the Internet at any time. For this reason, extended HTML rules are used as the basis for the structured description of digital copies of manuscripts. The description, which conforms to the international SGML standard (ISO 8879), contains information that can later be used for mass processing of data. Special search software can also be applied to digital copies of old manuscripts. One example is the ManuFret application, developed by Albertina icome Praha Ltd., which allows the user not only to browse, magnify, reduce, and print pictures of individual pages of a manuscript, but also to search the HTML texts with a wide range of query options.
According to the proposal in [1], a digital copy of a manuscript is composed of pictures of individual pages (called data in [1]) and a structured HTML description (more precisely, [1] speaks of metadata and of the DOBM, DOBMENT and MNSXDEF.INF files). The goal of this method, which is used, for example, in the Czech National Library, is to find the easiest, quickest, and safest way of producing a digital copy of a manuscript.
This method of forming digital copies of a manuscript:
The process of converting a description of a manuscript into the extended HTML format is carried out in stages with two programs, GENTEMP and GENHTML.
The GENHTML program generates HTML documents (conforming to [1]) from an input text file written according to simple rules. The person preparing the descriptions of the manuscripts therefore does not have to work directly with the format defined in [1] or with the HTML language.
The GENTEMP program speeds up the preparation of the input text file for the GENHTML program. It is a very simple application which, on the basis of several entered parameters, creates a text file with the prescribed structure and with the numbers of the individual sheets or pages of the manuscript. We will call the file produced by the GENTEMP program a template.
With the use of both programs, the preparation of the structured HTML description is as follows:
The method of creating digital copies of manuscripts with the help of the GENTEMP and GENHTML programs is depicted in Figure no. 1.
The process depends on co-operation between three independent work sites: the documentation work site, the digitization work site, and the work site for CD-ROM preparation. The method of communication and the transfer of data between the work sites depend on the knowledge and qualifications of the persons working at each site. If the persons at the documentation work site are not interested in arranging the appearance of the final HTML documents, or do not want to familiarize themselves with the HTML format, the distribution of duties is as follows:
An automatically created description consists, among other things, of general data about the manuscript based on the AACR2 standard (the BIBLDESCR section); of the book, which is a list of the numbers of the individual pages of the manuscript; possibly of a gallery of small pictures of the individual pages; and of the descriptions of the individual pages, which contain references to pictures of higher quality. Each page of the manuscript corresponds to one HTML file. Every page is described by a set of statements (for example Foliation, Motif, Latin text, Translation, etc.).
For the HTML references to files to work, all HTML files that are part of the description must be placed in the same subdirectory as the MNSXDEF.INF file (see [1]), whether on one CD-ROM or several. Files must be named and saved according to specific rules: the number of the sheet or page determines the name of the picture file, while the quality of the picture determines its path. If the digital copy is saved on more than one CD-ROM, take care when dividing the files: all files connected in any way with an individual page of the manuscript must be placed on the same CD-ROM.
The technology presented here assumes that all pages of the manuscript are treated in the same way during digitization. That is, pictures of the same qualities (PREVIEWQ, GALLERYQ, INTERNETQ, NORMALQ, or EXCELLENTQ; see [1]) are available for all pages. We also expect that at least one picture of a specific quality will be made for each page of the document. More precisely, each page of the manuscript is represented by at most five pictures (one for each quality).
Before you read further, use a WWW browser to take a look at the examples of the digital copies of the manuscripts Labirynt sveta a lusthauz srdce, Knizky sestery o obecných vecech krestanskych and Codex Pictoricus Mexicanus (part). Doing this will give you a better idea of what the digitized manuscripts look like.
We feel that it is a good idea for the people participating in the digitization of manuscripts to go through the entire process of making a digital copy of a fictitious manuscript (everything but the scanning). This gives them a better idea of how the process works. At the same time, they have a chance to work out how they will transfer information between the different work sites.
Let us assume that we want to create a digital copy of a manuscript which has a front cover with an outside and an inside part, a back cover with an outside and an inside part, and five sheets between the two covers. Let us say that we want to number the sheets (not the pages) of the manuscript and that we will not attach any additional data to the individual pages.
Start the GENTEMP application by clicking its icon. The form used for deciding the appearance of the digitized manuscript appears on the screen. Check whether the values already filled in on this form fit the requirements described above. When the form opens, the method of numbering is set to Foliation, which is correct if you plan to number the manuscript by sheets. Front Cover, Front End-Sheet, Back Cover and Back End-Sheet (the outside and inside part of the front cover and the outside and inside part of the back cover) are all selected by default. This means that the covers of the manuscript will be scanned together with its pages. Since our fictitious manuscript contains five sheets, replace the 0 in the Number of sheets in main part line with 5. Now that everything is ready, open the dialogue window with the Generate button, choose the name of the template and the directory in which you want to save the generated file, and press the OK button. The application then creates the requested text file, shown below. Text enclosed in curly brackets is not part of the template; it is only explanatory commentary. Three dots stand for omitted text.
{The user fills in the general information about the manuscript. This information is derived from the AACR2 standard}
!Document title: {in this spot, write the name of the manuscript, it can be shortened}
!Shelf-number:
!Library:
!Owner:
!Title:
!Author:
!Language of the Original:
!Image Capturing Data:
!!!!! end of bibliographic description !!!!!!!!!!!!!!!
{names, types, labels, and language of the statements with the help of which the individual pages of the manuscript will be described}
!Foliation: TEXT, FOLIATION, EN
!!!!! end of definitions !!!!!!!!!!!!!!!
{first record - description of the front cover of the manuscript}
!Foliation:
!!!!! end of record no. 1 !!!!!!!!!!!!!!!
{second record - description of the inside part of the front cover}
!Foliation:
!!!!! end of record no. 2 !!!!!!!!!!!!!!!
{third record - description of the first page of the manuscript}
!Foliation:
!!!!! end of record no. 3 !!!!!!!!!!!!!!!
{fourth record - description of the second page of the manuscript}
!Foliation:
!!!!! end of record no. 4 !!!!!!!!!!!!!!!
{fifth record - description of the third page of the manuscript}
!Foliation:
!!!!! end of record no. 5 !!!!!!!!!!!!!!!
{sixth record - description of the fourth page of the manuscript}
!Foliation:
!!!!! end of record no. 6 !!!!!!!!!!!!!!!
!Foliation:
!!!!! end of record no. 7 !!!!!!!!!!!!!!!
. . .
!!!!! end of record no. 11 !!!!!!!!!!!!!!!
!Foliation:
!!!!! end of record no. 12 !!!!!!!!!!!!!!!
!Foliation:
!!!!! end of record no. 13 !!!!!!!!!!!!!!!
!Foliation:
!!!!! end of record no. 14 !!!!!!!!!!!!!!!
!!END {the sign for the end of the text file}

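The layout of the template can be illustrated with a short Python sketch. This is not the GENTEMP program itself, only a minimal approximation of the file shown above, under the assumption that records follow the order front cover, front end-sheet, recto and verso of each sheet, back end-sheet, back cover. The names `foliation_sequence`, `make_template`, `HEADER_FIELDS` and the `fill_labels` option are hypothetical and exist only for this illustration.

```python
# Illustrative sketch only: approximates the template layout shown above.
# It is NOT the real GENTEMP program; all names here are hypothetical.

HEADER_FIELDS = [
    "Document title", "Shelf-number", "Library", "Owner", "Title",
    "Author", "Language of the Original", "Image Capturing Data",
]
RULE = "!" * 15  # trailing run of exclamation marks used in separator lines


def foliation_sequence(sheets, covers=True):
    """Return foliation labels in record order: FC, FS, 1r, 1v, ..., BS, BC."""
    labels = ["FC", "FS"] if covers else []
    for number in range(1, sheets + 1):
        labels += [f"{number}r", f"{number}v"]  # recto and verso of a sheet
    if covers:
        labels += ["BS", "BC"]
    return labels


def make_template(sheets, fill_labels=False):
    """Build the template text for a manuscript with the given sheet count."""
    lines = [f"!{name}:" for name in HEADER_FIELDS]
    lines.append(f"!!!!! end of bibliographic description {RULE}")
    lines.append("!Foliation: TEXT, FOLIATION, EN")
    lines.append(f"!!!!! end of definitions {RULE}")
    for number, label in enumerate(foliation_sequence(sheets), start=1):
        # The listing above shows an empty Foliation item; fill_labels is an
        # assumed convenience that writes the label into the record instead.
        lines.append(f"!Foliation: {label}" if fill_labels else "!Foliation:")
        lines.append(f"!!!!! end of record no. {number} {RULE}")
    lines.append("!!END")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_template(5))
```

For the fictitious manuscript with five sheets and both covers, the sketch produces 14 records, which matches the record numbers in the listing.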
Close the GENTEMP application with the Exit button. Start the text editor WORDPAD or NOTEPAD (if you use a different program, make sure that it preserves the plain text format). Open the template and fill in the first part with general information about the manuscript (Shelf-number, Library, Owner, ...). Mark a new paragraph or entry in the text with a tab or one or more spaces at the beginning of the line.
!Document Title: Shortened Chronicle of the Republic of Lapalie
!Shelf-number:
!Library:
!Owner:
!Title:
!Author:
!Edition:
!Type of Document:
!Publisher:
!Place of Publication:
!Printer:
!Place of Printing:
!Datation: .........
!Physical Description:
!Material:
!Size:
!Extent:
!Language of the Original:
!Image Capturing Data: .
!Foliation: TEXT, FOLIATION, EN
!!!!! end of bibliographic description !!!!!!!!!!
!Foliation:

Save the entire text file and close the editor. Open the GENHTML program and fill out the form. Pressing the button with the yellow cover, which has Input File/Name written above it, opens a dialogue window. In the dialogue window, first choose the directory and then the name of the template that you have filled out. Then set the correct code page from the five offered in the Input File/Code Page line. If your Windows system uses a Western code page, choose Windows Latin I. If your Windows system uses an Eastern European code page, choose Windows Latin II. If you worked with a DOS editor, choose Latin I, Latin II or MJK (the Czech code of the Kamenický brothers), depending on the code page that your editor uses.
| Note: | If you write the descriptions solely in English and do not use characters outside the English alphabet, which we recommend, you need not concern yourself with choosing the correct code page, because any of the offered code pages will work. |
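The reason behind this note can be checked with a small Python sketch. It assumes that the code pages named in the GENHTML form correspond to the Python codecs `cp1252`, `cp1250`, `cp850` and `cp852`; this mapping is our assumption and is not taken from the GENHTML documentation. MJK is left out because, to our knowledge, the Python standard library has no codec for it. The function name `encodings_agree` is hypothetical.

```python
# Illustrative sketch: plain English (ASCII) text is byte-for-byte identical
# in all the listed code pages, while accented Czech text is not.
# The mapping of form labels to Python codec names is an assumption.

CODE_PAGES = {
    "Windows Latin I": "cp1252",
    "Windows Latin II": "cp1250",
    "DOS Latin I": "cp850",
    "DOS Latin II": "cp852",
}


def encodings_agree(text):
    """Return True if text encodes to the same bytes in every code page."""
    encoded = set()
    for codec in CODE_PAGES.values():
        try:
            encoded.add(text.encode(codec))
        except UnicodeEncodeError:
            # The character does not exist in this code page at all.
            return False
    return len(encoded) == 1


if __name__ == "__main__":
    print(encodings_agree("!Document Title: Shortened Chronicle"))
    print(encodings_agree("Knížky"))
```

A description written only with unaccented English letters gives the same bytes whichever code page is selected, so the choice does not matter; as soon as accented characters appear, the selected code page must match the one used by the editor.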
Press the button in the lower right part of the form and, in the dialogue window, set the Output Directory into which the generated HTML documents should be saved. Then press the Generate button. If a mistake is found, fix the input text file and press the Generate button again. For now, ignore any warnings about the Contents, Notation or Illuminations. After the HTML documents have been created, close the GENHTML application with the Exit button. Your structured HTML description is ready. If, however, you want to view the pictures of individual manuscript pages in a WWW browser (for example NETSCAPE), you have to copy the files with these pictures to the appropriate place. The subdirectory GALLERY, located in the ...\EXAMPLES\FICT directory, contains files with GALLERYQ pictures of the pages of the fictitious manuscript. Copy this subdirectory to the directory where your MNSXDEF.INF file is located. Do the same with the PREVIEW, INET and NORMAL directories. Start an HTML browser and begin to look through your fictitious digitized manuscript by opening the EN\DESCR.HTM file.
Note how the names of the HTML documents and of the files containing pictures are derived from the number of a sheet, more exactly from the Foliation item.
| Page with Foliation: | The name of the HTML file describing the page with foliation: |
|---|---|
| FC | EN\FC.HTM |
| FS | EN\FS.HTM |
| 1r | EN\0001R.HTM |
| 1v | EN\0001V.HTM |
| 2r | EN\0002R.HTM |
| . . . | . . . |
| 5v | EN\0005V.HTM |
| BS | EN\BS.HTM |
| BC | EN\BC.HTM |

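The naming rule visible in the table above can be written down as a short Python sketch. It is inferred only from the examples in this article (a four-digit, zero-padded sheet number followed by an upper-case R or V, and cover labels used unchanged); the function names `base_name` and `html_name` are hypothetical, and the full rules in [1] may cover cases that this sketch does not.

```python
import re

# Illustrative sketch of the naming rule inferred from the examples above.
# The function names are hypothetical; see [1] for the authoritative rules.


def base_name(foliation):
    """Map a foliation label to a file base name, e.g. '1r' -> '0001R'."""
    label = foliation.strip()
    match = re.fullmatch(r"(\d+)([rv])", label, flags=re.IGNORECASE)
    if match is None:
        # Cover labels such as FC, FS, BS and BC are used as they are.
        return label.upper()
    number, side = match.groups()
    return f"{int(number):04d}{side.upper()}"


def html_name(foliation, language="EN"):
    """Return the relative name of the HTML file describing a page."""
    return f"{language}\\{base_name(foliation)}.HTM"


if __name__ == "__main__":
    for label in ["FC", "FS", "1r", "1v", "5v", "BS", "BC"]:
        print(label, "->", html_name(label))
```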
The names of the files containing pictures of the pages of the manuscript are the same as the names of the HTML files that describe those pages, apart from the subdirectory and the extension. The extension (only *.jpg or *.gif is allowed) is determined by the quality of the picture. The quality of the picture also determines the name of the subdirectory. The method of naming files containing a picture of a page of the manuscript can be clearly seen in the following examples:
| Page with Foliation: | is depicted by the GALLERYQ picture named: |
|---|---|
| FC | GALLERY\FC.gif |
| FS | GALLERY\FS.gif |
| 1r | GALLERY\0001R.gif |
| 1v | GALLERY\0001V.gif |
| 2r | GALLERY\0002R.gif |
| . . . | . . . |
| 5v | GALLERY\0005V.gif |
| BS | GALLERY\BS.gif |
| BC | GALLERY\BC.gif |

| Page with Foliation: | is depicted by the PREVIEWQ picture named: |
|---|---|
| FC | PREVIEW\FC.gif |
| FS | PREVIEW\FS.gif |
| 1r | PREVIEW\0001R.gif |
| 1v | PREVIEW\0001V.gif |
| 2r | PREVIEW\0002R.gif |
| . . . | . . . |
| 5v | PREVIEW\0005V.gif |
| BS | PREVIEW\BS.gif |
| BC | PREVIEW\BC.gif |

| Page with Foliation: | is depicted by the INTERNETQ picture named: |
|---|---|
| FC | INET\FC.jpg |
| FS | INET\FS.jpg |
| 1r | INET\0001R.jpg |
| 1v | INET\0001V.jpg |
| 2r | INET\0002R.jpg |
| . . . | . . . |
| 5v | INET\0005V.jpg |
| BS | INET\BS.jpg |
| BC | INET\BC.jpg |

| Page with Foliation: | is depicted by the NORMALQ picture named: |
|---|---|
| FC | NORMAL\FC.jpg |
| FS | NORMAL\FS.jpg |
| 1r | NORMAL\0001R.jpg |
| 1v | NORMAL\0001V.jpg |
| 2r | NORMAL\0002R.jpg |
| . . . | . . . |
| 5v | NORMAL\0005V.jpg |
| BS | NORMAL\BS.jpg |
| BC | NORMAL\BC.jpg |

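The four tables above follow one pattern, which the following Python sketch summarizes. The subdirectory and extension for each quality are taken from the examples in this article; EXCELLENTQ is left out because no example of it is shown here. The names `QUALITY_LAYOUT`, `picture_path` and `files_for_page` are hypothetical. The last function lists every file that belongs to one page, which is the set of files that has to stay together on one CD-ROM.

```python
import re

# Illustrative sketch of the picture naming pattern shown in the tables above.
# Subdirectories and extensions come from this article's examples only;
# EXCELLENTQ is omitted because no example of it is given here.

QUALITY_LAYOUT = {
    "GALLERYQ": ("GALLERY", "gif"),
    "PREVIEWQ": ("PREVIEW", "gif"),
    "INTERNETQ": ("INET", "jpg"),
    "NORMALQ": ("NORMAL", "jpg"),
}


def base_name(foliation):
    """Map a foliation label to a file base name, e.g. '1r' -> '0001R'."""
    label = foliation.strip()
    match = re.fullmatch(r"(\d+)([rv])", label, flags=re.IGNORECASE)
    if match is None:
        return label.upper()  # cover labels: FC, FS, BS, BC
    number, side = match.groups()
    return f"{int(number):04d}{side.upper()}"


def picture_path(foliation, quality):
    """Return the relative path of the picture of a page in a given quality."""
    directory, extension = QUALITY_LAYOUT[quality]
    return f"{directory}\\{base_name(foliation)}.{extension}"


def files_for_page(foliation, language="EN"):
    """List the HTML file and all pictures that belong to one page."""
    files = [f"{language}\\{base_name(foliation)}.HTM"]
    files += [picture_path(foliation, quality) for quality in QUALITY_LAYOUT]
    return files


if __name__ == "__main__":
    for path in files_for_page("1r"):
        print(path)
```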
The purpose of this article was to introduce you to the method of generating a description of manuscripts in the extended HTML structure and to show how to arrange the files in their final form on CD-ROM. You will find directions for using the programs in [2] and [3]. The structure of the text file created by the GENTEMP program is explained in [2]. The final form of the digital copy is explained in more detail in [3].