March 17, 2005
Draft Technical Specifications for
University of California Digitization Project
File Creation
Scanning: (includes deskewing, cleaning, cropping etc.)
preservation file: 8-bit grayscale, 600 pixels per inch, tif file format
current use file: 8-bit grayscale, 400 pixels per inch, tif file format (or jpeg file format, if cheaper: medium compression level, 8 out of 10?)
Paper-white level: 250-254
sharpening: moderate
OCR without zoning on 400 ppi files (raw OCR, without any cleanup)
PDF Creation from 400 ppi files
Filenames
The call number will be supplied by UC Berkeley with the book and will be used as the prefix to the filename, with blanks replaced by underscores. For example, F864 .M34 1970 becomes F864_.M34_1970.
The tiffs of the individual pages will use the call number prefix followed by an underscore and the page number, left-padded with zeros to 4 digits, e.g., 0001.
Examples:
The pdf: F864_.M34_1970.pdf
Individual pages: F864_.M34_1970_0001.tif
F864_.M34_1970_0002.tif
F864_.M34_1970_0003.tif
F864_.M34_1970_0004.tif, etc
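The naming rule above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function names are hypothetical, and rejecting page numbers outside 1 to 9999 is an assumption, since the draft does not say how books with more than 9999 pages should be handled.

```python
def filename_prefix(call_number: str) -> str:
    """Turn a call number into a filename prefix by replacing blanks with underscores."""
    return call_number.replace(" ", "_")


def pdf_filename(call_number: str) -> str:
    """Name of the PDF for the whole book."""
    return f"{filename_prefix(call_number)}.pdf"


def page_filename(call_number: str, page: int, ext: str = "tif") -> str:
    """Name of one page image: prefix, underscore, page number padded to 4 digits."""
    # Assumption: pages are numbered from 1 and must fit in 4 digits.
    if not 1 <= page <= 9999:
        raise ValueError(f"page number {page} does not fit the 4-digit scheme")
    return f"{filename_prefix(call_number)}_{page:04d}.{ext}"
```

Called with the call number F864 .M34 1970, these helpers reproduce the example names listed above for the pdf and for pages 1 through 4.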
Technical Metadata
Assuming that all the images in one book are scanned on the same machine, we only need the following technical metadata once per book. If all books are scanned on the same machine, we only need it once for the entire project. If a value is needed that is not listed here, please contact us to have it added.
Tiffs/Images:
Type | Controlled vocabulary, pick one | digital still camera; reflection print scanner; transmission scanner |
Brand | Free text | Examples: Phase One, Epson, Nikon |
Model | Free text | Examples: PowerPhase, 836xl, LS-2000 |
Serial Number | Free text | Examples: AK001109, 8204058, 212931 |
Bit Depth | Controlled vocabulary, pick one | 1 (1 bit bitonal); 16,16,16 (TIFF, HDR); 4 (4 bit grayscale); 8 (8 bit grayscale or palettized color); 8,8,8 (RGB); 8,8,8,8 (CMYK) |
Illumination | Controlled vocabulary, pick one | D55 Illuminant; D65 Illuminant; D75 Illuminant; Daylight; Flash; Fluorescent; Standard Illuminant A; Standard Illuminant B; Standard Illuminant C; Tungsten Lamp |
ColorSpace | Controlled vocabulary, pick one | 0 (Grayscale, White is Zero); 1 (Grayscale, Black is Zero); 2 (RGB); 3 (Palette Color); 4 (Transparency mask); 5 (CMYK); 6 (YCbCr); 8 (CIELab) |
File Format | Controlled vocabulary | tif (standard for master image) |
Compression | Controlled vocabulary, pick one | 1 (Uncompressed); 2 (CCITT 1D); 3 (CCITT Group 3); 4 (CCITT Group 4); 5 (LZW); 6 (JPEG); 7 (PackBits) |
Color Profile | Free text | Name of a well-known profile. Examples: Adobe RGB, ColorMatch RGB |
If a digital camera is used, we also need:
Filter | 0 if none, name of filter if used |
Glass | 0 if none, information on glass if used |
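A vendor could check a metadata record against the image table above before sending it. The following is a minimal sketch under stated assumptions: the dictionary layout, the `validate` function, and the choice to store only the leading code of each controlled value (for example "8" rather than "8 (8 bit grayscale or palletized color)") are all hypothetical, and the boundaries between the Illumination values are read from the table as printed.

```python
# Controlled vocabularies copied from the table; codes are stored as strings.
CONTROLLED = {
    "Type": {"digital still camera", "reflection print scanner",
             "transmission scanner"},
    "Bit Depth": {"1", "4", "8", "8,8,8", "8,8,8,8", "16,16,16"},
    "Illumination": {"D55 Illuminant", "D65 Illuminant", "D75 Illuminant",
                     "Daylight", "Flash", "Fluorescent",
                     "Standard Illuminant A", "Standard Illuminant B",
                     "Standard Illuminant C", "Tungsten Lamp"},
    "ColorSpace": {"0", "1", "2", "3", "4", "5", "6", "8"},
    "File Format": {"tif"},
    "Compression": {"1", "2", "3", "4", "5", "6", "7"},
}
FREE_TEXT = ("Brand", "Model", "Serial Number", "Color Profile")
CAMERA_ONLY = ("Filter", "Glass")  # "0" if none, per the table


def _blank(value) -> bool:
    """True when a field is absent or contains only whitespace."""
    return value is None or not str(value).strip()


def validate(record: dict) -> list:
    """Return a list of problems found in one metadata record (empty if none)."""
    problems = []
    for field, allowed in CONTROLLED.items():
        value = record.get(field)
        if value not in allowed:
            problems.append(f"{field}: {value!r} is not in the controlled vocabulary")
    for field in FREE_TEXT:
        if _blank(record.get(field)):
            problems.append(f"{field}: missing")
    # Filter and Glass are only required when a digital camera is used.
    if record.get("Type") == "digital still camera":
        for field in CAMERA_ONLY:
            if _blank(record.get(field)):
                problems.append(f"{field}: missing (use 0 if none)")
    return problems
```

An empty list from `validate` means the record uses only listed values; anything else names the field to correct, or signals a value that should be requested as an addition to the vocabulary.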
PDFs:
Conversion Software | Free text (conversion software) | Example: Acrobat PDF for Word 5.0 |
Hardware | Hardware Platform used to create PDF | Example: Microsoft Windows 2000 Professional |
OCR Conversion Software | Free text (software used to create OCR) | Example: ABBYY FineReader 6.0 |
OCR Hardware | Hardware Platform used to create OCR | Example: Microsoft Windows XP |
Language | Used in document | Example: English |
File Delivery
Files will be delivered to the UC Berkeley Digital Publishing Group (DPG) on CDs or DVDs (vendor's choice). The tiffs should be grouped together on one set of disks and the PDFs on another. The technical metadata can be sent as email attachments or on the disks with the PDFs. The vendor will notify DPG when the disks have been sent.
DPG will have 90 days to review the files and contact the vendor about problems.