METADATA CREATION
Metadata creation and implementation are resource-intensive
processes. Balance costs and benefits in developing a metadata strategy,
taking into consideration the needs of current and future users and collection
managers. Identify metadata requirements at the onset of an imaging initiative.
These requirements should be tightly linked to functions that must be
supported (e.g., rights management, resource discovery, and long-term
care).
Consider the following issues:
Although some metadata elements are static (e.g., date of creation, scanning
resolution), certain fields (e.g., migration information) may continue
to evolve and require continuous updating and maintenance.
The creation and management of metadata are accomplished through manual
(e.g., creating a Dublin Core record) and automated (e.g., generating a keyword
index from OCR'ed text) techniques. Similarly, metadata quality control will be
based on a mix of manual (e.g., evaluating the quality of subject access
categories and keywords) and automated (e.g., using an SGML parser to validate
tags) processes.
Metadata can be internal (file naming, directory structuring, file headers, OCR,
SGML) or external (external indexes and databases). The key factor in
decision making is evaluating whether the location supports functionality
and resource management. For example, TIFF file headers are instrumental
in recording metadata internally; however, this metadata is usually
lost when the TIFF files are converted to other file formats, such as
JPEG or GIF.
There are several standards in development to facilitate interoperability among
different metadata schemes. The Resource Description Framework (RDF) is an
XML-based application that provides a flexible architecture for
managing diverse metadata in the networked environment. The goal of the
Digital Imaging Group's Metadata for Digital Images (DIG35) initiative is to
define a standard set of metadata that will improve
interoperability between devices, services, and software, thus making
it easier to process, organize, print, and exchange digital images. The
MPEG-7 (Moving Picture Experts Group) initiative targets audio-visual content
description and aims to standardize a set of description schemes and descriptors,
a language to specify description schemes, and a scheme for coding the
description. The
Interoperability of Data in E-Commerce Systems (<indecs>)
project is an international collaboration to develop a metadata framework
that supports network commerce of intellectual property.
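As an illustration of the automated techniques mentioned above (generating a keyword index from OCR'ed text), the following is a minimal Python sketch, not a production indexer. The stop-word list, the page file names, and the sample text are all hypothetical; a real project would choose its own vocabulary rules and would read the text from its OCR output files.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real project would define its own.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "to", "is"}


def build_keyword_index(pages):
    """Map each keyword to the sorted list of page IDs whose text contains it.

    `pages` is a dict of {page_id: ocr_text}.
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        # Lowercase the text and keep alphabetic runs only.
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS and len(word) > 2:
                index[word].add(page_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical OCR output keyed by image file name.
ocr_pages = {
    "vol01_iss02_0001.tif": "The preservation of digital images",
    "vol01_iss02_0002.tif": "Metadata and the preservation of journals",
}
index = build_keyword_index(ocr_pages)
print(index["preservation"])
```

Because OCR output contains recognition errors, an index built this way would still need the manual quality review described above.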
Example 1
What kinds of metadata will be created for a journal collection that
is converted as 600 dpi, 1-bit TIFF 6.0 images? The following metadata
tasks might be undertaken. Each is identified by its principal metadata
type (S = structural, D = descriptive, A = administrative). Note: The RLG
Model RFP provides an example of metadata requirements for a text
imaging project.
Assign file names and directory structures to the image files and the
associated metadata files. (S)
Create or update MARC records (Fields 100, 110, 245, 260, 440,
650, etc.). (D)
Create Dublin Core records. (D)
Use MARC Field 007 to record digital preservation and reformatting
information. (A)
Use appropriate TIFF 6.0 file headers to record technical information,
e.g., ImageWidth, ImageLength, Compression, StripOffsets, RowsPerStrip,
StripByteCounts, XResolution, YResolution, ResolutionUnit, BitsPerSample.
(A)
Assign persistent, globally unique, and location-independent file
names (PURL or Handle). (D)
Use appropriate TIFF 6.0 file headers for image description (Field
270) to record descriptive elements essential for identifying
the file (e.g., project ID, institution, collection, year of publication,
title, author, image sequence number). (D)
Create a database to store and manage bibliographic information
from the cumulative journal indexes to enable structured vocabulary
search (e.g., journal volume, issue, title, author, beginning
and ending page number). (D, S)
Use TEI Lite SGML encoding to map the basic structural elements of the journals,
such as volume, issue, title, author name, beginning and ending
pages for each article, to facilitate online searching and browsing.
(S)
OCR images to provide free-text keyword access. (D)
Create HTML tags with Dublin Core information to facilitate resource
discovery. (D)
Register the Web site with relevant subject directories, specialized
subject portals, and gateways to increase coverage by Web search
engines. (D)
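One of the tasks above, creating HTML tags with Dublin Core information, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `dublin_core_meta_tags` and the sample record are hypothetical, and the `DC.` prefix with a `schema.DC` link follows the common convention for embedding Dublin Core in HTML `<meta>` elements rather than any requirement stated in this text.

```python
from html import escape


def dublin_core_meta_tags(record):
    """Return HTML <meta> tags for a dict of Dublin Core elements.

    `record` maps an element name (e.g., "title") to a string or a
    list of strings for repeatable elements.
    """
    # Declare the namespace that the "DC." prefix refers to.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            # Escape quotes and angle brackets so the attribute stays valid.
            content = escape(value, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Hypothetical record for one digitized journal volume.
sample_record = {
    "title": 'A "Sample" Journal, Volume 1',
    "creator": ["Doe, Jane", "Roe, Richard"],
    "format": "image/tiff",
}
tags = dublin_core_meta_tags(sample_record)
print(tags)
```

The resulting tags would be placed in the `<head>` of the page that presents the volume, so that harvesters and search engines that read embedded metadata can pick them up.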
Example 2
What kinds of metadata will be collected and recorded for a collection
of photographs?
In addition to many of the elements suggested above, consider whether
to:
Enhance an existing finding aid, and SGML-encode it using the EAD (Encoded
Archival Description) Document Type Definition to create a
map of the collection for searching and presentation. This will
facilitate interoperability with other EAD-encoded finding aids.
(D, S, A)