
6C. Technical Infrastructure:
FILE MANAGEMENT

BASIC TYPES OF MASS STORAGE
Mass storage technologies can be classified in several ways. The underlying storage system (magnetic, optical or magneto-optical), the drive type (fixed or removable), the media material (tape, rigid platter, flexible platter), and the hardware interface (ATA, ATAPI, SCSI, USB, Firewire/IEEE 1394, Fibre Channel) jointly define the characteristics of each technology.

Storage systems are also distinguished as either direct attached storage or network attached storage. Direct attached storage includes standard desktop drives that are either installed within a computer case or cabled directly to it. Network attached storage generally encompasses storage that is accessible to multiple computers. It may be connected to a server and accessed via special file system protocols (e.g. Network File System or Common Internet File System), or it may be part of a storage system that functions independently of any particular server (e.g. a SAN, or Storage Area Network).

Storage hierarchies refer to the allocation of files to different kinds of storage depending on the frequency of use. When magnetic disk storage was very expensive, it was common to place the most heavily used files on magnetic disk (online access), less frequently used files on less expensive (and slower) optical media (near line storage), and very infrequently accessed files on magnetic tape (offline storage). Because magnetic disk storage has declined in price much more rapidly than optical storage, the incentive to establish such hierarchies has lessened.
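
The allocation rule behind a storage hierarchy can be sketched in a few lines of Python. This is only an illustration: the function name, the file names, and the thresholds (accesses per month) are assumptions, not values from this tutorial.

```python
def assign_tier(accesses_per_month, online_min=30, nearline_min=1):
    """Pick a storage tier from how often a file is used.

    The thresholds are hypothetical; a real collection would tune them
    to its own usage patterns and storage costs.
    """
    if accesses_per_month >= online_min:
        return "online (magnetic disk)"
    if accesses_per_month >= nearline_min:
        return "near line (optical media)"
    return "offline (magnetic tape)"


# Hypothetical usage figures for three image files.
usage = {"map_001.tif": 120, "letter_017.tif": 4, "ledger_233.tif": 0}
for name, count in usage.items():
    print(name, "->", assign_tier(count))
```

As disk prices fall, the same rule can simply be given lower thresholds, until nearly everything qualifies for online storage.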

A table characterizing available technologies based on speed, capacity, and cost is available in Table: Comparison of Storage Media.

Trends in Mass Storage
Since the first hard disk drive was introduced in 1956, regular and rapid technological advancement has led to astonishing improvements in capacity, speed, reliability and price/performance ratio. The driving force in these improvements has been the unrelenting increase in the amount of data that can be stored in the same area (known as "areal density"). Unit cost for basic hard disk storage dropped by approximately a factor of 100 from 1997 to 2002. With predictions that the cost per unit of storage will continue to decline at a steep pace and that drive capacity will continue to increase, there is little likelihood that even the largest and fastest growing digital image collections will face capacity or affordability problems for mass storage. Other forms of mass storage, such as optical disk and magnetic tape systems, are also seeing improvements in price and performance, but at a slower rate than magnetic disk.
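
To put the factor-of-100 figure in yearly terms, the short Python calculation below assumes the decline was spread evenly over the five years. That even spread is a simplification made for illustration; actual prices did not fall at a perfectly steady rate.

```python
# Unit cost fell by about a factor of 100 between 1997 and 2002
# (figure taken from the paragraph above).
total_factor = 100
years = 2002 - 1997

# Assumption: the decline was the same ratio every year.
annual_factor = total_factor ** (1 / years)
annual_decline = 1 - 1 / annual_factor

print(f"cost divided by about {annual_factor:.2f} each year")
print(f"that is a drop of about {annual_decline:.0%} per year")
```

Under that assumption, unit cost was divided by roughly 2.5 each year, a drop of about 60 percent annually.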

The downside of such rapid technological change is equally rapid obsolescence. The need to replace storage systems at short intervals (perhaps every 3-5 years) cancels out some of the cost benefits. Maintenance budgets for digital imaging systems should anticipate these needs.

Another downside is the confusing proliferation of new technologies. This is particularly true in two areas. One is hardware interfaces for magnetic disks. In order to take advantage of the increasing storage density (and consequent increases in the speed of data retrieval) new hardware interfaces must be developed that can keep up with the drives. Otherwise, there would be no advantage to the faster drives.

The result has been intense competition to increase the data flow rate the interfaces can handle, with each interface's stakeholders attempting to one-up the others and win a larger share of the market for high-performance applications. Examples include the move from USB 1.1 to 2.0, the regular introduction of new SCSI standards, and the impending shifts from IEEE 1394a to 1394b and from parallel ATA to serial ATA. The new versions offer superior performance, but may cause problems such as incompatibilities (with earlier-version devices and the computer system itself), lack of operating system support, and delayed availability of device drivers.

The other area where technology proliferation has caused confusion and headaches for users is formats for optical disk media. This is especially true for the high-density DVD formats, where at least five different formats compete, including three different rewritable formats (DVD-RAM, DVD+RW and DVD-RW). The lack of standardization leads to incompatibilities among drives and media and makes it risky for users to settle on any one technology. For more information on this topic, see the DVD FAQ.

A good discussion of many of the trends discussed above can be found here.

Reliability Considerations
Storage reliability takes on different meanings at different points along the digitization chain. During capture, the concern centers on accurate recording of the bits and the maintenance of fidelity as the files go through various processing steps before arriving in a permanent storage archive. Once files are ready for delivery, short-term concern shifts to maintaining high availability of important files by minimizing storage system down time and recovering rapidly from failures. In the long term, reliability is focused on replacing storage systems before hardware or media fail, lose integrity, or become obsolete.
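
One common way to confirm that the bits survive a step that is meant to leave a file untouched (a copy or a transfer, for example) is to record a checksum at capture and compare it later. The Python sketch below uses SHA-256 from the standard library; the function names are illustrative, and the choice of SHA-256 is an assumption rather than a recommendation made by this tutorial.

```python
import hashlib


def file_checksum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large image files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum):
    """True if the file still matches the checksum recorded at capture."""
    return file_checksum(path) == recorded_checksum
```

In use, the checksum would be stored alongside the image's other metadata and checked again after each copy and at intervals in long-term storage. A step that deliberately alters the file, such as cropping, calls for a new checksum afterward.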

Overall, the reliability of storage systems has been steadily improving. Almost all storage technologies now have some form of error correction built in. As storage has become faster and higher in capacity, the extra time and redundant storage needed for error correction have become less of a burden. More and more disk drives have features such as S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) that allow a drive to constantly monitor its own performance and send out an alert if something is starting to go wrong (for example, if the drive's rotational speed is changing, perhaps indicating that a motor or bearing problem is developing).

Larger storage arrays are available with a variety of reliability features. RAID (Redundant Array of Independent, or Inexpensive, Disks) offers several configuration options that trade off performance and reliability, such as data mirroring, in which every block is written to more than one drive so that there is complete redundancy. Some systems can be configured with "hot spares" and "automatic failover" so that in the event of a complete drive failure, the contents will automatically be reconstructed on a powered-up spare, which then takes its place, all without human intervention. Others permit "hot swapping" of drives, so that replacements can be installed without powering down the entire storage system. As hard drive storage comes down further in price, it becomes less of a luxury to have empty drives spinning solely for the purpose of taking over in the event of a failure.
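
Mirroring is not the only way an array can rebuild a failed drive. Parity-based RAID levels, a different option from the mirroring described above, store the XOR of the data blocks on an extra drive, and any one lost block can be recomputed from the rest. The Python sketch below illustrates the idea with made-up 4-byte blocks; it is a toy model, not how any particular controller is implemented.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


# Three hypothetical data drives, each holding one 4-byte block.
drives = [b"\x10\x20\x30\x40", b"\x0a\x0b\x0c\x0d", b"\xff\x00\xff\x00"]
parity = xor_blocks(drives)  # stored on a fourth drive

# Simulate losing drive 1, then rebuild it from the survivors plus parity.
survivors = [drives[0], drives[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == drives[1])  # True
```

The same computation is what a hot spare would be filled with after a failure. Parity costs one extra drive per group instead of doubling the drive count, but it protects against only one failure at a time in that group.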

Unfortunately, these impressive features cannot be solely relied upon for protection of data. No technology is fail-safe, and entire storage installations can be destroyed by unpredictable events such as fires, floods, and earthquakes. For this reason, it is generally recommended that all unique data (especially master image files and all associated metadata) be stored on at least two kinds of media, in different physical locations. Often, the choice for secondary storage is removable media such as optical disks or magnetic tapes.

Most removable media can have reasonable life spans (claims vary from 10 to 100 years), though many of these figures are based on accelerated aging tests, not actual experience. Moreover, improper storage conditions (e.g. high temperature and humidity) can dramatically lower media longevity. Some hard disk drive manufacturers are now claiming MTBFs (Mean Time Between Failures, a statistical measure of the likelihood of drive failure) of 100 years or more. How much attention should you pay to these numbers?
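
One way to read an MTBF figure is to convert it into a yearly chance of failure. An MTBF of 100 years is a statistic about a large population of drives, not a promise that any one drive will last a century. The Python sketch below assumes a constant failure rate (the exponential model), which is a simplification: it ignores early-life failures and wear-out.

```python
import math

HOURS_PER_YEAR = 8760


def annual_failure_rate(mtbf_hours):
    """Chance that one drive fails within a year.

    Assumes a constant failure rate (exponential model), which ignores
    early-life failures and wear-out.
    """
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


mtbf_hours = 100 * HOURS_PER_YEAR  # the "100 years" claim from the text
afr = annual_failure_rate(mtbf_hours)
print(f"annual failure rate: {afr:.2%}")
print(f"expected failures per year among 100 drives: {100 * afr:.1f}")
```

Under this model, a 100-year MTBF corresponds to about a 1 percent chance of failure per drive per year, or roughly one failure a year in an installation of 100 drives.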

Given that all technologies are subject to failure, and new technologies are being introduced at ever-shrinking intervals, it is possible to get too caught up in concerns over the lifespan of digital storage media. Removable media drives are subject to rapid obsolescence (many formats have come and gone without ever achieving broad market acceptance). As discussed in Digital Preservation, long-term survival requires a comprehensive plan that includes attention to media lifespan, storage environment, handling procedures, error detection, backup, disaster response, and monitoring for hardware, media and format obsolescence.

© 2000-2003 Cornell University Library/Research Department

 