Monday, December 8, 2014

17. Digital Preservation Part - I

Suggestions from all readers are respectfully invited to help build this blog; please send your suggestions and entries. Its scope is the worldwide knowledge community, and it aims to make an important contribution to the career building of all exam candidates. You can send your suggestions to this email address: chandrashekhar.malav@yahoo.com


P- 01. Digital Libraries*

By: Jagdish Arora, Paper Coordinator

Multiple Choice Questions

Question 1: Multiple Choice

Bit-stream refers to:
  • Transmission of sequence of bits (correct answer)
  • Bits & bytes
  • Video streaming
  • Audio streaming

Question 2: Multiple Choice

Digital image technology can be used to create a copy of:
  • High-quality of an original item (correct answer)
  • Low-quality of an original item
  • Same as an original item
  • None of the above

Question 3: Multiple Choice

Digital preservation addresses the issue of:
  • Storage
  • Access
  • Preservation
  • All of the above (correct answer)

Question 4: Multiple Choice

Digital signature is a mechanism used for:
  • IP Enabling
  • Information Retrieval
  • Authentication (correct answer)
  • Searching

Question 5: Multiple Choice

JPEG 2000 is a standard used for:
  • Image Compression
  • Image Editing
  • Image Enhancement
  • Image Format (correct answer)

Question 6: Multiple Choice

Storage media that is considered more reliable for preservation is:
  • Optical Disc
  • Magnetic Disc
  • DAT
  • Microfiche (correct answer)

Question 7: Multiple Choice

Technological Obsolescence refers to:
  • Obsolescence of Hardware
  • Obsolescence of Software
  • Obsolescence of File Format
  • All of the above (correct answer)

Question 8: Multiple Choice

The term “analogue back-up” in the context of digital preservation means:
  • Taking a microfilm or high-quality print (correct answer)
  • Bit-stream copying
  • Data Abstraction
  • Canonicalization

Question 9: Multiple Choice

The term “de facto standard” refers to:
  • Standard used most widely or by default (correct answer)
  • Non-proprietary standard
  • ISO Standard
  • WIPO Standard

Question 10: Multiple Choice

The term “software re-engineering” in the context of digital preservation means:
  • Buying newer version of the software
  • Making source code available in public domain under GNU license
  • Re-writing and re-compilation of source code (correct answer)
  • Embedding software in EPROM

Question 11: Multiple Choice

Universal Virtual Computer is a form of:
  • Encapsulation
  • Emulation (correct answer)
  • Normalization
  • Replication

Question 12: Multiple Choice

Which one of the following has the highest storage capacity:
  • Microfiche
  • Magnetic Harddisc
  • CD ROM
  • DVD ROM (correct answer)

Question 13: Multiple Choice

Which one of the following is an “Investment Strategy” for digital preservation:
  • Migration
  • Encapsulation (correct answer)
  • Emulation
  • Replication

Question 14: Multiple Choice

Which one of the following is a “Medium to Long-term Strategy” for digital preservation:
  • Refreshing
  • Software Re-engineering
  • Emulation (correct answer)
  • Bit-stream copying

Question 15: Multiple Choice

Which one of the following is not a digital preservation strategy?
  • Replication
  • Bit-stream Copying
  • Refreshing
  • Digital Watermarking (correct answer)

Question 16: Multiple Choice

Which one of the following is not a short-term preservation strategy:
  • Bit-stream copying
  • Refreshing
  • Migration (correct answer)
  • Technology Preservation

Question 17: Multiple Choice

Which one of the following media is likely to last longest:
  • Paper
  • Microfiche
  • CD ROM
  • Clay Tablet (correct answer)

Question 18: Multiple Choice

Which one of the following is not a proprietary file format?
  • DOC
  • GIF
  • JPEG
  • TIFF (correct answer)

Question 19: Multiple Choice

Which process is used for making an exact duplicate of a digital object?
  • Bit-stream copying (correct answer)
  • Refreshing
  • Migration
  • Analogue backups

Question 20: Multiple Choice

“Technology Preservation” is also referred to as:
  • Version Migration
  • Backward Compatibility
  • Computer Museum (correct answer)
  • Digital Archaeology
..................................................................................................................................................................

0. Objectives

The objective of this module is to impart knowledge on the following aspects of digital preservation:

  • Need, relevance, problems and challenges of digital preservation;

  • Principles that guide digital preservation actions;

  • Factors that are involved in long-term digital preservation;

  • Digital preservation strategies; and

  • Impact of intellectual property rights and digital rights management on digital preservation.

1.0 Introduction

The widespread transition of knowledge from print to electronic format began in the 1980s with the appearance of 5¼-inch and 3½-inch floppy discs accompanying documents acquired by libraries. These floppy discs, and the floppy drives required to read them, have disappeared completely. However, the information recorded on them is still relevant to libraries and their users. Likewise, CD ROMs, DVD ROMs and magnetic tape cartridges that are used as low-cost storage media in libraries may also move towards extinction as storage technology evolves. Acquisition of electronic content in libraries is increasing every day with the addition of new “born digital” resources through different channels of communication, such as e-journals, e-books and online bibliographic databases subscribed to by libraries from publishers, vendors and aggregators. Furthermore, individual institutions are themselves producing their research output and other knowledge resources in electronic format. Libraries, with their mandate to provide long-term access to the resources they hold, are concerned that if the media and technology used for preserving digital content become obsolete, they may fail to provide access to the digital data preserved for posterity. Preservation of digital content is therefore a matter of concern while technologies, standards and formats remain in continuous flux. In the past few years, significant developments have been made in digital preservation, with several new projects and programmes involving national and international institutions of high repute. Libraries, archives and other cultural institutions are eager to adopt and adapt the tenets of digital preservation in order to avoid the risk of losing digital content to rapid changes in technology.
Preservation of digital data requires substantial new investments and commitments by organizations, institutions and agencies, which must adapt their economic and administrative policies for funding and managing digital preservation practice.

2.0 Definitions

Digital preservation deals with the management of digital information over a long period of time. It is a set of processes and activities that ensure continued long-term access to information from all kinds of records, both scientific and cultural heritage, that exist in digital form. According to Trusted Digital Repositories (TDR, 2002), “digital preservation encompasses a broad range of activities designed to extend the usable life of machine readable computer files and protect them from media failure, physical loss and obsolescence”. Kelly (1999) defines digital preservation as “storage, maintenance, and accessibility of digital object (include any digital material such as a text document, an image file, a multimedia CD-ROM or a database) over long-term, usually as a consequence of applying one or more digital preservation strategies”. Digital preservation is the active management of digital content over time to ensure ongoing access.

Kirchhoff (2008) defines digital preservation as a “series of management policies and activities necessary to ensure the enduring usability, authenticity, discoverability, and accessibility of content over the very long-term”. Digital preservation refers to a series of managed activities designed to ensure continuing access to all kinds of records in digital formats for as long as necessary and to protect them from media failure, physical loss and obsolescence (Cornell University Library, 2005).

Wikipedia (2014) defines “digital preservation” as “the series of managed activities necessary to ensure continued access to digital information for as long as necessary”. Digital preservation involves the planning, resource allocation, and application of preservation methods and technologies to ensure that digital information of continuing value remains accessible and usable. It combines policies, strategies and actions to ensure access to reformatted and born-digital content regardless of the challenges of media failure and technological change.


3.0 Needs of Digital Preservation

Libraries should have a clear understanding of their purpose in digitizing and preserving digital material. Fundamental needs for digital preservation include:

  • Exponential growth in digital information available in libraries and its ephemeral nature;
  • Increased complexity of digital objects (incorporating text, images, audio, video, GIS formats, etc.) and their increasing dependency on the software required to read and use them;
  • Rapid flux of technology, standards and formats;
  • Multiplicity of standards and formats;
  • Absence of widely accepted standards that will assure access over time;
  • Need to ensure usability, durability and intellectual integrity of the digital information; and
  • Rapid changes in and obsolescence of storage media (e.g., the limited life span of storage media).

4. Problems and Challenges of Digital Preservation

The challenges in maintaining access to digital resources over time stem from notable differences between digital and paper-based material. The initial problem with digital preservation is the content itself (Chen, 2001). Digital content is complex and dynamic in nature, and accessing it requires specific software and up-to-date technologies. The economic challenges of digital preservation are also enormous. Preservation programmes require significant upfront investment to create, along with ongoing costs for data ingest, data management, data storage and staffing. Graham (1998) grouped the problems and challenges of digital preservation into three distinct categories, namely: i) longevity of physical storage media; ii) technology-related issues, including technological obsolescence, hardware and software dependence, and the multitude of formats; and iii) intellectual preservation issues, including integrity and authenticity of information. Specific challenges that need to be addressed while preserving digital content are as follows:

4.1 Dynamic Nature of Digital Contents

Preservation in the analogue world involves static objects like printed documents, manuscripts and other artifacts. Collecting and storing these items in some form is a simple and straightforward process. Preserving digital content, by contrast, requires reconsidering the meaning and purpose of preservation. Digital information exists in several forms and types. Many digital documents are true replicas of their print counterparts, such as books, reports and correspondence. However, other types of digital material vary greatly from traditional forms, for example interactive Web pages, geographic information systems and virtual reality models. Web sites often change dynamically. As an object grows and changes over time, new questions emerge about what it means to preserve a digital object. Internet users are all familiar with the link failure syndrome that plagues the Web.

4.2 Machine Dependency

Digital content is machine-dependent. It may not be possible to access the information without appropriate hardware and associated software, and access may require the specific hardware and software that were used to create the content. Since computer and storage technologies are in continuous flux, the timeframe available for migrating digital content to new software or hardware is generally very short, typically 3 to 5 years, as opposed to the decades or even centuries that may be available for preserving traditional materials. Technological obsolescence is considered the greatest technical threat to ensuring continued access to digital content.
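To make the short migration window concrete, the sketch below counts how many migrations a collection would face over its retention period. The 3-to-5-year window comes from the paragraph above; the 50-year retention period and the function name `migration_cycles` are assumptions chosen purely for illustration.

```python
def migration_cycles(retention_years: int, window_years: int) -> int:
    """Number of migrations needed if content must move every `window_years`."""
    return retention_years // window_years


retention = 50  # hypothetical retention period, in years

fewest = migration_cycles(retention, 5)  # migrating every 5 years
most = migration_cycles(retention, 3)    # migrating every 3 years

print(f"{fewest} to {most} migrations over {retention} years")
# prints "10 to 16 migrations over 50 years"
```

A printed book kept for the same period may need no comparable intervention at all, which is the contrast the paragraph draws.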

4.3 Fragility of the Media

The media used for storing digital content are inherently unstable and highly fragile: magnetic and optical media deteriorate rapidly and can fail suddenly because of exposure to heat, humidity, airborne contaminants, or faulty reading and writing devices (Hedstrom and Montgomery, 1998). Magnetic storage media are highly sensitive to dust, heat, humidity and other climatic conditions. Without suitable storage conditions and proper management, most storage devices may deteriorate very quickly without displaying any external signs of physical damage. Deterioration of storage media may corrupt digital files in such a way that the corrupted portion of the content is not easy to identify.


4.4 Technological Obsolescence

Technological obsolescence can affect hardware, software and file formats. Not only are computers continually superseded by faster and more powerful versions, but the media used to store digital content also become obsolete within two to three years, replaced by newer and denser versions of the same medium or by new types of media that are smaller, denser, faster and easier to read. Digital materials stored on older media could be lost because the hardware or software needed to read them may become obsolete. Although the media may physically survive for years, the technology to read and interpret them may exist for only a brief period of time. As a result, even if the storage media are retained in the best condition, it may still not be possible to access the information they contain. Obsolescence also affects the software used to create, manage or access digital content, since software is superseded by newer versions or newer generations with more capabilities. There is a constant threat of lost backward compatibility for digital content created using older versions of software. Similarly, file formats are superseded by newer versions, and newer versions of software may not read files in older formats. Although some file formats are largely independent of specific software (for example ASCII and Unicode), most are tied to individual or related groups of software. Proprietary software with associated file formats represents some of the most enduring and successful software in use. Commercial software developers regularly release new versions of their software and associated file formats, with added features and functionality, in order to entice users to upgrade.

4.5 Shorter Life Span of Digital Media


One of the important concerns of digital preservation is the relatively short life span of digital media and the high rate of obsolescence of the hardware and software used for accessing digital records. Rapid change in the IT industry, and the move from science-based development to commercial development of software and hardware systems, has resulted in media becoming inaccessible at a faster pace.


4.6 Formats and Styles

Information content that was earlier confined to traditional formats like books, maps, photographs and sound recordings is increasingly available in a diversity of digital formats. New formats have emerged, such as hypertext, multimedia, dynamic pages, geographic information systems and interactive video. Each format or style poses distinct challenges relating to its encoding and compression for digital preservation.

4.7 Copyright and Intellectual Property Rights (IPR) Issues

Legal issues, in particular the process of obtaining copyright clearance for preservation and access of archived material, can contribute significantly to the cost and complexity of digital preservation. It is an area where the wider preservation community often needs to make its case with government and other legislators.

Andrew Charlesworth (2012) emphasized that while a number of legal issues colour contemporary approaches to, and practices of, digital preservation, it is arguable that intellectual property law, represented principally by copyright and its related rights, has been by far the most dominant, and often intractable, influence. It is essential for those engaging in digital preservation to understand the letter of the law and to be able to identify and implement practical and pragmatic strategies for handling legal risks in the pursuit of preservation objectives. Moreover, those engaging in digital preservation need to advance a coherent and cogent message to rights holders, policymakers and the public with regard to the relationship between intellectual property law and digital preservation. It is in the long-term interests of all stakeholders that modern intellectual property law permits the implementation of effective and efficient mechanisms of digital preservation.

5. Principles of Preservation as Applied to Digital Preservation

The basic principles practiced for the preservation of analogue media are also applicable to preservation in the digital world. In essence, digital preservation defines priorities for extending the life of digital information resources. Conway (1996) identified five principles, i.e. longevity, choice, quality, integrity and accessibility, that are practiced for the preservation of analogue media and can be extended to digital preservation.

The following principles guide digital preservation actions:

5.1     Longevity: The density with which media record information has increased exponentially over time, while their longevity as stores of information has decreased proportionately. Figure 1 below (Conway, 1996) plots ten “writing” media on the “X” axis in chronological order, with their corresponding capacity to record information on the “Y” axis on a logarithmic scale. It can be observed that the capacity to record information increases at each level by a factor of ten. The longevity of digital content depends on the life expectancy of the access system, including hardware and software. Storage media are likely to have a longer life span than the computer systems used to retrieve and interpret the data stored on them. Libraries must always be prepared to migrate valuable digital content, indexes and software to future generations of computers and storage devices. Migration of digital content will remain a continuing activity for ensuring perpetual availability of digital information. Libraries must ensure continuing institutional commitment to support long-term migration strategies.
       Fig. 1:  Information Density V/s Life Expectancy of Storage Media

5.2     Accessibility: Digital preservation activities must be performed with collaborative understanding that long-term access is the primary goal. Access to digital collections should be supported to the best of ability of available technology and resources. Acquisition of non-proprietary hardware and software components can ensure perpetual access to digital resources.

5.3     Selection: Selection of digital material for preservation is an ongoing process intimately connected to the active use of the digital files. Selection and value judgment are involved every time a decision is made to convert documents from paper to digital image, or to migrate them from one storage medium and access system to another, so as to continue preserving the information. Only a rare collection of digital files can justify the cost of a comprehensive migration strategy (Conway, 1996).

Selection of digital contents for preservation should reflect the broader institutional mission. Moreover, as with analogue documents, the main criteria in the selection of digital contents for preservation should be their authenticity, significance and lasting cultural value in reflecting subject matter.

5.4     Quality: Quality in the digital world is concerned with the usefulness and usability of digital content, and is essentially governed by the limitations of capture and display technology. Imaging technology, for example, facilitates scanning at a resolution of 3000 dpi; however, printing and display technology has its limitations. The quality of the digital object, including the richness of both the image and the associated indexes, is the heart and soul of preservation in the digital world. This means maximizing the amount of data captured in the digital scanning process, documenting image enhancement techniques, and specifying file compression routines that do not result in the loss of data during telecommunication (Conway, 1996).

5.5     Integrity and Authenticity: Digital preservation is concerned with physical as well as intellectual integrity of digital contents. In terms of digital preservation, the physical integrity of a digital image file is determined in terms of loss of information that occurs when a file is created in the process of scanning, and compressed mathematically for storage or transmission across the networks. The metadata (descriptive or structural) that describes intellectual contents of an image file or its organization is an integral part of the digital file, which must be preserved along with the digital image files themselves. The preservation of intellectual integrity also involves authentication procedures to make sure files are not altered intentionally or accidentally (Lynch, 1994).
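One way to picture the authentication procedures mentioned above is a keyed digest recorded when a file enters the archive and recomputed whenever the file is used. The sketch below uses HMAC with SHA-256 from Python's standard library as a simple stand-in for a full digital signature scheme; the key, the sample bytes and the function names `seal` and `is_unaltered` are assumptions made for illustration only.

```python
import hashlib
import hmac


def seal(content: bytes, key: bytes) -> str:
    """Return a keyed SHA-256 digest (HMAC) to record when a file is archived."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def is_unaltered(content: bytes, key: bytes, recorded_seal: str) -> bool:
    """Recompute the digest and compare it with the recorded one."""
    return hmac.compare_digest(seal(content, key), recorded_seal)


archive_key = b"hypothetical-archive-key"  # assumption: kept secret by the archive
image_bytes = b"scanned page image data"   # stand-in for a scanned image file
recorded = seal(image_bytes, archive_key)

print(is_unaltered(image_bytes, archive_key, recorded))            # unchanged copy
print(is_unaltered(image_bytes + b"\x00", archive_key, recorded))  # altered copy
```

Any change to the bytes, whether accidental or intentional, produces a different digest, and without the key an altered file cannot be given a matching seal.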

5.6     Discoverability: Digital content must have associated bibliographic metadata so that the content can be found by end-users through time.

5.7     Usability: The intellectual content of the item should remain usable via the delivery mechanism of current technology.

5.8     Sustainability: Digital preservation activities must be planned and implemented in ways that allow resources to be managed and sustained into the future. Future access to digital resources cannot be assured without institutional commitment to the resources required for digital preservation.


6. Factors of Digital Preservation

There are many issues involved in long-term digital preservation. These factors can be grouped into the following six categories, each of which tends to affect the others:

6.1       Cultural Factors: There is a lack of awareness amongst large groups of people within society, including planners and decision makers, about the historical value and significance of their digital documentary heritage. This, in turn, leads to a failure to keep those documents adequately and properly, with a consequent loss of heritage. Although digital information production is considered valuable, there is not enough awareness about its preservation. In 2003, a US survey carried out by the Cornell University Library found that the main threat to digital materials was the lack of policies and plans inside institutions to carry out this task. In developing countries, the situation could be worse.

6.2       Technological Factors: Technological factors are mainly related to obsolescence of computers, storage devices and media; changes in operating systems, formats, programs, interfaces, and reading and reproducing devices; emerging standards; and lack of interoperability among computing devices. Moreover, issues related to information security must also be addressed. These concern the relationship among threats, risks, vulnerabilities, impacts and control measures on digital objects. Libraries with limited technology and a high degree of technological dependence should gear up to cope with these obsolescence and security problems.

6.3       Legal Factors: Preservation of digital content is not an easy task for libraries. It is strongly associated with legal factors such as copyright and IPR. Some of the questions that need to be answered include: Who is legally responsible for keeping every document collection or archive for the future? Who has the legal eligibility or competence to perform that task? Will it be possible to make these documents accessible in future? National libraries and archives are currently trying to balance their responsibilities of receiving, keeping and providing access to documents against the growing restrictions on distributing them, mostly in electronic formats.

6.4       Methodological Factors: These factors are associated with the tools and standards used for appraisal of different materials, selection and disposal, logical storage and future retrieval of documents. A digital document with only a simple set of descriptive metadata, such as author, title and keywords, is not sufficient for proper future retrieval. A new set of metadata is needed that allows hyperlinks and contextualizes the description of a document in relation to other documents, enhancing its reuse, search, linking, weighting, integration, data mining and interoperability with other programs that might be used in future. If these factors are not taken into consideration, the technological preservation effort will be of limited use in spite of the complexity and cost involved.
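The difference between a simple descriptive record and one that places a document in context can be sketched as a small data structure. Everything named here (`PreservationRecord`, `Relation`, the relation kinds and the sample identifiers) is hypothetical and does not follow any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    kind: str       # hypothetical vocabulary, e.g. "isPartOf" or "references"
    target_id: str  # identifier of the related document


@dataclass
class PreservationRecord:
    # Simple descriptive metadata: author, title, keywords.
    identifier: str
    title: str
    author: str
    keywords: list[str] = field(default_factory=list)
    # Contextual metadata: file format and links to other documents.
    file_format: str = "unknown"
    relations: list[Relation] = field(default_factory=list)

    def related(self, kind: str) -> list[str]:
        """Identifiers of documents linked to this one by the given relation."""
        return [r.target_id for r in self.relations if r.kind == kind]


report = PreservationRecord(
    identifier="doc-002",
    title="Annual Report 2013",
    author="Library Committee",
    keywords=["annual report", "library"],
    file_format="PDF/A",
    relations=[Relation("isPartOf", "series-001"), Relation("references", "doc-001")],
)

print(report.related("references"))  # prints ['doc-001']
```

With the `relations` field, a future system can follow links between documents instead of relying on author, title and keyword matches alone.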

6.5       Economic Factors: Preservation is an ongoing process. Therefore, current, short-term and long-term costs and funding are important issues to deal with before and during a preservation project in order to maintain its feasibility in the long term. These include: the cost of digitizing (scanning and/or producing a digital original), the cost of editing (preparing, assembling, altering, adapting, refining or bringing a digital document into conformity with a standard), the cost of registering (adding a set of metadata pertinent to the digital object), the cost of storing (maintaining a digital object on online or offline storage devices for a given time) and the cost of updating (copying, updating, refreshing, converting and reshaping digital documents to fulfil new requirements).

6.6     Social Factors: These factors are associated with the usability, accessibility and security aspects of digital preservation. Future generations should have effective and efficient access to the information that is preserved. There is no use in preserving digital documents if no one, or only a few users, will have access to them. Assuming copyright, privacy rights and other legal issues are observed, the future challenge will be how to make this information available to as many people as possible through several generations. Social issues should be addressed and taken into consideration while defining digital preservation policy.

                                        Figure 2: Factors of Digital Preservation




7. Digital Preservation Strategies

The goal of a digital preservation strategy is to achieve consistency in the management of digital records. The purpose is to ensure that access to digital archives can be maintained indefinitely. The preserved digital objects should be identical in all essential respects to the original digital objects. It is important to understand what is 'essential' in order to protect those aspects of a digital record and to measure the success of preservation interventions. UNESCO’s Guidelines for the Preservation of Digital Heritage (2003) group these strategies under the following four categories:

7.1 Short-term Strategies

Short-term digital preservation strategies are likely to work for a short period of time only. These strategies include:

7.1.1    Bit-stream Copying

Bit-stream copying, commonly referred to as “backing up data” or “mirror image backup”, involves backing up all areas of a computer hard disk drive or another type of storage medium to make an exact duplicate of a digital object. Bit-stream copying is not a long-term maintenance technique, since it deals only with data loss due to hardware and media failure, whether resulting from normal malfunction and decay, malicious destruction or natural disaster. It should be considered the minimum maintenance strategy for even the most lightly valued, short-lived data.
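A minimal sketch of an exact duplicate at the level of a single file is shown below: the file is copied and SHA-256 digests of the source and the copy are compared to confirm they are bit-identical. The file names and the helper functions `sha256_of` and `bitstream_copy` are assumptions for illustration; a real mirror-image backup would cover a whole disk or storage medium.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bitstream_copy(source: Path, target: Path) -> str:
    """Copy the file and verify that the copy is bit-identical to the source."""
    shutil.copyfile(source, target)
    source_digest = sha256_of(source)
    if sha256_of(target) != source_digest:
        raise IOError(f"copy of {source} is not bit-identical")
    return source_digest


# Demonstration inside a temporary directory with hypothetical file names.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "report.dat"
    backup = Path(workdir) / "report.bak"
    original.write_bytes(bytes(range(256)) * 100)
    digest = bitstream_copy(original, backup)
    copies_match = original.read_bytes() == backup.read_bytes()

print(copies_match)
```

The digest returned by `bitstream_copy` can be stored alongside the backup and rechecked later to detect silent media decay.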

7.1.2   Refreshing

Refreshing is the transfer of data between two types of the same storage medium, with no change whatsoever in the bit-stream, for example transferring census data from an old preservation CD to a new one. This strategy may need to be combined with migration when the software or hardware required to read the data is no longer available or is unable to understand the format of the data. Refreshing is a necessary component of any successful digital preservation project. It potentially addresses both decay and obsolescence issues related to the storage media.

7.1.3          Replication

Replication is a method of creating duplicate copies of data on one or more systems. Data that exists as a single copy in only one location is highly vulnerable to software or hardware failure, intentional or accidental alteration, and environmental disasters like fire and flooding. Digital data is more likely to survive if it is replicated in several locations. Thus, the intention of replication is to enhance the longevity of digital documents while maintaining their authenticity and integrity through copying and the use of multiple storage locations.

Bit-stream copying is a form of replication. LOCKSS (Lots of Copies Keeps Stuff Safe) is a consortial form of replication, while peer-to-peer data trading is an open, free-market form of replication. CLOCKSS supports the traditional model of preservation whereby individual libraries build and maintain local collections of journals.
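The benefit of keeping several copies can be sketched as a simple audit: compare the digests of all replicas, treat the majority value as correct, and restore any copy that disagrees. This is a simplified illustration and not the actual LOCKSS protocol; the site names and the functions `audit_replicas` and `repair` are hypothetical, and the sketch assumes a clear majority exists.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the majority digest and the sites whose copy disagrees with it."""
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in replicas.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(site for site, d in digests.items() if d != majority_digest)
    return majority_digest, damaged


def repair(replicas: dict[str, bytes], damaged: list[str], majority_digest: str) -> dict[str, bytes]:
    """Overwrite damaged copies with a copy that matches the majority digest."""
    good = next(
        data for data in replicas.values()
        if hashlib.sha256(data).hexdigest() == majority_digest
    )
    repaired = dict(replicas)
    for site in damaged:
        repaired[site] = good
    return repaired


replicas = {
    "site_a": b"census table, 1991",
    "site_b": b"census table, 1991",
    "site_c": b"census tab\x00e, 1991",  # simulated corruption at one location
}

majority, damaged = audit_replicas(replicas)
print(damaged)  # prints ['site_c']

restored = repair(replicas, damaged, majority)
print(audit_replicas(restored)[1])  # prints []
```

With a single copy there is nothing to compare against, so the same corruption would go undetected and unrepaired.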

7.1.4          Technology Preservation or Computer Museum

Technology preservation is the maintenance of the hardware and software platforms that support a digital resource. If adopted as a preservation strategy, it needs a regular cycle of media refreshing. Maintaining obsolete technology in usable form requires a considerable investment in equipment and personnel. It is also called the “computer museum” solution.


Figure 3: Short-term Strategies


7.2.1 Migration

Migration is the process of transferring digital information from one hardware and software setting to another or from one computer generation to subsequent generations, without change in their intellectual content. The purpose of migration is to preserve the integrity of digital objects and to retain the ability for clients to retrieve, display, and use them in the face of constantly changing technology. For example, moving files from an HP-based system to a SUN-based system involves accommodating the difference in the two operating environments. Migration can also be format-based, to move image files from an obsolete file format or to increase their functionality. 
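A format-based migration can be sketched with a deliberately simple case: re-encoding a text file from an older character encoding to a newer one. The choice of Latin-1 and UTF-8 and the function name migrate_text are assumptions made for this illustration. The bit stream changes, while the characters that make up the intellectual content are checked to be the same.

```python
def migrate_text(data: bytes, old_encoding: str = "latin-1",
                 new_encoding: str = "utf-8") -> bytes:
    """Re-encode text from an old character encoding to a new one.

    The stored bytes change, but the decoded characters must not.
    """
    original_text = data.decode(old_encoding)
    migrated = original_text.encode(new_encoding)
    # Verify the content: decoding the new bytes must give exactly
    # the characters that the old bytes held.
    if migrated.decode(new_encoding) != original_text:
        raise ValueError("migration altered the text")
    return migrated
```

Real migrations between richer formats (word-processor files, images, databases) are far harder to verify, because there is no equally simple test that the content has survived.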


7.2.2 Canonicalization

Canonicalization relies on defining a canonical form for a class of digital objects that, to some extent, captures the essential characteristics of that type of object in a highly deterministic fashion. When an object is converted from one format to another, this form can be used to verify algorithmically that the converted file has not lost any of its essence. In particular, it provides a language or framework for understanding the effects of file translation. Unlike text, there are many ways in which an image can be stored: by rows or by columns; in planes; compressed or uncompressed. Different formats in use today make different choices. All of them can be mapped to the relevant canonical format. As long as the canonical form is used, integrity and authenticity can be managed independently of the peculiarities of specific representations and their choices about how to store the image. Clifford Lynch (1999) is recognized as the first person to introduce the idea of canonicalization.
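The image example can be made concrete with a small sketch. It assumes an 8-bit greyscale image held as a flat list of pixel values, stored either row by row or column by column, and it takes "width, height, then pixels in row order" as the canonical form. That canonical form and the function names are choices made for this illustration only, not a published standard.

```python
import hashlib


def canonical_form(pixels: list, width: int, height: int, order: str) -> bytes:
    """Map a flat 8-bit pixel list to one canonical byte sequence.

    order is "rows" (row-major storage) or "columns" (column-major).
    """
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match dimensions")
    if order == "rows":
        row_major = list(pixels)
    elif order == "columns":
        # In column-major storage, pixel (x, y) sits at index x * height + y.
        row_major = [pixels[x * height + y]
                     for y in range(height) for x in range(width)]
    else:
        raise ValueError(f"unknown storage order: {order}")
    header = width.to_bytes(4, "big") + height.to_bytes(4, "big")
    return header + bytes(row_major)


def canonical_digest(pixels: list, width: int, height: int, order: str) -> str:
    """Digest of the canonical form; equal digests mean equal images."""
    return hashlib.sha256(canonical_form(pixels, width, height, order)).hexdigest()
```

Two files that store the same image in different layouts give the same digest, so a conversion between them can be checked without comparing the files byte for byte.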



7.2.3 Emulation

Emulation is a method of preservation that can preserve the functionality and the 'look and feel' of digital objects, which migration may not be able to retain. This method attempts to simplify digital preservation by eliminating the need to keep old hardware working. Emulation combines software and hardware to reproduce in all essential characteristics the performance of another computer of a different design, allowing programs or media designed for a particular environment to operate in a different, usually newer environment. Emulation requires the creation of emulator programs that translate code and instructions from one computing environment so that they can be properly executed in another. It can be a cost-effective solution in certain circumstances, because producing one emulator could be much cheaper than migrating every digital object in an archive.
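The core of an emulator is a program that reads the instructions written for one machine and carries them out on another. The sketch below assumes a made-up legacy stack machine with four instructions (PUSH, ADD, MUL, PRINT); the instruction set and the function name run_legacy_program are hypothetical and far simpler than any real platform.

```python
def run_legacy_program(program: list) -> list:
    """Execute a program written for a hypothetical legacy stack machine.

    Each instruction is a tuple such as ("PUSH", 2) or ("ADD",).
    Returns the values the program printed, in order.
    """
    stack = []
    output = []
    for instruction in program:
        op = instruction[0]
        if op == "PUSH":
            stack.append(instruction[1])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "PRINT":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction: {op}")
    return output
```

The old program is left untouched; only the emulator needs to be rewritten when the host platform changes, which is the basis of the cost argument made above.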

Intellectual Property Rights (IPR) issues are also involved in emulating either operating systems or applications. A trusted organization is needed that can undertake this work and make the results available for others to use for effective emulation.

Figure 4: Issues in Emulation

7.3 Investment Strategies

Investment strategies involve investing effort at the time digital materials are archived. Such strategies include:

7.3.1 Restricting Range of Formats and Standards

Preservation programmes may decide to store data only in a limited range of formats and standards. This can be achieved either by accepting material only in specified formats or by converting material from other formats before storage. All digital objects of a particular type within an archival repository (e.g., colour images, structured text) can be converted into a single chosen file format that is thought to embody the best overall compromise amongst characteristics such as functionality, longevity, and preservability. For example, most textual and graphical information can be converted into PDF format. The UK Archaeology Data Service (ADS), for example, specifies a preferred (but not exclusive) range of formats for deposit and provides guidelines for depositors on creating or preparing materials for submission.

The strategy does not necessarily solve the access problem unless the obsolescence of the formats and standards used is handled effectively through some other strategy. This strategy imposes serious restrictions on the range of materials that a preservation programme can accept. Moreover, the process of conversion from the original format may cause some loss of essential elements.
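A deposit policy of this kind can be sketched as a check on the leading bytes of each incoming file. The signatures used for PDF, TIFF and PNG are the well-known file headers of those formats; the policy itself (accept PDF and TIFF, convert other recognised formats, reject the rest) and the function names are assumptions made for this illustration.

```python
from typing import Optional

# Leading bytes ("magic numbers") of a few widely documented formats.
SIGNATURES = {
    "PDF": (b"%PDF-",),
    "TIFF": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian
    "PNG": (b"\x89PNG\r\n\x1a\n",),
}

# Hypothetical policy: only these formats are stored as deposited.
ACCEPTED_FORMATS = {"PDF", "TIFF"}


def identify_format(data: bytes) -> Optional[str]:
    """Return the format name if the leading bytes are recognised."""
    for name, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None


def ingest_decision(data: bytes) -> str:
    """Decide what to do with a deposited file under the policy."""
    fmt = identify_format(data)
    if fmt in ACCEPTED_FORMATS:
        return "accept"
    if fmt is not None:
        return "convert"  # recognised, but must be converted before storage
    return "reject"
```

A check on the file header only identifies the claimed format; a production repository would also validate the file against the format specification before accepting it.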

7.3.2 Reliance on Standards

Reliance on standards seeks to "harden" the encoding and formatting of digital objects by adhering to well-recognized standards and favouring them over more impenetrable and less well-supported ones. It is to software what durable media is to hardware. This preservation strategy involves the use of open, widely available and well-supported standards and file formats that are likely to remain stable for a longer period of time, in preference to proprietary or less-supported ones. For example, if JPEG2000 becomes a widely adopted standard, the sheer volume of users should ensure that software to encode, decode, and render JPEG2000 images will be upgraded to meet the demands of new operating systems, CPUs, etc. Similarly, the majority of digitization programmes choose TIFF (Tagged Image File Format) as an open, stable and widely supported standard for the creation of preservation master images, and most publishers use PDF as the de facto standard for electronic distribution of their research articles, owing to the availability of PDF readers for all platforms. Like many of the strategies described here, reliance on standards may lessen the immediate threat to a digital document from obsolescence.

7.3.3 Data Abstraction and Structuring

Data abstraction involves analyzing and tagging data so that the functions, relationships and structure of specific elements can be described. Using data abstraction, the representation of content can be liberated from specific software applications and be achieved using different applications as technology changes. The technique requires extensive development of tools and methods for analysis and processing in order to correctly represent and tag each type of data. It makes a document application-independent and simplifies the transport of data between platforms and over generations of technology.
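As a minimal sketch of tagging, the example below describes a short document in XML using Python's standard xml.etree.ElementTree module. The element names (document, title, author, body, paragraph) and the function name abstract_record are invented for this illustration and do not follow any particular schema.

```python
import xml.etree.ElementTree as ET


def abstract_record(title: str, author: str, paragraphs: list) -> str:
    """Describe a document by its structure rather than its layout.

    Returns an XML string with illustrative element names.
    """
    doc = ET.Element("document")
    ET.SubElement(doc, "title").text = title
    ET.SubElement(doc, "author").text = author
    body = ET.SubElement(doc, "body")
    for text in paragraphs:
        ET.SubElement(body, "paragraph").text = text
    return ET.tostring(doc, encoding="unicode")
```

Because the tags say what each part is rather than how a particular application displayed it, any later program that can read XML can recover the title, author and paragraphs.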


7.3.4 Encapsulation

Encapsulation involves retaining a digital object in its original form as a bit stream, and encapsulating it along with instructions and whatever else might be necessary to maintain access to it in the future. Encapsulation is considered a key element of emulation. The information that has to be encapsulated comprises the document and its software environment. Central to the encapsulation is the digital document itself, consisting of one or more files representing the original bit stream of the document as it was stored and accessed by its original software. In addition, the encapsulation contains the original software for the document, itself stored as one or more files representing the original executable bit stream of the application program that created or displayed the document. A third set of files represents the bit streams of the operating system and any other software or data files comprising the software environment in which the document’s original application software ran. Rothenberg (1995) provides a diagram which shows how much needs to be encapsulated:

Figure 5: Encapsulation

Open Archival Information System (OAIS) Model represents a form of encapsulation, in which the digital object is packaged together with the Representation Information needed to interpret the bits appropriately for access; and Preservation Description Information, which includes information on provenance, context, reference and fixity.

Figure 6: Open Archival Information System (OAIS) Model
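The packaging described for the OAIS model can be sketched as a small data structure. The class and field names below are chosen for this illustration and are not the OAIS schema; they simply mirror the parts named above, with a SHA-256 digest standing in for the fixity information.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationDescription:
    provenance: str   # where the object came from and its custody history
    context: str      # how it relates to other material
    reference: str    # an identifier for the object
    fixity: str       # SHA-256 digest of the content (illustrative choice)


@dataclass(frozen=True)
class InformationPackage:
    content: bytes                    # the original bit stream
    representation_information: str   # how to interpret the bits
    description: PreservationDescription


def build_package(content: bytes, representation_information: str,
                  provenance: str, context: str, reference: str) -> InformationPackage:
    """Wrap a bit stream together with the information needed to use it."""
    fixity = hashlib.sha256(content).hexdigest()
    description = PreservationDescription(provenance, context, reference, fixity)
    return InformationPackage(content, representation_information, description)


def fixity_ok(package: InformationPackage) -> bool:
    """Check that the content still matches its recorded digest."""
    return hashlib.sha256(package.content).hexdigest() == package.description.fixity
```

Keeping the content and its description in one package means that a future user who finds the bits also finds the instructions for interpreting them and the means of checking that they are unchanged.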

7.3.5 Software Re-engineering

The application software associated with the digital preservation process is the component most affected by changes in technology during regular migration. Software re-engineering offers a number of strategies for transforming software and data formats, each of which requires considerable time and effort. Some possibilities include: adjusting and re-compiling source code for a new platform; re-coding in another programming language; reverse-engineering compiled code into higher-level code and porting that to the new platform; and translating compiled binary instructions for one platform directly into binary instructions for another platform.


7.3.6 Universal Virtual Computer


A Universal Virtual Computer (UVC) is a virtual machine (VM) specially designed for the preservation of digital objects, based on emulation. This method allows digital objects to be reconstructed in their original appearance at any time in the future and is completely independent of the architecture of the computer on which it runs. Users could create and save digital files using the application software of their choice, but all files would also be backed up in a way that could be read by the universal computer. The UVC-based preservation method is built on the following four components:

 i)     Universal Virtual Computer;
 ii)     UVC program (format decoder);
 iii)     Logical Data Schema (LDS) with information type description; and
 iv)     Logical Data Viewer.

A UVC program decodes the file format of a digital object. This format decoder program runs on the UVC, which is the platform-independent layer, independent of future hardware and software changes. Executing the format decoder delivers element tags, which hold specific information about the content of the data in a technology-independent manner. These elements build the Logical Data View (LDV) of the data, which is quite similar to XML. The LDV is a visible representation of the LDS, describing the structure and meaning of the tags as parts of a specific information type.

All these components are controlled by a Logical Data Viewer, simply called the viewer (Figure 7). For reconstruction, the viewer starts the UVC and feeds the data of the digital object to a format decoder running on top of the UVC. In return, it retrieves an LDV and reconstructs a specific representation of the original object’s meaning.

Figure 7: UVC-based preservation method


7.4 Alternative strategies


Alternative strategies for digital preservation include taking analogue backups of documents (print or microfilm) and recovering data from obsolete digital media.

7.4.1 Analogue Backups

Analogue backup is the conversion of digital objects into analogue form, e.g., taking high-quality printouts or creating silver halide microfilm from digital images. An analogue copy of a digital object can, in some respects, preserve its content and protect it from obsolescence, but only at the cost of sacrificing its digital qualities. Text and monochromatic still images are the most amenable to this kind of transfer.

Because analogue backups are highly expensive, have clear limitations and are relevant to only certain classes of documents, the technique only makes sense for documents whose contents merit the highest level of redundancy and protection from loss.

7.4.2 Digital Archaeology or Data Recovery

Digital archaeology involves retrieving data from obsolete software or hardware environments, or the wealth of other removable media which have been used since the earliest days of computing. There are a growing number of specialist third party services offering to carry out digital archaeology, and it has been shown to be technically possible to recover bit streams from damaged and obsolete media. Only trained specialists will be able to extract data in this way, using special hardware and software; for instance, in order to extract data from relatively recent, damaged, media, the British Library makes use of 'forensic' hardware, designed for use by law enforcement, intelligence, corporate and military agents who need to recover digital evidence from hardware in a way which ensures its authenticity.

Digital archaeology is an emergency recovery strategy, not a pro-active and preventative approach to long-term preservation, because:
  • It is much more costly than the other major preservation strategies and is unlikely to be cost-effective for any other than the most highly valued digital resources;

  • Relying on digital archaeology means that the digital material that is not necessarily highly valued (yet might still be useful to some researchers or have important evidential value) might not be rescued;

  • If there is no accompanying metadata or documentation, it may be impossible to assess the value or usefulness of obsolete digital resources until after rescue has taken place, which may turn out to be a waste of resources;

  • Digital archaeology techniques are unlikely to be successful in all cases; and

  • It requires a certain amount of technology preservation (see above).

7.5 Combinations


Even with good planning, a single preservation strategy may fail, leaving the programme with no means of access. A combination of digital preservation strategies may therefore be used to cover the range of objects and characteristics to be preserved.
For example:

  • Standards such as TIFF for image collections are often chosen in preparation for eventual migration to other standard formats over the long-term;

  • The VERS strategy couples the use of standards (PDF, XML) to the future use of viewers and the likely migration of XML encoded metadata in the future;

  • Persistent archives (Moore, 2000) use data abstraction with the view to eventual migration – migration of the data, the mark up system and the supporting software, and upgrading of hardware;

  • The Universal Virtual Computer (UVC) approach combines data abstraction with rules for migration of data objects at the point of access, and an emulation approach for software objects. The “durable encoding” approach adds the use of fundamental standards for encoding data, including encoding that could be understood by the UVC.

8.1 Copyright and Other Intellectual Property Rights (IPR)

Content owners hold copyright in their content, and this has a substantial impact on digital preservation. The IPR issues for digital materials are more complex and significant than for traditional media. If these issues are not addressed, they can hinder or even prevent preservation activities. Simply copying (refreshing) digital materials onto another medium, encapsulating content and software for emulation, or migrating content to new hardware and software all involve activities that can result in infringement of IPR unless statutory exemptions exist or specific permissions have been obtained from rights holders. As both migration and emulation involve manipulating content and changing its presentation and functionality to some degree, it is important to establish a dialogue with rights holders so that they are fully aware of these issues, and to ensure that the rights required to preserve selected items are obtained from copyright holders.



8.2 Access and Security

Some of the additional complexity in IPR issues relates to the fact that electronic materials are easily copied and re-distributed. Rights holders are, therefore, particularly concerned with controlling access and potential infringements of copyright. Technology developed to address these concerns and provide copyright protection measures can also inhibit or prevent actions needed for preservation. These concerns over access, infringement and preservation need to be understood by organizations preserving digital materials and addressed by both parties in negotiating rights and procedures for preservation.

8.3 Stakeholders, Contract & Grant Conditions, and Moral Rights

Resources in electronic formats are the result of substantial investment by funding agencies, publishers, individual scholars and authors. Each of these stakeholders may have an interest in preservation. Archiving organizations are required to seek permissions from them in order to safeguard and maximize the financial investment in the work and its intellectual and cultural value for future generations. Such interests may be manifested through contract, license, and grant conditions or through statutory provisions such as "moral rights" for authors.



8.4 Privacy and Confidentiality

Digital objects may be subject to confidentiality agreements or to privacy legislation, such as the Data Protection Act, that protects information held on individuals. Privacy and confidentiality concerns may affect how digital materials can be managed within the repository or by third parties, and how they can be made accessible for use.


8.5 Business Models and Licensing

Business models for the dissemination of electronic materials, and the range of stakeholders who own the IPR, have an impact on digital preservation. In most cases, subscribers to electronic resources, particularly electronic journals, do not have physical possession of them. Subscribers are, therefore, concerned that publishers should consider the archiving and preservation of these works and include archiving and perpetual access to back issues in the licensing of these works.

8.6 Legal Deposit

Legal deposit libraries bear a major responsibility for the digital preservation of documents deposited with them. In the UK, the Legal Deposit Libraries Act 2003 (United Kingdom, 2003) is enabling legislation, which will be implemented over time by a series of further Regulations. UK legal deposit law should, over time, be extended to cover digital publications as well as their preservation. Unusually, this law also includes provisions to allow legal deposit libraries to carry out activities necessary to acquire, preserve and make accessible digital publications. Other countries are increasingly extending their legislation. Initially, new laws tended to cover only tangible digital publications (for example, magnetic tape, diskettes and optical discs) or so-called "static" online publications.

 

Figure 8: Rights Management


9. Summary

The module introduces the challenges and problems of digital preservation, with technologies, standards and formats in a continuous state of flux. It defines digital preservation, its scope, its need and the processes involved in ensuring long-term accessibility and usability of digital information. The module elaborates on problems and challenges of digital preservation that can be grouped into three distinct categories, namely: i) Longevity of Physical Storage Media; ii) Technology-related Issues, including Technological Obsolescence, Hardware and Software Dependence and Multitude of Formats; and iii) Intellectual Preservation Issues, including integrity and authenticity of information. Principles that guide digital preservation actions, with the ultimate goal of providing long-term access to digital content, are described briefly. The module also elaborates on the factors involved in long-term digital preservation, which can be grouped into six categories, namely: i) Cultural Factors; ii) Technological Factors; iii) Legal Factors; iv) Methodological Factors; v) Economic Factors; and vi) Social Factors.

The goal of a digital preservation strategy is to achieve consistency in the management of digital records so as to ensure long-term access to digital archives. These strategies can be grouped under four categories, namely: i) short-term strategies; ii) medium- to long-term strategies; iii) investment strategies; and iv) alternative strategies. The module also discusses the use of combinations of digital preservation strategies so as to cover a range of objects and characteristics to be preserved. Lastly, the module discusses the impact of intellectual property rights and digital rights management on digital preservation.

References

Arora, Jagdish (2004). Building digital libraries: An overview. DESIDOC Bulletin of Information Technology, 21(6).

Ayre, Catherine, & Muir, Adrienne (2004). The Right to Preserve: The Rights Issues of Digital Preservation. D-Lib Magazine, 10(3). Available at www.dlib.org/dlib/march04/ayre/03ayre.html

Charlesworth, A. (2012). Intellectual Property Rights for Digital Preservation. Digital Preservation Coalition Technology Watch Report, 12-02.

Chen, S. S. (2001). The paradox of digital preservation. Computer, 34(3), 24-28.

Conway, Paul (1997). Preservation in digital world. Microform and Imaging Review, 25(4), 156-171. Also available online (http://www.clir.org/pubs/reports/conway2/)

Cornell University Library (2005). Tutorial on “Digital preservation management: Implementing short-term strategies for long-term problems”. (http://www.library.cornell.edu/iris/tutorial/dpm/index.html)

Dartmouth Digital Library Program: Policies (2011). "A Report from the Digital Projects and Infrastructure Group (DPIG)". Available at http://www.dartmouth.edu/~library/digital/about/policies/preservation.html

Graham, Peter S. (1998). “Long-Term Intellectual Preservation”. Collection Management, 22(3/4), 81-98.

Granger, Stewart (2000). Emulation as a Digital Preservation Strategy. D-Lib Magazine, 6(10).


Hedstrom, M. and Montgomery, S. (1998). Digital Preservation needs and requirements in RLG Member Institutions. Mountain View, CA: RLG. (http://www.rlg.org/preserv/digpres.html)

Jones, M., & Beagrie, N. (2001). Preservation management of digital materials: a handbook (p. 67). London: British Library.

Kirchhoff, Amy J. (2008). “Digital preservation: challenges and implementation”. Learned Publishing, 21, 285-294.

Ludäscher, B., Marciano, R., & Moore, R. (2001). Preservation of digital data with self-validating, self-instantiating knowledge-based archives. SIGMOD Record, 30(3), 54-63.

Lynch, Clifford (1999). Canonicalization: A Fundamental Tool to Facilitate Preservation and Management of Digital Information. D-Lib Magazine, 5 (9).

Lynch, Clifford (1994). The integrity of digital information: Mechanics and definitional issues. Journal of the American Society for Information Science, 45, 737-44.

Moore, R. et al. (2000). Collection-based persistent digital archives – Part 2. D-Lib Magazine, 6(4). (http://www.dlib.org/dlib/april00/moore/04moore-pt2.html)

Moore, R. et al. (2000). Collection-based persistent digital archives – Part 1. D-Lib Magazine, 6(3). (http://www.dlib.org/dlib/march00/moore/03moore-pt1.html)

Rothenberg, Jeff (1995). Ensuring the Longevity of Digital Documents. Scientific American, 272(1), 24–29.

Russell, Kelly (1999). “Digital Preservation: Ensuring Access to Digital Materials Into the Future.” CEDARS, www.leeds.ac.uk/cedars/Chapter.htm

Smith, Abby (2003). Digital Preservation: An Individual Responsibility for Communal Scholarship. EDUCAUSE Review. Available online at https://net.educause.edu/ir/library/pdf/erm0338.pdf

Trusted Digital Repositories: Attributes and Responsibilities. RLG/OCLC Report (2002). Available online at http://www.oclc.org/content/dam/research/activities/trustedrep/repositories.pdf?urlm=161690

UKOLN (2008). “An Introduction to Digital Preservation: Supporting The Cultural Heritage Sector”. Available at www.ukoln.ac.uk/cultural-heritage/documents/briefing.../briefing-31.doc

UNESCO’s Guidelines for the Preservation of Digital Heritage (2003). Available online at http://unesdoc.unesco.org/images/0013/001300/130071e.pdf

United Kingdom (2003). Legal Deposit Libraries Act 2003. Available online at http://www.legislation.gov.uk/ukpga/2003/28/contents

van der Hoeven, J. R., van Diessen, R. J., & van der Meer, K. (2005). Development of a Universal Virtual Computer (UVC) for long-term preservation of digital objects. Journal of Information Science, 31(3), 196-208.

Voutssas, Juan (2012). “Long-term digital information preservation: challenges in Latin America”. Aslib Proceedings, 61(1), 83-96.

Wikipedia. Digital preservation (http://en.wikipedia.org/wiki/Digital_preservation) (last visited on 1st  March, 2014)



Points to Ponder

  • Who is legally responsible for keeping every document collection or archive for the future?
  • What happens if the file format changes?
  • How much does digitization cost?
  • Is it possible to obtain another copy of the resource in the event of loss or damage?

Do You Know?

  • Why is content considered a major challenge for digital preservation?
  • What is essential in order to protect those aspects of a digital record and to measure the success of preservation interventions?
  • As the amount of digital data grows, what are the selection criteria to filter the data to be preserved? 

Interesting Facts

  • The longevity of digital content depends on the life expectancy of the access system, including hardware and software.
  • LOCKSS (Lots of Copies Keeps Stuff Safe) is a consortial form of replication, while peer-to-peer data trading is an open, free-market form of replication.
  • Clifford Lynch (1999) is recognized as the first person to introduce the idea of canonicalization. Canonicalization provides a language or framework for understanding the effects of file translation.
  • A Universal Virtual Computer (UVC) is a virtual machine (VM) specially designed for preservation of digital objects, based on emulation.

Points to Remember

  • Digital preservation is a set of processes and activities that ensure continued long-term access to information from all kinds of records, both scientific and cultural heritage that exists in digital form.
  • Preservation is an on-going process. Therefore, current, short-term and long-term costs and funding are important issues to deal with before and during a preservation project, in order to maintain its feasibility in the long term.
  • Preservation programmes require significant upfront investment to create, along with ongoing costs for data ingest, data management, data storage, and staffing.
  • Digital preservation activities must be planned and implemented in such a way that resources can be managed and sustained into the future. Future access to digital resources cannot be assured without institutional commitment to the resources that are required for digital preservation.
  • Magnetic storage media are highly sensitive to dust, heat, humidity and other climatic conditions. Most storage devices, without suitable storage conditions and proper management, may deteriorate very quickly without displaying any external signs of physical damage.
  • Major challenges for digital preservation are the relatively short life span of digital media and the high rate of obsolescence of the hardware and software used for accessing digital records.

Web links

www.dlib.org/dlib/march04/ayre/03ayre.html
http://www.clir.org/pubs/reports/conway2/
http://www.library.cornell.edu/iris/tutorial/dpm/index.html
http://www.dartmouth.edu/~library/digital/about/policies/preservation.html
http://www.rlg.org/preserv/digpres.html
http://www.dlib.org/dlib/april00/moore/04moore-pt2.html
http://www.dlib.org/dlib/march00/moore/03moore-pt1.html
www.leeds.ac.uk/cedars/Chapter.htm
https://net.educause.edu/ir/library/pdf/erm0338.pdf
http://www.oclc.org/content/dam/research/activities/trustedrep/repositories.pdf?urlm=161690
http://unesdoc.unesco.org/images/0013/001300/130071e.pdf
http://www.legislation.gov.uk/ukpga/2003/28/contents
http://en.wikipedia.org/wiki/Digital_preservation
