Thursday, December 4, 2014

06. Digital Library Architecture

Suggestions from all of you are respectfully invited for the creation of this blog. Please send your suggestions and entries. Its entire scope is the global knowledge community, and it will contribute significantly to career building for all competitive-exam candidates. You may send your suggestions to this e-mail address: chandrashekhar.malav@yahoo.com





P- 01. Digital Libraries*

By :Jagdish Arora, Paper Coordinator


MCQ

Question 1: Multiple Choice

............... is the rights management software.
  • Apache
  • Active Directory
  • DSpace
  • GNU E-Prints

Answer: Active Directory

Question 2: Multiple Choice

In Cloud Computing, data and applications reside on ……………………
  • Desktop Machine
  • Public/Private Network
  • Intranet
  • Local Server

Answer: Public/Private Network

Question 3: Multiple Choice

Which device is not required for implementing an image-based digital library?
  • Scanner
  • Printer
  • Camera
  • Touch screen system

Answer: Printer

Question 4: Multiple Choice

Which Digital Library Software is maintained by DuraSpace?
  • GreenStone
  • FEDORA
  • CONTENTdm
  • GNU E-Prints

Answer: FEDORA

Question 5: Multiple Choice

Which Digital Library Software uses MySQL as its backend?
  • CONTENTdm
  • DSpace
  • GNU E-Prints
  • GreenStone

Answer: GNU E-Prints

Question 6: Multiple Choice

Which is not a Digital Library Software?
  • FEDORA
  • CONTENTdm
  • DSpace
  • SOUL

Answer: SOUL

Question 7: Multiple Choice

Which is not Open-Source Digital Library Software?
  • DSpace
  • FEDORA
  • CONTENTdm
  • GreenStone

Answer: CONTENTdm

Question 8: Multiple Choice

Which is not required to create a DIGITAL LIBRARY?
  • Digital Library Software
  • RFID Tag
  • Database Management System
  • Web Server

Answer: RFID Tag

Question 9: Multiple Choice

Which is the major metadata type used for authentication in Preservation Description Information?
  • Reference Information
  • Provenance Information
  • Context Information
  • Fixity Information

Answer: Fixity Information

Question 10: Multiple Choice

………………………… is a NoSQL Database.
  • MySQL
  • Oracle
  • MongoDB
  • Postgres

Answer: MongoDB



...............................................................................................................................................


0. Objective


 

Digital libraries are built around Internet and web technology; they therefore need to follow Internet standards and protocols to ensure interoperability, portability, modularity, scalability and seamless accessibility. A typical digital library implementation follows a client-server architecture, as do the Internet and web technology. The objectives of this section are to discuss and impart knowledge on the following aspects of the technical infrastructure of a digital library:

  • Client-server architecture as applied to the digital library;
  • Important features like scalability and sustainability, seamless access, interoperability, federation, capacity to handle multiple files & formats and location-independent identifiers, that should be considered while designing a digital library;
  • Digital library design models and architecture with examples of digital libraries built using these models;
  • Important Internet protocols and standards as applied to digital libraries;
  • Computers and Network Infrastructure: Including Server-side Hardware Components, Server-side Software Components, and Client-side Hardware & Software Components; and
  • Interoperability in Digital Library.

1.0 Introduction

The Internet and web technology are the principal mechanisms deployed in a digital library to search, navigate and deliver electronic resources across the globe. The digital contents of a digital library can be available at a single location or distributed across the network. End users or clients receive direct and seamless access to the information requested from a collection regardless of where the data is physically stored. A typical digital library implementation follows a client-server architecture, as do the Internet and web technology. Earlier online information search services such as DIALOG, BRS Search and STN worked on “host-terminal technology”, wherein the hosts were huge mainframe computers that controlled all aspects of search and communication sessions, and dumb terminals were connected to the host computer. A centralized server managed communications, user query interaction, database management and data presentation. The data from different sources had to be converted into a single homogeneous structure and organization. In contrast, the enabling technology behind digital libraries provides seamless access to heterogeneous digital objects created on different platforms and hosted in diverse environments distributed at different locations on the Internet.

2.0 Client-Server Architecture and Middleware

The development of client-server architecture is the major enabling technology behind distributed computing and databases. The terms client and server refer both to computer programs and to the computers that run them. The client program typically resides on the user’s personal computer, while the server program resides on a server that hosts information content. The server program and the client program communicate over a telecommunication network using a well-defined protocol. The client program (a web browser) is responsible for making requests to the server and for displaying the information it retrieves from the server. The server is responsible for receiving requests from the client, controlling access to the information, performing the computation needed to retrieve the information, sending the requested information to the client (after authentication, if required) and recording usage statistics. Both client and server have their roles to perform and, therefore, the workload is balanced between the two. Today’s PC-based clients can manage multiple tasks, such as maintaining simultaneous connections to a variety of sources. This results in transparent access to information resources distributed across the Internet regardless of their location. The server handles the database management tasks and the processing of requests from clients.

Middleware is software that connects software components or applications on clients and servers. Middleware resides on both the client and the server. It ensures that the client and server can communicate with each other irrespective of the different hardware and software involved. This is made possible by the use of standardized sequences of messages called protocols. The Application Programming Interface (API) is the middleware component that facilitates the transfer of messages between client and server based on a protocol. The API protocol defines a set of messages that both the client and the server understand. The client’s API translates a message into a platform-independent form and transmits it to the server over the network. The server’s API receives the message and translates it into a form that the server understands. The server processes the message and responds to the client through its API. Middleware is used most often to support complex, distributed applications. It includes web servers, application servers, content management systems, and similar tools that support application development and delivery.

                                           Figure 1. Client Server Architecture
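The message exchange described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real middleware product: JSON stands in for the platform-independent message format, no network is involved, and the names `client_api_encode`, `server_api_handle`, `client_api_decode` and the in-memory `COLLECTION` are hypothetical.

```python
import json

# Hypothetical in-memory collection held by the server.
COLLECTION = {"doc-1": "Digital Library Architecture", "doc-2": "Metadata Basics"}


def client_api_encode(action, doc_id):
    """Client-side API: translate a request into a platform-independent message."""
    return json.dumps({"action": action, "id": doc_id}).encode("utf-8")


def server_api_handle(message):
    """Server-side API: decode the message, let the server act, encode the reply."""
    request = json.loads(message.decode("utf-8"))
    if request.get("action") == "get" and request.get("id") in COLLECTION:
        reply = {"status": "ok", "title": COLLECTION[request["id"]]}
    else:
        reply = {"status": "error", "reason": "unknown request or identifier"}
    return json.dumps(reply).encode("utf-8")


def client_api_decode(message):
    """Client-side API: translate the reply back into a native Python object."""
    return json.loads(message.decode("utf-8"))


# In a real deployment these bytes would travel over the network.
wire_request = client_api_encode("get", "doc-1")
wire_reply = server_api_handle(wire_request)
result = client_api_decode(wire_reply)
print(result)
```

Because both sides agree only on the message format, either side could be reimplemented on different hardware or in a different language without the other noticing.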

The most commonly used clients for Internet browsing, called browsers, are Microsoft’s Internet Explorer and Netscape Navigator. Browsers understand different communication protocols so that they can connect to different types of servers. Browsers that provide access to resources hosted on web servers are growing in popularity due to their relative ease of use, user-friendly interfaces, free availability and increasing capabilities.



3.0 Key Principles: Digital Library Architecture

Major problems of digital library design are caused by differences in the computer systems, file structures, formats, information organization and information retrieval requirements of the collections (such as e-journals, e-books, reference sources, online courseware, GIS, etc.) accessible through the digital library. While the web has emerged as the preferred medium of information delivery and access, the use of standards and protocols makes it possible to make digital collections interoperable and seamlessly accessible. Some of the important features that should be considered while designing a digital library architecture are as follows:

i)        Open Architecture: Open architecture refers to computer architecture or software architecture that allows adding, upgrading and swapping components or software modules. As opposed to open architecture, software and hardware with closed architecture have pre-defined modules or components that are not generally upgradable.
Open architecture allows potential users to see inside all or parts of the architecture without any proprietary constraints. Typically, an open architecture publishes all or parts of its architecture that the developer or integrator wants to share. Digital library design should use open architecture and a set of well-defined standards and protocols so as to facilitate scalability and interoperability.

ii)      Scalability, Extensibility and Sustainability: Scalability, extensibility and sustainability are the three most important design features of a digital library; they address its ability to handle an increasing volume of digital objects and to sustain itself over a long period of time. The design should ensure that the software can handle large quantities of data, and that the hardware and network can scale to handle large quantities of digital objects and their transmission over the network. Moreover, digital library design and planning should provide for the human and financial resources required to sustain the digital library on a long-term basis.

iii)    Seamless Access: Digital libraries should provide users with transparent, seamless and platform-independent access to a distributed array of information resources.

iv)    Interoperability: Interoperability is the ability of digital libraries and their components to work together effectively in order to exchange information in a useful and meaningful manner. The use of open architecture and a set of well-defined standards and protocols ensures interoperability amongst heterogeneous digital libraries in a distributed environment.

v)      Federation: Federation refers to the distribution of responsibilities for content creation, management and administration of the various functions and services of a digital library. It needs to be ensured that the participants follow the agreed standards, technologies and tools.

vi)    Digital Preservation: The architecture of the digital library must ensure persistent and long-term access to its collection.

vii)  Modularity: It is a design approach that adheres to four fundamental tenets of cohesiveness, encapsulation, self-containment and high binding to design a system component as an independently operable unit.

viii)  Platform Independence: The digital library architecture should be platform-independent at both the hardware and software levels.

ix)    Multiplicity of Files & Formats: A digital library should be able to handle multiple files and formats such as unstructured / structured text, audio, video, images, graphics, animation, etc.

x)      Location-independent Identifiers: Digital objects in the digital library should support location-independent identifiers such as handles, PURLs, DOI and OpenURL.

4. Key Components: Digital Library Architecture and Design

Different digital libraries have their own underlying design and architecture. Most digital library architectures provide for the five key components shown in Figure 2.


        Figure 2: Key Components of a Digital Library Architecture

4.1 User Interfaces

Digital libraries are required to provide user interfaces that allow users to explore the collection, conduct searches, navigate through hierarchical menus of subjects, select and deselect searchable options, and sort search results as they require.



4.2 Digital Repository

Digital repositories store and manage digital objects and metadata. The digital objects in a digital library may be “born digital” or digitized from legacy documents through scanning. The metadata, which describes the digital objects to facilitate searching and discovery, may be extracted automatically or created manually. A large digital library may have several distributed repositories, depending on the collections it holds. Digital library developers interface with the digital repository using the Repository Access Protocol (RAP). RAP recognizes rights and permissions to enforce intellectual property rights, if required. E-commerce functionalities may also be present, if needed, to handle accounting and billing.

4.3 Digital Objects Naming Service: Unique Identifiers

Digital objects in a repository require location-independent unique identifiers. These identifiers must remain valid when documents are moved from one location to another or migrated from one storage medium to another. A number of registry- or resolver-based applications are currently used to provide persistent URLs for digital objects. These identification schemes do not directly describe the location of the resource to be retrieved; instead, they direct the user to an intermediate registry or resolver server that maps the static persistent identifier to the current location of the object. However, the “mapping table” in the registry or resolver server must be updated whenever the object is moved. Examples of the most-used registry- or resolver-based applications are PURL, handles, DOI and OpenURL.
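The resolver idea can be illustrated with a toy registry. The sketch below is not how PURL, Handle or DOI servers are actually implemented; the `Resolver` class, the identifier `hdl:example/1234` and the example URLs are all hypothetical. It only shows that the identifier stays fixed while the entry in the mapping table changes.

```python
class Resolver:
    """Toy registry that maps persistent identifiers to current locations."""

    def __init__(self):
        self._table = {}  # the "mapping table": identifier -> current location

    def register(self, pid, location):
        self._table[pid] = location

    def move(self, pid, new_location):
        # The table must be updated whenever the object is moved.
        if pid not in self._table:
            raise KeyError(pid)
        self._table[pid] = new_location

    def resolve(self, pid):
        return self._table[pid]


resolver = Resolver()
resolver.register("hdl:example/1234", "https://repo-a.example.org/objects/1234")
before = resolver.resolve("hdl:example/1234")

# The object migrates to another server; the identifier users cite does not change.
resolver.move("hdl:example/1234", "https://repo-b.example.org/items/1234")
after = resolver.resolve("hdl:example/1234")
print(before, "->", after)
```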



4.4 Index Services

Indexing digital objects involves linking the database of digital objects to a text database consisting of keywords and subject descriptors. Digital objects must be linked to their associated keywords and subject descriptors to facilitate retrieval. A digital repository typically stores a large amount of unstructured data and uses a two-file system for storing and retrieving digital objects. The first file stores the keywords or descriptors of digital objects along with a key to a second file. The second file contains the location of each digital object. The user selects a record from the first file using a search algorithm. Once the user selects a keyword or descriptor from the first file, the location index in the second file finds the digital object and displays it. It is assumed that a digital repository has several indices and catalogues that can be searched to discover information for subsequent retrieval from the repository.
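The two-file arrangement described above can be sketched with two Python dictionaries standing in for the two files. The keywords, keys and paths are invented for illustration; a real repository would keep these structures on disk or in a database.

```python
# "File 1": keyword or descriptor -> keys into the second file.
keyword_file = {
    "metadata": [1, 2],
    "preservation": [2],
    "harvesting": [3],
}

# "File 2": key -> location of the digital object (hypothetical paths).
location_file = {
    1: "/repository/reports/dublin-core-intro.pdf",
    2: "/repository/reports/oais-overview.pdf",
    3: "/repository/reports/oai-pmh-guide.pdf",
}


def lookup(keyword):
    """Find the keys for a keyword in file 1, then the locations in file 2."""
    keys = keyword_file.get(keyword.lower(), [])
    return [location_file[key] for key in keys]


print(lookup("Metadata"))
```

Keeping locations in a separate file means an object can be relocated by changing one entry in the second file, without touching any of the keyword entries that point to it.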

4.5 Search System and Content Delivery

The design of the digital library system should support searching of its collection. The search engine should support features such as Boolean searching, proximity searching and phrase searching that are supported by traditional information retrieval systems. Most digital library software integrates external search engines. DSpace, for example, uses the Apache Lucene search engine. The digital library should also support content delivery via file transfer or streaming media.
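To make the three search features concrete, here is a small positional inverted index written from scratch. It is a teaching sketch, not how Lucene or any digital library package implements searching; the sample documents and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical document collection: id -> text.
DOCS = {
    1: "digital library architecture and design",
    2: "library of digital preservation metadata",
    3: "client server architecture",
}


def build_index(docs):
    """Map each term to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    return index


INDEX = build_index(DOCS)


def docs_with(term):
    return set(INDEX.get(term, {}))


def boolean_and(a, b):
    return docs_with(a) & docs_with(b)


def boolean_not(a, b):
    """Documents containing a but not b."""
    return docs_with(a) - docs_with(b)


def proximity(a, b, max_gap):
    """Documents where a and b occur within max_gap positions, in any order."""
    return {
        doc_id
        for doc_id in boolean_and(a, b)
        if any(abs(pa - pb) <= max_gap
               for pa in INDEX[a][doc_id] for pb in INDEX[b][doc_id])
    }


def phrase(words):
    """Documents where the words occur consecutively and in order."""
    terms = words.lower().split()
    if not terms:
        return set()
    candidates = set.intersection(*(docs_with(t) for t in terms))
    return {
        doc_id
        for doc_id in candidates
        if any(all(start + i in INDEX[t][doc_id] for i, t in enumerate(terms))
               for start in INDEX[terms[0]][doc_id])
    }


print(boolean_and("digital", "library"), phrase("digital library"))
```

Storing word positions is what makes phrase and proximity searching possible; an index that records only which documents contain a term can answer Boolean queries but nothing finer.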


5. Digital Library Models and Architectures

While different digital libraries have their own underlying design and architecture, most of them support key components mentioned above.  Some of the important digital library architectures are discussed below:

5.1 Kahn-Wilensky Architecture

Kahn and Wilensky (1995) defined a general-purpose framework for a distributed digital library consisting of a very large number of digital objects comprising all types of material accessible over networks. Kahn and Wilensky defined the basic entities stored, accessed, disseminated and managed in distributed digital repositories. The introduction of naming conventions for identifying and locating digital objects in a digital repository was one of the most important contributions of this framework.


5.2 Dienst and NCSTRL

Dienst (German for “service”) emerged as one of the first digital library architectures based on three basic principles of a distributed digital library system, i.e. open architecture, federation and distribution (Davis and Lagoze, 2000). Developed by the Digital Library Research Group at Cornell University, the Dienst model was implemented in the “Networked Computer Science Technical Reference Library (NCSTRL; www.ncstrl.org)”. The NCSTRL has more than 150 participating institutions and 20,000 digital objects. The Dienst architecture specifies four core digital library services:

  • User Interface Services: serve as a gateway to information obtained from other services.
  • Repository Services: store and provide access to documents.
  • Index Services: provide an interface to accept queries, match them against the inverted files (index) and return matched items.
  • Collection Services: define the components, services and documents available in the digital library and enable user interface services to interact with them.

The document model of the Dienst architecture facilitates three important functionalities that are considered important to any digital library design. These functionalities are:

  • Unique document names: a location-independent, unique identifier called docid
  • Multiple document formats such as ASCII, PS and TIFF
  • Document decomposition: physical and logical decompositions of a document.

Interoperability among Dienst servers provides the user with a single logical document collection, even though the actual collection is distributed across multiple servers.  This is accomplished by interaction between a set of Dienst servers at three functional levels: server registration (for locating the indexing and repository site for a specific publisher identified through the docid), distributed searching, and distributed document access.


                                               Figure 3: NCSTRL Search Interface

The NCSTRL collection is logically and administratively divided into publishing authorities, each having control over the addition and administration of documents in its own sub-collection repositories. The metadata fields (such as title, author and abstract) for each document in these repositories are indexed by one or more index servers. The metadata is accessed through Dienst protocol requests to the respective repository. The Dienst protocol requests defined for the collection service give access to the following information:

  • The list of publishing authorities that are part of the collection;
  • The network location: the address and port of the Dienst index servers that store indexing information;
  • Meta information about each index server; and
  • The correspondence of index servers to repository servers.

The NCSTRL has now moved from the Dienst architecture to an OAI-based architecture, using EPrints as its digital library software and ARC for harvesting metadata.


5.3 CRADDL

Developed by the Digital Library Research Group at Cornell University, the Cornell Reference Architecture for Distributed Digital Libraries (CRADDL) is a component- or service-based digital library architecture. The CRADDL offers the following five basic services (Lagoze and Fielding, 1998):

  • Repository Service, which provides mechanisms for depositing, storing and accessing digital objects.
  • Naming Service, which identifies digital objects by Uniform Resource Names (URNs) that are registered with the naming service.
  • Index Service, which provides mechanisms for the discovery of digital objects via queries.
  • Collection Service, which provides mechanisms for aggregating access to sets of digital objects and services into meaningful collections.
  • User Interface Service Gateways, which provide a human-centered interface to the functionality of the digital library.

CRADDL defines a basic set of digital library services, which interact as shown in Figure 4. The design of the User Interface Gateway can be customized for a specific community using mechanisms such as language, help facilities and graphical aids. The user interface interacts with collection services and facilitates access to one or more collections as desired by the user. The modular design of CRADDL allows easy integration of higher-level digital library services (summarization services, payment services, and the like) with existing CRADDL services, or evolution of existing services as the architecture matures (Lagoze and Fielding, 1998).

                Figure 4. Interaction of Core Digital Library Services in CRADDL


5.4 FDBS Architecture and the NSDL

The NSDL and NDLTD are examples of digital libraries based on a loosely-coupled Federated Database System (FDBS), a collection of cooperating but autonomous database systems. Individual participants in the FDBS continue their local operations as defined by their own Database Management System (DBMS). The Federated Database Management System (FDBMS) is the middleware that controls and coordinates how the component databases cooperate.

The National Science Digital Library (NSDL) is a programme funded by the US National Science Foundation (NSF), Division of Undergraduate Education. The objective of NSDL is to build a digital library for education in science, mathematics, engineering and technology. The NSF is funding a number of projects under this initiative, each of which makes its own contribution to the Library. Many of these projects are building collections, while others are developing services for NSDL. The challenge for the NSDL is to ensure that the collections and services developed under the project are integrated into a single coherent library, not simply a set of unrelated collections and activities. In order to achieve interoperability, three sets of agreements are necessary amongst members:
  • Technical agreements between NSDL participants to decide on formats, protocols, security systems, etc., to achieve interoperability between collections and services;
  • Content agreements to decide on the data and metadata, including semantic agreements on the interpretation of the information; and
  • Organizational agreements to decide on the ground rules for access, preservation of collection and services, payment, authentication, etc.


                        Figure 5: National Science Digital Library (NSDL)

The NSDL design attempts to achieve interoperability at the following three levels:

  • Federation: Federation requires division of responsibility amongst participating members. The participants agree to follow sets of standards, protocols and technologies. The process ensures interoperability, but participants are constrained to use the agreed set of standards, technologies and tools.
  • Harvesting: The participants agree to enable some basic shared services, without being required to adopt a complete set of agreements as in federation.
  • Gathering: Gathering uses the approach of web search engines. Even if participants do not co-operate in any formal manner, a base level of interoperability can be achieved by gathering openly accessible information using a web crawler.

Metadata from all the collections is stored in the repository and made available to provide NSDL services. 


5.5 The NDLTD: Federated Digital Library Design

The NDLTD is another example of a digital library based on a loosely-coupled Federated Database System (FDBS). The Networked Digital Library of Theses and Dissertations (NDLTD), a digital library of theses and dissertations by masters and doctoral students from universities in the USA and around the globe, has adopted a federated design approach. To avoid the work and negotiation involved in adding protocol support to diverse search systems, the NDLTD team created an intermediate application that mediates search requests and has access to descriptions of the search engines’ user interfaces, the types of queries supported, and the operators that define and qualify those queries (Fox and Powell, 1998). The NDLTD has also defined the Searchable Database Markup Language (SearchDB-ML), an application of the Extensible Markup Language (XML), for describing a search site. Initially the model was tested on five sites using different software: two sites used Open Text, one used Dienst, another used HyperWave and the fifth used a Perl-based search script (search.pl). All could easily be described with SearchDB-ML Lite, and the Federated Searcher application was able to support cross-language retrieval, for instance submitting queries in English to the German site and requesting translations (Fox and Powell, 1998). The federated search system distributes a query to multiple sites and then gathers the results pages into a cache for browsing; results are displayed to the user without merging (Fox et al., 2001).

                         Figure 6: Networked Digital Library of Theses and Dissertations
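The federated search flow described in this section (translate the query for each site, distribute it, cache the results per site without merging) can be sketched as follows. This is a simplified illustration and not the NDLTD Federated Searcher itself: the `SITES` dictionary is a hypothetical stand-in for SearchDB-ML site descriptions, and `site_search` simulates a remote search engine in memory.

```python
# Hypothetical site descriptions: each site uses its own AND operator.
SITES = {
    "site_a": {
        "and_operator": " AND ",
        "records": ["digital library thesis", "network thesis"],
    },
    "site_b": {
        "and_operator": " & ",
        "records": ["digital preservation thesis", "library digital design"],
    },
}


def translate(terms, site):
    """Rewrite a list of terms into the native query syntax of one site."""
    return SITES[site]["and_operator"].join(terms)


def site_search(site, native_query):
    """Stand-in for a remote engine that understands only its own syntax."""
    terms = native_query.split(SITES[site]["and_operator"])
    return [record for record in SITES[site]["records"]
            if all(term in record.split() for term in terms)]


def federated_search(terms):
    """Distribute the query and cache each site's results separately."""
    cache = {}
    for site in SITES:
        cache[site] = site_search(site, translate(terms, site))
    return cache


results = federated_search(["digital", "library"])
for site, hits in results.items():
    print(site, hits)
```

Leaving the results unmerged avoids having to compare relevance scores produced by unrelated search engines, at the cost of the user browsing one result list per site.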

Through its research efforts, the NDLTD team has evolved a mechanism for creating and accessing a union catalogue of theses and dissertations from participating institutions around the globe. The NDLTD has developed a new metadata standard called ETDMS (Electronic Theses and Dissertations Metadata Standard), based on Dublin Core. The ETDMS is used by the participating institutions to expose and export their metadata using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). By making the theses metadata available in ETDMS format, the participating institutions make the theses accessible at a central portal of NDLTD maintained by VTLS (www.vtls.com), using their Virtua system, which provides a web interface to the ETD Union Catalog. The Virtua NDLTD portal provides users with a simple interface to search and browse the merged collections of theses and dissertations. After the users have identified relevant theses, they can follow the links provided to go directly to the items in their source archives.
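As a rough illustration of metadata harvesting, the sketch below builds an OAI-PMH `ListRecords` request URL and parses a response. Several things here are assumptions: the base URL `https://etd.example.edu/oai` is invented, the response is a hand-written sample rather than output from a real repository, and the example uses the `oai_dc` (unqualified Dublin Core) format that OAI-PMH repositories are required to support, rather than ETDMS.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response, trimmed to the elements used below.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:etd.example.edu:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL a harvester would request from a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, title and creator from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext(".//dc:title", namespaces=NS),
            "creator": record.findtext(".//dc:creator", namespaces=NS),
        })
    return records


print(list_records_url("https://etd.example.edu/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

A union catalogue is then a matter of running the same harvest against each participating repository and storing the parsed records centrally, with each record's identifier leading back to the source archive.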

5.6 Common Object Request Broker Architecture (CORBA)

Common Object Request Broker Architecture (CORBA) is one of the widely known models of distributed object-oriented computing. The CORBA standards have been incorporated in the middleware of several commercially available network system products. CORBA relies heavily on object-oriented and client-server technologies. It uses an open systems approach in which digital library designers can implement the CORBA specifications in a variety of ways depending on their requirements. CORBA applications are platform-independent at both the hardware and software levels. Components of a digital library may be distributed among different servers. The processes involved in the CORBA model of a digital library are described below:

i)   Interface Definition Language (IDL): defines the different types of objects contained in the digital library by specifying their interfaces.
ii)  Interface Repository (IR): stores all IDL definitions that represent the objects stored in the digital library.
iii) Object IDs: Each object definition stored in the Interface Repository is given a unique object reference ID. With object IDs stored in the Interface Repository, the objects themselves can be stored on remote servers.
iv) CORBA supports requests for service in two cases: a) the client knows the specific object to be retrieved, as defined by IDL, which is known as the static invocation interface; and b) the client does not know the object’s ID and wants to discover it, which is known as the dynamic invocation interface.
v)  A Proxy Process handles all requests, either static or dynamic. The proxy process interfaces with the CORBA infrastructure through the Object Request Broker (ORB) and handles the request on behalf of the client. The ORB is middleware that connects the clients and servers and allows for object communication. The ORB locates the target objects residing on different servers and routes requests to them through message passing.
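The roles listed above can be mimicked in plain Python to show how they fit together. This sketch does not use CORBA, IDL or any real ORB library; `ToyORB`, `CatalogueServer` and the object ID `obj-001` are hypothetical, and ordinary method calls stand in for message passing between machines.

```python
class CatalogueServer:
    """A hypothetical remote object that lives on some server."""

    def title_of(self, item_id):
        return {"42": "Digital Libraries"}.get(item_id, "unknown")


class ToyORB:
    """Toy broker: keeps an interface repository and routes requests."""

    def __init__(self):
        self.interface_repository = {}  # object ID -> (interface name, operations)
        self._objects = {}              # object ID -> implementation

    def register(self, object_id, interface, obj):
        operations = [name for name in dir(obj)
                      if not name.startswith("_") and callable(getattr(obj, name))]
        self.interface_repository[object_id] = (interface, operations)
        self._objects[object_id] = obj

    def discover(self, interface):
        """Dynamic case: look up object IDs and operations by interface name."""
        return {object_id: operations
                for object_id, (name, operations) in self.interface_repository.items()
                if name == interface}

    def invoke(self, object_id, operation, *args):
        """Locate the target object and pass the request to it."""
        return getattr(self._objects[object_id], operation)(*args)


orb = ToyORB()
orb.register("obj-001", "Catalogue", CatalogueServer())

# Static style: the client already knows the object ID and the operation.
static_result = orb.invoke("obj-001", "title_of", "42")

# Dynamic style: the client discovers the object and its operations first.
found = orb.discover("Catalogue")
object_id, operations = next(iter(found.items()))
dynamic_result = orb.invoke(object_id, operations[0], "42")
print(static_result, dynamic_result)
```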

The Infobus Project at Stanford University has implemented the CORBA model as the distributed object network protocol to access a variety of information services (Ferrer, 1999).

5.7 Software Agents Architecture and UMDL

The University of Michigan Digital Library Project (UMDL) uses a proprietary architecture to support the federation of loosely-coupled digital library collections and services. The core of the architecture is the concept of software agents, which is based on object technology. An agent is a highly encapsulated software module representing an element of a collection or service with very specific capabilities. These software agents may dynamically team up to combine their capabilities and handle more sophisticated tasks, such as performing a complex search request. Software agents are classified into the following three groups (Ferrer, 1999):

i)   User Interface Agent (UIA): User Interface Agents mediate user access to the system. They convert queries and other user interactions into a form that can be understood by other agents. UIAs create and maintain user profiles that other agents can use to support searching. User profiles are consulted by the agents to facilitate delivery of selective dissemination of information (SDI) services.
ii)  Collection Interface Agent (CIA): Collection Interface Agents mediate access to collections. Collections may include full-text documents, web sites and other multimedia objects. The major role of CIAs is to provide the registry with information regarding collections. They provide a detailed description of the content and structure of each collection. CIAs describe the indexing systems associated with each collection and how to search them, depending upon the syntax used. CIAs also describe how to access the collection and which protocols to use for accessing it.
iii) Mediation Agent (MA): Mediation Agents manage all the necessary tasks that support the system, such as those that eventually direct a user to a collection based on a specific query or user profile. Mediation Agents communicate only with other agents. Types of Mediation Agents include registry agents (to manage the registry) and remora agents that provide SDI services. Mediation Agents are also assigned the task of maintaining statistics of various activities. The Tasks Planner Agent in an MA is responsible for managing tasks and other agents.

Software agents communicate with each other using a proprietary language developed by the UMDL called Conspectus Language. 
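The division of labour among the three agent types can be sketched with three small classes. This is a loose illustration only: it does not reproduce the UMDL implementation or the Conspectus Language (plain method calls stand in for agent messages), and the class names, collection names and records are invented.

```python
class CollectionInterfaceAgent:
    """Describes one collection and knows how to search it."""

    def __init__(self, name, subjects, records):
        self.name = name
        self.subjects = set(subjects)
        self.records = list(records)

    def describe(self):
        return {"name": self.name, "subjects": sorted(self.subjects)}

    def search(self, term):
        return [record for record in self.records if term in record.lower()]


class RegistryAgent:
    """A mediation agent that keeps the registry of collection descriptions."""

    def __init__(self):
        self._agents = {}

    def register(self, agent):
        self._agents[agent.name] = agent

    def collections_for(self, subject):
        return [agent for agent in self._agents.values()
                if subject in agent.describe()["subjects"]]


class UserInterfaceAgent:
    """Turns raw user input into requests that other agents understand."""

    def __init__(self, registry, profile):
        self.registry = registry
        self.profile = profile  # hypothetical profile, e.g. {"subject": "physics"}

    def ask(self, raw_query):
        term = raw_query.strip().lower()
        collections = self.registry.collections_for(self.profile["subject"])
        return {agent.name: agent.search(term) for agent in collections}


registry = RegistryAgent()
registry.register(CollectionInterfaceAgent(
    "physics-preprints", ["physics"], ["Quantum Optics Notes", "Optics of Thin Films"]))
registry.register(CollectionInterfaceAgent(
    "history-theses", ["history"], ["Medieval Optics Manuscripts"]))

uia = UserInterfaceAgent(registry, {"subject": "physics"})
answer = uia.ask("  Optics ")
print(answer)
```

The user interface agent never talks to a collection it was not directed to by the registry, which mirrors the idea that mediation agents route users to collections based on the query and the user profile.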



5.8 Open Archival Information System (OAIS)

The OAIS Reference Model was developed by the Consultative Committee for Space Data Systems (CCSDS) and is targeted at digital preservation projects. It is a framework for understanding and applying the concepts needed for long-term preservation of digital information. It is also a starting point for a model addressing non-digital information. The model establishes terminology and concepts relevant to digital archiving, identifies the key components and processes endemic to most digital archiving activity, and proposes an information model for digital objects and their associated metadata. The reference model does not specify an implementation and is, therefore, neutral on digital object types and technological issues. The model can be applied at a broad level to archives of digital image files, “born-digital” objects, or even physical objects (Sayer, 2001). OAIS has now been adopted as an ISO standard (ISO 14721:2003).

The OAIS framework enjoys the status of a de facto standard in digital preservation. The OAIS reference model provides a high-level overview of the types of information needed to support digital preservation, which can broadly be grouped under two umbrella terms: i) Preservation Description Information (PDI); and ii) Representation and Descriptive Information.

i)    Preservation Description Information
The preservation description information consists of four major types of metadata elements, namely reference information, provenance information, context information and fixity information as mentioned below:

a)  Reference Information: Enumerates and describes identifiers assigned to the content information such that it can be referred to unambiguously, both internally and externally to the archive (e.g., ISBN, URN).
b)  Provenance Information: Documents the history of the content information (e.g., its origin, chain of custody, preservation actions and effects) and helps to support claims of authenticity and integrity.
c)  Context Information: Documents the relationship of the content information to its environment (e.g., why it was created, relationships to other content information).
d)  Fixity Information: Documents authentication mechanisms used to ensure that the content information has not been altered in an undocumented manner (e.g., checksum, digital signature).
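
As a rough illustration of how the four PDI elements might sit together, the Python sketch below stores them in one record and uses a SHA-256 checksum as the fixity information. OAIS does not prescribe this layout or this algorithm; the class, the function names and the sample values are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationDescription:
    """One record holding the four PDI elements (illustrative layout)."""
    reference: str       # identifier for the content, e.g. a URN
    provenance: str      # history of the content
    context: str         # relationship to its environment
    fixity_sha256: str   # checksum recorded at ingest


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 checksum of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def describe(content: bytes, reference: str, provenance: str,
             context: str) -> PreservationDescription:
    """Build the PDI record, computing the fixity value from the content."""
    return PreservationDescription(reference, provenance, context,
                                   sha256_hex(content))


def is_unaltered(content: bytes, pdi: PreservationDescription) -> bool:
    """Fixity check: recompute the checksum and compare with the stored one."""
    return sha256_hex(content) == pdi.fixity_sha256


original = b"Annual report, 1998 edition."
pdi = describe(original, "urn:example:report-1998",
               "Scanned from the print copy",
               "Part of the annual reports series")

intact = is_unaltered(original, pdi)
tampered = is_unaltered(original + b" (revised)", pdi)
```

The check passes for the original bytes and fails as soon as a single byte differs, which is what allows an archive to detect undocumented alteration.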


                            Fig. 6: Open Archival Information System (OAIS) Model

ii)  Representation and Descriptive Information
Representation information facilitates proper rendering, understanding, and interpretation of a digital object's content. At the most fundamental level, representation information imparts meaning to an object’s bit-stream. For example, it may indicate that a sequence of bits represents text encoded as ASCII characters and, furthermore, that the text is in French. The depth of the representation information required depends on the designated community for whom the content is intended. Descriptive information is more ephemeral metadata: it is the information used to aid searching, ordering, and retrieval of the objects.
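
The point that representation information gives meaning to a bit-stream can be shown with a small Python sketch. The same bytes are rendered under two different encoding declarations. The `RepresentationInfo` class and its fields are hypothetical, and UTF-8 and Latin-1 are used instead of ASCII so that accented French characters make the difference visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepresentationInfo:
    """Minimal representation information for a text object (illustrative)."""
    encoding: str   # character encoding of the bit-stream
    language: str   # natural language of the text


def render(bitstream: bytes, info: RepresentationInfo) -> str:
    """Interpret the raw bytes according to the declared encoding."""
    return bitstream.decode(info.encoding)


# The archived bit-stream: French text stored as UTF-8 bytes.
bitstream = "Bibliothèque numérique".encode("utf-8")

correct = render(bitstream, RepresentationInfo("utf-8", "fr"))
wrong = render(bitstream, RepresentationInfo("latin-1", "fr"))
```

With the right declaration the text is recovered as written. With the wrong one the bytes still decode, but the accented letters come out as unrelated characters, so the object is no longer understandable to its designated community.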


6. Interoperability in Digital Library

Interoperability is a critical problem in the network environment, given the growing number of diverse computer systems, software applications, file formats, information resources and users. It is particularly important in a digital library implementation because digital conversion activities are distributed amongst libraries that hold traditional print-based resources, and the digitized information is to be made universally accessible. Collaboration amongst participants is, therefore, necessary in order to adopt a framework for achieving a suitable level of information sharing.

Interoperability is the ability of digital library components and services to be functionally and logically interchangeable by virtue of their having been implemented in accordance with a set of well-defined, publicly known interfaces. In this model, different services and components can communicate with each other through open interfaces, and clients can interact with them in an equivalent manner. The ultimate goal of interoperability is to allow components of a digital library to be created and developed independently, yet be able to call on one another efficiently and conveniently (Paepcke, 1998).
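
A minimal Python sketch of this idea follows: two repositories with different internals implement one shared interface, and a client treats them identically. The `Repository` interface, the two classes and the sample data are hypothetical and stand in for whatever publicly known interface a real digital library would agree on.

```python
from typing import Protocol


class Repository(Protocol):
    """The shared, publicly known interface (hypothetical for this sketch)."""

    def search(self, term: str) -> list[str]:
        ...


class ThesisRepository:
    """Holds thesis titles in a plain list."""

    def __init__(self, titles: list[str]) -> None:
        self._titles = titles

    def search(self, term: str) -> list[str]:
        return [t for t in self._titles if term.lower() in t.lower()]


class ImageRepository:
    """Holds image file names keyed to captions; internals differ, interface does not."""

    def __init__(self, captions: dict[str, str]) -> None:
        self._captions = captions

    def search(self, term: str) -> list[str]:
        return [name for name, caption in self._captions.items()
                if term.lower() in caption.lower()]


def search_all(repositories: list[Repository], term: str) -> list[str]:
    """A client that interacts with every repository in the same manner."""
    results: list[str] = []
    for repository in repositories:
        results.extend(repository.search(term))
    return results


theses = ThesisRepository(["Metadata harvesting in practice", "Soil chemistry"])
images = ImageRepository({"map01.tif": "Survey map used for metadata training"})
found = search_all([theses, images], "metadata")
```

Because `search_all` depends only on the interface, a third repository could be added without changing the client.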

Interoperability in digital library implementation addresses the challenges of creating a general framework for information access and integration across many domains. Digital libraries created using the principles of interoperability result in repositories of digital content which may have different attributes but can be treated in the same manner because of their shared interface definition. There are several approaches to achieving interoperability in digital library implementation. Paepcke (1998) identifies the following approaches:

i)    Standardization
Standardization is a proven approach to achieving interoperability. MARC and its different variants and Dublin Core are well-known standards for bibliographic description of records. Z39.50 is a well-known standard for information retrieval. Standards and protocols applicable to a digital library are described in another module.
ii)   Families of Standards
The families-of-standards approach offers the choice of implementing one or more of several standards. The International Organization for Standardization (ISO) standard for Open Systems Interconnection (OSI) created an interoperability framework based on the family of standards approach. OSI, in its seven-layer structure, provides a family of standards concerned with a given set of interoperability issues in the area of interconnection. TCP/IP, by contrast, is a separate protocol suite; it is often mapped onto the OSI layers but is not itself an OSI protocol.
iii)   Specification-based Interaction
Interoperability can also be achieved by describing the semantics and structure of all data and operations. Specification-based interaction circumvents the need for mediation systems. Well-developed enabling technologies for this approach include Agent Communication Language (ACL).
iv)   Mediation
Interoperability can also be achieved by deploying mediation machinery and interfaces for translation of data formats and interaction modes between components. In the interconnection of diverse networks, network gateways play the role of mediation. However, translation in the sense of simple mapping is not always sufficient to achieve complete interoperability. For example, one of two digital libraries may completely lack certain data types or operations that the other has, so the two cannot interoperate without further work. Mediation interfaces can, however, be designed to augment functionalities and services, for instance by searching two digital libraries and presenting the results with their own value additions. Such mediation facilities are called “wrappers” or “proxies”. Mediation technology thrives on standardization. For example, a single mediation system can cover all Z39.50-compliant sources at once.

v)    Mobile functionality
Mobile functionality consists of software agents that travel over the network to sites where they access the services that they need. These software agents return to their original sites with the results of their work. Java applets and servlets facilitate such mobile functionality by delivering new capabilities to client components at run time. Instead of depending upon standardization or third-party mediation, mobile functionality accomplishes interoperability by exchanging code that facilitates communication amongst components.
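
Of the approaches listed above, mediation lends itself most readily to a short illustration. In the Python sketch below, each wrapper translates one source's record format into a common title/creator record, and a mediator searches both sources and merges the results. The field keys are deliberately simplified (for instance "245a" stands in for a MARC title field), and the class names and sample records are invented for the example.

```python
from typing import Callable

Record = dict[str, str]


def wrap_marc_like(raw: Record) -> Record:
    """Wrapper for a source that uses simplified MARC-style field keys."""
    return {"title": raw["245a"], "creator": raw["100a"]}


def wrap_local(raw: Record) -> Record:
    """Wrapper for a source that uses its own home-grown field names."""
    return {"title": raw["name"], "creator": raw["written_by"]}


class Mediator:
    """Searches several heterogeneous sources through their wrappers."""

    def __init__(self) -> None:
        self._sources: list[tuple[list[Record], Callable[[Record], Record]]] = []

    def add_source(self, records: list[Record],
                   wrapper: Callable[[Record], Record]) -> None:
        self._sources.append((records, wrapper))

    def search(self, term: str) -> list[Record]:
        hits: list[Record] = []
        for records, wrapper in self._sources:
            for raw in records:
                common = wrapper(raw)   # translate to the common format first
                if term.lower() in common["title"].lower():
                    hits.append(common)
        return hits


catalogue_a = [{"245a": "Introduction to Digital Archives", "100a": "Author A"},
               {"245a": "Soil Chemistry Handbook", "100a": "Author C"}]
catalogue_b = [{"name": "Digital Maps Collection", "written_by": "Author B"}]

mediator = Mediator()
mediator.add_source(catalogue_a, wrap_marc_like)
mediator.add_source(catalogue_b, wrap_local)
hits = mediator.search("digital")
```

Neither source has to change its own format; only a wrapper has to be written for each. This is also why one wrapper can cover every source that follows the same standard.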

7.0 Summary

Digital libraries are built around Internet and web technology; therefore, they need to follow Internet standards and protocols so as to ensure interoperability, portability, modularity and scalability. A typical digital library implementation follows a client-server architecture, as do the Internet and web technology. This module discusses client-server architecture as applied to the digital library.

Major problems of digital library design are caused by differences in the computer systems, file structures, formats, information organization and information retrieval requirements of the collections accessible through the digital library. While the web has emerged as the preferred medium of information delivery and access, the use of standards and protocols makes it possible to make digital collections interoperable and seamlessly accessible. Important features that should be considered while designing a digital library include scalability and sustainability, seamless access, interoperability, federation, capacity to handle multiple files and formats, and location-independent identifiers. These features are discussed briefly. The module describes various digital library design models, namely the Kahn-Wilensky Architecture; Dienst; the Cornell Reference Architecture for Distributed Digital Libraries (CRADDL); the Federated Database System (FDBS) Architecture, with the National Science Digital Library (NSDL) and NDLTD; the Common Object Request Broker Architecture (CORBA); the Software Agents Architecture and UMDL; the Metadata Harvesting Architecture; and the OAIS Reference Model, along with examples of digital libraries built using these models.

A typical digital library in a distributed client-server environment consists of hardware and software components on the server side as well as on the client side. The module describes these components with examples of software products that are available in the marketplace, and briefly describes interoperability in digital libraries.


Reference and Further Reading

Davis, J.R. and Lagoze, C. NCSTRL: Design and development of a globally distributed digital library. Journal of the American Society for Information Science, 51(3), 273-280, 2000.

Ferrer, Robert. University of Illinois: the federation of digital libraries: Amongst heterogeneous information systems. Science and Technology Libraries, 17(3&4), 81-119, 1999.

Fox, E.A. and Powell, J. Multilingual federated searching across heterogeneous collections.  D-Lib Magazine, September, 1998.
(http://www.dlib.org/dlib/september98/powell/09powell.html)

Fox, E.A. et al. Networked Digital Library of Theses and Dissertations: bridging the gaps for global access. Part 1: Mission and progress. D-Lib Magazine, 7(9), 2001.
(http://www.dlib.org/dlib/september01/suleman/09suleman-pt1.html)

Fox, E.A. et al. Networked Digital Library of Theses and Dissertations: bridging the gaps for global access. Part 2: Services and research. D-Lib Magazine, 7(9), 2001.
(http://www.dlib.org/dlib/september01/suleman/09suleman-pt2.html)

Kahn, Robert and Wilensky, Robert. A framework for distributed digital object services. cnri.dlib/tn95-01, May 13, 1995. (http://www.cnri.reston.va.us/k-w.html).

Kardorf, B. SGML and PDF: Why we need both. Journal of Electronic Publishing, 3(4), 1998. 14p. (http://www.press.umich.edu/jep/03-04/kardorf.html)

Lagoze, C. and Fielding, D. Defining collections in distributed digital libraries. D-Lib Magazine, November, 1998. (http://www.dlib.org/dlib/november98/lagoze/07lagoze.html)

Paepcke, A., Chang, C-C.K., Garcia-Molina, H. and Winograd, T. Interoperability for digital libraries worldwide. Communications of the ACM, 41(4), 33-43, 1998.

Payette, S., Blanchi, C., Lagoze, C. and Overly, E.A. Interoperability for digital objects and repositories. D-Lib Magazine, 5(3), May 1999. (http://www.dlib.org/dlib/May99/payette/05payette.html)

Sayer, Donald, et al. The Open Archival Information System (OAIS) Reference Model and its usage. 2001.
(http://public.ccsds.org/publications/documents/SO2002/SPACEOPS02_P_T5_39.PDF) (last visited on 4th Oct., 2006)

Sheth, A.P. and Larson, J.A. Federated database systems for managing distributed, heterogeneous and autonomous databases. ACM Computing Surveys, 22, 183-236, 1990.

Glossary


Digital library - A library in which a significant proportion of the resources are available in machine-readable format (as opposed to print or microform), accessible by means of computers. The digital content may be locally held or accessed remotely via computer networks. Digital libraries provide organized and structured access to information content in a distributed environment and assist users in searching, evaluating and utilizing resources in different digital formats. The digital library is assembled from a great variety of components, including people, computers, networks, repositories, databases, search systems, Web servers, digital objects and elements of objects, bibliographic records, and many more. Keeping track of these components requires a systematic approach to identification.

Digital repository - A digital collection of books, papers, theses, media, and other works of interest to the institution served, built as a means of preserving and disseminating scholarly information. Many academic and research libraries are actively engaged in building such collections. Usually locally authored or produced, content can be either born digital or reformatted. Access is generally unrestricted, in compliance with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which makes such archives interoperable and cross-searchable.

Open Archival Information System (OAIS) - A reference model for digital archiving systems initially developed by the Consultative Committee for Space Data Systems and adopted by ISO as International Standard ISO 14721:2003. The OAIS model is widely used by libraries as a framework for the development of preservation archives for digital materials.

Digital object - In the technical sense, a type of data structure consisting of digital content, a unique identifier for the content (called a "handle"), and other data about the content, for example, rights metadata.

Electronic text - The words used by an author to express thoughts and feelings presented in digital, as opposed to printed or handwritten, form. To be displayed with formatting on a computer, text must first be encoded in a markup language. Electronic text can be "born digital" or converted from another format.

Federated database system - A federated database system is a type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer network and may be geographically decentralized. Since the constituent database systems remain autonomous, a federated database system is a contrastable alternative to the (sometimes daunting) task of merging several disparate databases. A federated database, or virtual database, is a composite of all constituent databases in a federated database system. There is no actual data integration in the constituent disparate databases as a result of data federation.

Persistent URL (PURL) - A type of URL (Uniform Resource Locator) that does not point directly to the location of an Internet resource, but rather to an intermediate resolution service (PURL server) that associates the stable PURL with the actual URL, and returns the URL to the client, which then processes the request in the usual manner. PURLs were developed through OCLC participation in the Internet Engineering Task Force (IETF) Uniform Resource Identifier working groups as an interim solution to the problem posed by URL changes (lack of persistence) in the MARC description of Internet resources. They are an intermediate step on the path to URNs (Uniform Resource Names) in Internet information architecture.

Digital Object Identifier (DOI) - A unique code preferred by publishers in the identification and exchange of the content of a digital object, such as a journal article, Web document, or other item of intellectual property. The DOI consists of two parts: a prefix assigned to each publisher by the administrative DOI agency and a suffix assigned by the publisher that may be any code the publisher chooses. DOIs and their corresponding URLs are registered in a central DOI directory that functions as a routing system. The DOI is persistent, meaning that the identification of a digital object does not change even if ownership of or rights in the entity are transferred. It is also actionable, meaning that clicking on it in a Web browser display will redirect the user to the content.

OpenURL - A framework and format for communicating bibliographic information between applications over the Internet. The information provider assigns an OpenURL to an Internet resource, instead of a traditional URL. When the user clicks on a link to the resource, the OpenURL is sent to a context-sensitive link resolution system that resolves the OpenURL to an electronic copy of the resource appropriate for the user (and potentially to a set of services associated with the resource). The OpenURL shows promise of becoming an important tool in the interoperation of distributed digital library systems and has the potential to change the nature of linking on the Web. 





