Saturday, December 6, 2014

15. Digital Rights Management (Part II : Access Management and Technologies)

Suggestions are cordially invited from all of you in creating this blog. Please send your suggestions and entries. Its entire scope is the world knowledge community, and it will make an important contribution to the career building of all competitors. You can send your suggestions to this email address: chandrashekhar.malav@yahoo.com

P- 01. Digital Libraries*

By: Jagdish Arora, Paper Coordinator

Multiple Choice Questions

Question 1: Shibboleth is ________
  • Library Management Portal
  • Open source solution for Access and Identity Management (correct answer)
  • Library Automation Software
  • Open source digital library

Question 2: The process of converting information (plain text or numbers) from its normal, comprehensible form into an incomprehensible format is called:
  • Cryptography (correct answer)
  • Flickering
  • Digital Watermarking
  • Digital Signature

Question 3: Web cookies are stored in:
  • Web Server
  • User's Browser (correct answer)
  • Cloud Storage
  • Proxy Server

Question 4: Which of the following is not a type of digital certificate?
  • Root or Authority Certificates
  • Institutional Authority Certificates
  • Web Server Certificates
  • User Certificates (correct answer)

Question 5: Which of the following is not an access control mechanism?
  • Digital Rights Access Control (correct answer)
  • Mandatory Access Control
  • Discretionary Access Control
  • Role Based Access Control

Question 6: Which of the following is not used for Access and Identity Management?
  • DSpace (correct answer)
  • Open Athens
  • Kerberos
  • Shibboleth

Question 7: Which of the following protocols/standards is used by OpenAthens for security token exchange?
  • SMTP
  • HTTP
  • WSDL
  • SAML (correct answer)

Question 8: Which of the following techniques is part of the user authentication mechanism of DRM?
  • Face Recognition
  • Fingerprint Scanning
  • Password based Access (correct answer)
  • Retina Scanning

Question 9: Which technique allows embedding of visible or invisible copyright notices or other verification messages in digital documents?
  • Hologram
  • Document Protecting
  • Digital Watermarking (correct answer)
  • Digital Stamping

Question 10: _______________ is a commonly used hash function in digital signatures.
  • Tiger
  • Checksum
  • MD5 (correct answer)
  • FSB

0. Objectives

The objectives of this e-module are to impart knowledge on i) the need for access management; ii) tools and techniques deployed for authentication and authorization of users; iii) technologies that are in vogue for controlling access to e-resources and tracking their misuse; and iv) technologies and protocols that are used for secure digital communication.

1.0 Introduction

In the previous module we saw that Digital Rights Management (DRM) incorporates an access control model that aims to deal with complex and evolving security associations between users and digital content. Indeed, it may be better described as an access control mechanism, since it is deployed to facilitate access to copyright-protected content for users. DRM technology was created for content creators as a means to stop illegal reproduction and distribution of their products. In an online environment, the scope of DRM can be leveraged to control access to and usage of digital objects and to impose restrictions on their misuse. Digital rights management in a digital library consists of the following four components:

i)    license agreements and policies;
ii)   user authentication and authorization;
iii)  accuracy and integrity of digital content; and
iv)  accessibility, including permissions to operate on digital objects or their metadata.

License agreements and policies for providing access to digital libraries are negotiated between publishers and librarians or information managers. Users are authenticated and authorized to access the content of a digital library as per the terms and conditions of the license agreement. While duly authenticated users are allowed access to information according to their clearances and authority, unauthorized users are blocked from accessing information. Confidentiality, or a trusted relationship, is of paramount importance in digital libraries containing highly proprietary information. Accuracy or integrity means the continuing integrity of information stored in digital object servers. A digital library must not allow accidental or intentional corruption of the information stored in it by unauthorized users or programs. Accessibility means that a secure computer system must keep information available to its users. The hardware and software of a computer system should keep working efficiently and the system should be able to recover quickly in case of disaster. Moreover, users are given access to digital content with permissions to download in the case of ordinary users, and to add, edit, delete or amend in the case of editors.

It is essential to ensure the security of data not only on servers but also during communication between clients and servers, so that the authenticity and integrity of data are preserved. It is possible for a hacker to eavesdrop on communication between a user's browser and a Web server and steal sensitive information, such as credit card numbers, login IDs and passwords. Data encryption techniques are used for communicating sensitive information such as user passwords and PIN codes. Encryption renders data unintelligible and unusable even if accessed by an unauthorized person. Digital certificates are deployed to establish secure communication between clients and servers.

2.0 Need for Access Management and Security

Access management and computer security are two very important issues in all commercial web applications, including digital libraries. Because electronic content can be copied much more easily, content owners have a greater need to impose measures that control misuse of their content in digital format. IP authentication and password access, the two most commonly used authentication methods, cannot protect content from being duplicated or shared, thus creating a need for greater rights management controls. At the same time, digital media distribution can bring a variety of new business opportunities for owners or generators of content who incorporate the right technologies to control access to their digital content.

Access management is necessary for commercial digital collections because access to them is restricted to subscribers or licensed users. Even when access to digital collections is provided openly, access control is required for assigning responsibilities for operations such as adding, updating, editing and deleting full-text content as well as metadata. Moreover, a well-managed digital library requires tracking of all changes made so that the collections can be restored if mistakes are made or computer files are corrupted.

3. User Authentication

User authentication is the first level of security mechanism that protects a computer system hosting digital libraries from unauthorized users. Authentication basically means ascertaining the credentials of a user that allow him or her to establish the right to use a network identity. Login IDs with passwords and IP filtering are the two methods most commonly deployed for authenticating users, although a number of other mechanisms are in vogue to authenticate a user before s/he is provided access to a digital library. User authentication mechanisms can be incorporated into a firewall, a particular application, a document, or a network operating system such as Linux, Unix or Windows. Some of the important authentication mechanisms are given below:


3.1 Log-in ID and Password-based Access

The most common means of access control requires matching a username with its associated password. Log-in IDs and passwords allow publishers or producers of information to control access to their electronic resources. The two types of passwords are:

i)  Fixed Passwords
User authentication is most often performed with fixed passwords. However, fixed passwords remain vulnerable to guessing. Despite their inherent weaknesses, fixed passwords are widely used because they are easy to use and to implement. Because passwords are so easily passed around, however, they can be shared by users in subscribing institutions and misused to access digital content.

ii)   Dynamic Passwords
It is also possible to generate a chain of dependent or independent one-time passwords for the user. However, since it is very difficult to remember several passwords, users will be forced to keep this list somewhere, either on paper or in a file on their computers. Software solutions are now available to manage login IDs and passwords.


3.2 Challenge-Response Authentication

The challenge / response scheme is used to prove the identity of a user to the server by demonstrating knowledge of a secret that is known to both the user and the server. Once a user sends a proper response to a challenge, s/he may be prompted again to answer another challenge concerned with his or her identity. These challenge / response schemes are often implemented with hardware tokens. The HTTP/1.1 Digest Authentication standard (which is implemented in all popular browsers) is an example of a software-based challenge / response scheme. Challenge / response authentication mechanisms are designed to resist replay attacks, i.e., an adversary should not be able to re-use a particular response to authenticate in another session with another challenge.

Challenge-response protocols are also used to assert things other than knowledge of a secret value. CAPTCHAs, for example, are meant to determine whether a viewer of a Web application is a real person or a computer program. The challenge sent to the viewer is a distorted image of some text, and the viewer responds by typing in that text (used in Yahoo!, Google, etc.). The distortion is designed to make automated optical character recognition (OCR) difficult and so prevent a computer program from passing as a human. Cryptography-based challenge-response authentication uses the password as the encryption key to transmit randomly generated information as a challenge, whereupon the other end must return a similarly encrypted value as its response.
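
The replay-resistance property described above can be illustrated with a minimal sketch. It assumes a keyed hash (HMAC-SHA256) in place of the password-based encryption mentioned in the text; the function names and the shared secret are hypothetical and not taken from any particular product.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: generate a fresh random challenge for each login attempt."""
    return secrets.token_bytes(16)


def compute_response(shared_secret: bytes, challenge: bytes) -> bytes:
    """Client side: prove knowledge of the secret without transmitting it."""
    return hmac.new(shared_secret, challenge, hashlib.sha256).digest()


def verify_response(shared_secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    expected = hmac.new(shared_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

Because every session uses a new challenge, a response captured by an eavesdropper in one session is of no use against the challenge issued in the next one.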

3.3 IP Filtering

The Internet Protocol (IP) address is a numeric address assigned to every device (i.e. server, client, router, firewall, bridge, printer, Internet fax machine, etc.) connected to the Internet so that devices can identify and communicate with each other. An IPv4 address consists of four parts separated by dots, e.g. 202.141.130.75, whereas an IPv6 address has eight groups of four hexadecimal digits separated by colons, such as 2001:0db8:85a3:0000:0000:8a2e:0370:7334. Every machine that is on the Internet has a unique IP address; if a machine does not have an IP address, it cannot be connected to the Internet. In other words, the IP address acts as a locator for one IP device to find another and interact with it. IP addresses can also be assigned dynamically by a service provider.

IP filtering is very easy to implement for a subscribing institution, which is required to provide its IP addresses or address ranges to the publisher. The publisher, in turn, is required to maintain a database of IP ranges that are enabled to access its electronic resources and to check all incoming requests for digital material against this IP database. Most publishers prefer providing IP-based access to their resources. IP-based access poses problems for those who are not on campus or are traveling.
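
The publisher-side check described above can be sketched as follows. This is only an illustration: the registered ranges are documentation-only addresses chosen for the example, and a real publisher would load them from its database of subscribing institutions.

```python
import ipaddress

# Hypothetical IP ranges registered by subscribing institutions.
ALLOWED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_authorized(client_ip: str) -> bool:
    """Return True if the requesting address falls inside a registered range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_RANGES)
```

A request from 192.0.2.17 would be allowed, while one from an address outside every registered range, such as a user's home connection, would be refused. This is the off-campus problem noted above.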

3.4 Web Cookies

HTTP cookies, also called web cookies or simply cookies, are parcels of text sent by a server to a web browser and then sent back unchanged by the browser each time it accesses that server. HTTP cookies are used for authenticating, tracking, and maintaining specific information about users, such as site preferences and the contents of their electronic shopping carts. Authentication information is stored in cookies in the user’s browser, so that the user is not required to provide authentication information repeatedly when navigating from resource to resource.

Most browsers allow users to decide whether to accept cookies, but rejection makes some websites unusable. Cookies are generally used for authentication in combination with other authentication mechanisms such as Log-in ID / password and IP filtering.
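
A minimal sketch of cookie-based session handling, using Python's standard `http.cookies` module, is given below. The cookie name `session`, the in-memory `SESSIONS` table and both function names are assumptions made for the example; the user is presumed to have already been authenticated by another mechanism such as a password.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # hypothetical server-side table: session token -> username


def issue_cookie(username: str) -> str:
    """After a successful login, build the Set-Cookie header sent to the browser."""
    token = secrets.token_hex(16)
    SESSIONS[token] = username
    cookie = SimpleCookie()
    cookie["session"] = token
    cookie["session"]["path"] = "/"
    cookie["session"]["httponly"] = True
    return cookie.output(header="Set-Cookie:")


def user_from_cookie_header(header_value: str):
    """On later requests, map the Cookie header sent back by the browser to a user."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    morsel = cookie.get("session")
    return SESSIONS.get(morsel.value) if morsel else None
```

The browser returns the token unchanged with each request, so the server can recognise the user without asking for the password again.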


3.5 Web Proxy

In the context of user authentication, a proxy server is a combination of software and hardware that acts as an intermediary between users and the Internet and enables authorized users of an institution to access licensed electronic resources when connecting to the Internet from outside the premises of their institution.

EZproxy (http://www.oclc.org/us/en/ezproxy/) is a popular proxy server program that is easy to set up and maintain for providing users with remote access to web-based licensed databases. It operates as an intermediate server between users and subscribed e-resources. Users connect to EZproxy, which, in turn, connects on their behalf to subscribed e-resources, obtains the requested web pages and sends them back to the users. Since EZproxy runs on a machine on the network of the subscribing institution, the e-resource publisher sees the requests as coming from an authorized IP address and permits access.

EZproxy works by dynamically altering the URLs within the web pages provided by the publishers of e-resources. The server names within the URLs of these web pages are changed to reflect the EZproxy server instead, causing users to return to the EZproxy server as they access links on these web pages. The result is a seamless access environment for the users without the need for automatic proxy configuration files.
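
The general idea of rewriting server names in links can be sketched as below. This is not EZproxy's actual rewriting scheme or configuration; the proxy host name, the list of licensed hosts and the host-suffix convention are all hypothetical, chosen only to show how links can be made to point back at a proxy.

```python
import re
from urllib.parse import urlsplit, urlunsplit

PROXY_HOST = "proxy.library.example.edu"             # hypothetical proxy server
LICENSED_HOSTS = {"journals.publisher.example.com"}  # hypothetical licensed resource


def proxify(url: str) -> str:
    """Point a licensed URL at the proxy; leave every other URL untouched."""
    parts = urlsplit(url)
    if parts.hostname not in LICENSED_HOSTS:
        return url
    return urlunsplit(parts._replace(netloc=f"{parts.hostname}.{PROXY_HOST}"))


def rewrite_links(html: str) -> str:
    """Rewrite every href attribute in a page fetched from the publisher."""
    return re.sub(r'href="([^"]+)"',
                  lambda match: f'href="{proxify(match.group(1))}"',
                  html)
```

When the user follows a rewritten link, the request goes to the proxy rather than directly to the publisher, so the publisher continues to see an authorized institutional IP address.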


3.6 OpenAthens (http://www.openathens.net/)

OpenAthens is an access and identity management service from Eduserv Technologies Ltd. that provides a secure single sign-on facility for multiple subscription-based web resources, combined with user management capability. OpenAthens replaces the multiple usernames and passwords necessary to access subscription-based content with a single username and password that can be entered once per session. It operates independently of a user’s location or IP address. Organisations adopting the OpenAthens service can choose between the OpenAthens Managed Directory service, where usernames are held by Eduserv Technologies, and OpenAthens Local Authentication, where usernames are held locally and security tokens are exchanged via a range of protocols: SAML, Shibboleth or Athens Devolved Authentication (AthensDA). Over 4.5 million users worldwide use the Athens service to gain access to over 400 protected online resources.

3.7 Shibboleth (http://www.shibboleth.net/)

Shibboleth is an open source middleware software which allows sites to make informed authorization decisions for individuals and provide access to subscription-based electronic resources. Shibboleth leverages campus identity and access management infrastructures to authenticate individuals and then sends information about them to the resource site, enabling the resource provider to make an informed decision about the authorization of a user. Shibboleth-enabled access simplifies the management of identity and access permissions for both Identity Providers and Service Providers. It allows for cross-domain single sign-on and removes the need for content providers to maintain usernames and passwords. Unlike “OpenAthens MD”, where usernames are held by Eduserv Technologies (developers of OpenAthens), usernames in the case of Shibboleth are held by individual institutions. Once a user visits a “Shibboleth-enabled e-resource”, he / she is redirected to his / her Identity Provider service (IdP) to complete the process of authentication and authorization.

Shibboleth is developed in an open and participatory environment, is freely available, and is released under the Apache Software License. Security Assertion Markup Language (SAML), an XML standard, is used in Shibboleth for exchanging authentication and authorization data between an identity provider (subscribing institution) and a service provider (publisher of an e-resource).


3.8 Referring URL

Referring URL (Teets and Murray, 2006) is a method for enabling authentication based on the URL. Users generally visit various digital libraries through their library’s web site. When a user clicks on a link given in a web page, the user’s browser sends a request to the clicked URL along with the URL of the “referring site”. This “sent along” URL is called a referring URL. The referring URL can be used for authenticating a user. EZproxy, a proxy server program that is used for allowing off-campus access to e-resources, can be configured to check referring URLs and automatically authenticate users. From the users' perspective, simply clicking links on a specified web site leads to access to subscribed e-resources. Referring URL provides seamless and transparent authentication to the user for accessing subscribed e-resources (Referring URL Authentication, 2014).
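
A rough sketch of a referring-URL check is shown below. The trusted host name is hypothetical, and the request headers are assumed to arrive as a plain dictionary; in HTTP the referring URL travels in the `Referer` header (spelled that way in the standard).

```python
from urllib.parse import urlsplit

# Hypothetical library web site whose links are trusted by the e-resource.
TRUSTED_REFERRERS = {"library.example.edu"}


def authenticated_by_referrer(headers: dict) -> bool:
    """Accept the request only if the user arrived from a trusted library page."""
    referer = headers.get("Referer", "")
    return urlsplit(referer).hostname in TRUSTED_REFERRERS
```

Since the header is supplied by the browser, it can be omitted or forged, so a check of this kind is a convenience mechanism rather than a strong proof of identity.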


3.9 Kerberos (http://web.mit.edu/kerberos/www/)

Kerberos is an IETF-defined network authentication protocol, which allows individuals communicating over an insecure network to prove their identity to one another in a secure manner utilizing a trusted third party, called Key Distribution Center (KDC). It is also a suite of free software published by Massachusetts Institute of Technology (MIT) which implements this protocol. Its designers aimed primarily at a client-server model, and it provides mutual authentication, i.e. both the user and the server verify each other's identity. Kerberos protocol messages are protected against eavesdropping and replay attacks as it uses symmetric key cryptography that requires a trusted third party. Extensions to Kerberos can also provide for the use of public key cryptography during authentication.

The Kerberos protocol is used in web server programs such as Apache, in the Internetwork Operating System (IOS) of routers and switches, in e-mail clients (Eudora and Mulberry), in operating systems (Mac OS, Microsoft Windows, Linux), and in Kerberos-enabled LDAP, FTP and Telnet clients.

4. User Authorization

The process of authentication ascertains the identity of a user, while authorization defines his or her permissions in terms of access to e-resources and the extent of their usage. Authorization is granted to successfully authenticated users according to the rights information available in the Access Management System (AMS). A user duly authenticated by one of the authentication mechanisms described above may actually be entitled to access only a portion of the digital collection subscribed to by his / her institution. For example, an authenticated user may be authorised to access electronic journals from a publisher’s site but not electronic books, reference sources or other resources, depending on what his / her institution has subscribed to. Typically all users in an institution are authorized to access all the subscribed e-resources. However, it is possible to define different levels of authorization for different categories of personnel in an institution.

Besides authorizing users of a digital collection, authorization also addresses the responsibilities assigned to the different personnel involved in the development of a digital library and their respective authority in terms of addition, deletion, editing and uploading of records into a digital library. Personnel involved in the development of a digital library may be assigned different levels of authority. Authorization is more challenging than authentication, especially for widely distributed digital libraries. Access control is one method for enforcing authorization. Typically, it assumes that the user or entity has already been authenticated. Access control policies that are in vogue are as follows:

4.1 Mandatory Access Control (MAC)

Mandatory Access Control (MAC) is generally more suited to high-security authorizations. It is based on the classification of objects and users according to security levels, where access is granted only if the security levels of the object and the user match or the user’s level is higher.
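
The level comparison described above can be written in a few lines. The three security levels are hypothetical labels invented for this sketch; a real system would define its own classification scheme.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical security levels, ordered from lowest to highest."""
    PUBLIC = 0
    RESTRICTED = 1
    CONFIDENTIAL = 2


def mac_allows(user_clearance: Level, object_level: Level) -> bool:
    """Grant access only if the user's level matches or exceeds the object's level."""
    return user_clearance >= object_level
```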

4.2 Discretionary Access Control (DAC)

Discretionary Access Control (DAC) is based on users' identities and authorizations. For example, if a user provides evidence (e.g., an attribute certificate) that s/he has the capability to execute certain operations on an object, then the evidence is checked and access is granted.


4.3 Role Based Access Control (RBAC)

RBAC is based on assigning roles to users. A user may have multiple roles. Users gain access authority based on the role they are playing at the time of the request. This is similar to the access control mechanism available in popular operating systems (Windows, Linux), where users gain authorizations depending on their roles, such as administrator, power user or back-up operator, each of which carries the authority needed to perform the required activities. Different types of access control, or combinations of them, are suitable for different applications and security requirements.
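
A minimal sketch of a role-based check follows. The role names, the permitted operations and the two users are hypothetical; they loosely mirror the add, edit and delete responsibilities mentioned earlier in this section.

```python
# Hypothetical roles and the operations each role may perform.
ROLE_PERMISSIONS = {
    "reader": {"view", "download"},
    "editor": {"view", "download", "add", "edit"},
    "administrator": {"view", "download", "add", "edit", "delete"},
}

# Hypothetical assignment of roles to users; a user may hold several roles.
USER_ROLES = {
    "asha": {"reader", "editor"},
    "ravi": {"reader"},
}


def rbac_allows(user: str, active_role: str, operation: str) -> bool:
    """Check the operation against the role the user is playing for this request."""
    if active_role not in USER_ROLES.get(user, set()):
        return False  # the user has not been assigned this role
    return operation in ROLE_PERMISSIONS.get(active_role, set())
```

Permissions are attached to roles rather than to individuals, so changing what editors may do requires a change in one place only.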

4.4 Content Dependent Access Control (CDAC)

While most digital libraries provide access to their content based on the qualifications and characteristics of users rather than their identity, digital libraries may also be designed to provide content-dependent authorization to their collections. For example, a user would be given access to an “A rated video” only if s/he is older than 18 years.

5. Technologies for Access Control and Access Tracking

A number of copy-protection and access control technologies have been devised that would either restrict or completely stop unauthorised use of copyrighted digital material. These technologies are described briefly here:

5.1 Digital Watermarking

Digital watermarking is a technique which allows embedding of visible or invisible copyright notices or other verification messages in digital documents, audio, video or image signals. Such a message is a group of bits describing information pertaining to the image or its author, or a unique ID. The technique takes its name from the watermarking of paper or money as a security measure. Digital watermarking can be a form of steganography, in which data is hidden in the message without the end user's knowledge. This system does not prevent copying, but ensures that any copies made of the media will be traceable to a particular copy and perhaps to a particular user.

A visible watermark is a secondary translucent image overlaid on the primary digital image. A simple example of a visible digital watermark is a visible logo or insignia placed over digital material to identify the copyright. However, the watermark might contain additional information, including the identity of the purchaser of a particular copy of the material.

Invisible watermarks do not change the signal to a perceptually great extent, i.e., there are only minor variations in the output signal. An example of an invisible watermark is when some bits are added to an image modifying only its least significant bits. Invisible watermarks that are unknown to the end user are steganographic. While the addition of the hidden message to the signal does not restrict that signal's use, it provides a mechanism to track the signal to the original owner.
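
The least-significant-bit example mentioned above can be sketched as follows. The image is assumed, for simplicity, to be a flat list of 8-bit pixel values, and the function names are hypothetical; practical watermarking schemes are considerably more robust than this illustration.

```python
from typing import List


def embed_watermark(pixels: List[int], message: bytes) -> List[int]:
    """Hide the message bits in the least significant bit of successive pixels."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for this message")
    marked = list(pixels)
    for index, bit in enumerate(bits):
        marked[index] = (marked[index] & ~1) | bit  # overwrite only the lowest bit
    return marked


def extract_watermark(pixels: List[int], length: int) -> bytes:
    """Read back `length` bytes from the least significant bits."""
    out = bytearray()
    for i in range(length):
        byte = 0
        for pixel in pixels[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (pixel & 1)
        out.append(byte)
    return bytes(out)
```

Each pixel value changes by at most one, which is why the mark is not perceptible, yet the embedded identifier can be recovered from any exact copy of the image.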

5.2 Fractional or Partial Access

Digital libraries are often designed to allow users to access individual records or articles but not to copy complete collections, so as to prevent massive abuse of copyrighted material. Most online journal hosting sites discourage robotic or systematic downloading of articles. Scitation, the online journal hosting platform for AIP, ASP, ASME, ASCE and several other societies, for example, monitors systematic and excessive downloading of journal articles from its site and cuts off users engaged in such activities. Ebrary, for example, allows only chapters of books to be downloaded.
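
One simple way to detect excessive downloading is a sliding-window counter per user, sketched below. This is an assumption about how such monitoring could be done, not a description of how Scitation or any other platform implements it; the class name and the limits are hypothetical.

```python
import time
from collections import defaultdict, deque


class DownloadMonitor:
    """Allow at most `max_downloads` per user within any `window_seconds` period."""

    def __init__(self, max_downloads: int, window_seconds: float):
        self.max_downloads = max_downloads
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # user id -> timestamps of recent downloads

    def allow(self, user_id: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        history = self._history[user_id]
        # Forget downloads that have fallen out of the time window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_downloads:
            return False  # systematic or excessive downloading suspected
        history.append(now)
        return True
```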


5.3 Control of the Interface

Producers of databases and other materials on CD ROM relied on proprietary software to access and display the digital contents of the CD ROM. The formats of the contents, as well as the interface designed to access them, were proprietary. It was, therefore, hard to access the material on a CD ROM unless it was accompanied by the proprietary interface. Now that most digital content is web-based and the web browser is the de facto interface for accessing the web, this method has limited applications. SciFinder Scholar, for example, had a Z39.50 Windows client that was used for interacting with the Chemical Abstracts database. The SciFinder Scholar Windows client required extensive configuration before it could be used for searching Chemical Abstracts Online.

5.4 Flickering

Flickering is a method that allows users to read information on the screen but not capture it by screen dumping. The method takes advantage of the way the human eye perceives rapidly changing images. Movies and television work because the human eye, when presented with images changing 24 or 30 times per second, tends to average the result rather than perceive the individual images. A computer screen dump, however, captures the instantaneous appearance of the screen and would therefore capture the background bits in the process of flickering, so the resulting screen image would be useless.

5.5 Digital Object Identifier (DOI)

A digital object identifier (DOI) is a character string (a "digital identifier") used to uniquely identify an object such as an electronic document. Organizations that meet the contractual obligations of the DOI system and are willing to pay to become a member of the system can assign DOIs. The DOI system is implemented through a federation of registration agencies coordinated by the International DOI Foundation, which developed and controls the system. The DOI system uses, but is not formally part of, the Handle System developed by the Corporation for National Research Initiatives (CNRI) for the Association of American Publishers. The DOI is a mechanism for marking digital objects so as to identify them and enable copyright management and access in a digital environment. A typical use of a DOI is to give a scientific paper or article a unique identifying number that can be used by anyone to locate details of the paper, and possibly an electronic copy. The DOI does not change over time, even if the article is relocated; however, the DOI resolution system must be updated when the location changes. The main impetus of the DOI system is to provide publishers with a method by which the intellectual property right issues associated with their materials can be managed.


6. Authentication of Digital Content

Authentication of digital content refers to the continuing integrity and accuracy of information stored in digital object servers. A digital library must not allow accidental or intentional corruption of the information stored in it by unauthorized users or programs. The techniques of digital signature and digital watermarking described below are used for authentication of digital objects.

6.1 Digital Signature

A digital signature is an electronic rather than a written signature that can be used to authenticate the identity of the sender of a message or of the signer of a document. It can also be used to verify that the original content of the message or document that has been conveyed is unchanged. Digital signatures are based on the concept of a hash function. A hash is a mathematical function that can be applied to the bytes of a computer file to generate a fixed-length number. One commonly used hash function is called MD5. The MD5 function can be applied to a computer file of any length. If two files differ by as little as one bit, their MD5 hashes will be completely different. To check whether a file has been tampered with, its MD5 hash value is calculated at the time of its creation and recalculated later for comparison with the original. If the two are the same, then the files are almost certainly the same.
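
The integrity check described above can be sketched with Python's standard `hashlib` module. MD5 is used because it is the function named in the text; the function names are hypothetical. MD5 is no longer regarded as collision-resistant, and `hashlib.sha256` can be substituted in the same way.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the fixed-length MD5 hash of the data as 32 hexadecimal digits."""
    return hashlib.md5(data).hexdigest()


def is_unchanged(data: bytes, recorded_hash: str) -> bool:
    """Recalculate the hash and compare it with the value recorded at creation."""
    return md5_hex(data) == recorded_hash
```

A library would record `md5_hex(content)` when an object is ingested and call `is_unchanged` during later audits; a change of even a single character produces a different hash.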

Digital signatures can be applied to guarantee the authenticity of a digital object. Once the hash value is calculated, it is encrypted using the private key of the owner of the material. The encrypted hash, together with the public key and the certificate issued by the certificate authority, makes up the digital signature. To check the material, the digital signature is decrypted using the public key and the hash value is recalculated. If the two hash values match, then the material is unaltered and it is known that the digital signature was generated using the corresponding private key. For further details on the application of digital signatures, see section 7.2 on “Digital Certificates”.
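
The hash-then-encrypt procedure can be illustrated with a toy example. It assumes textbook RSA with a deliberately small key built from two well-known Mersenne primes, and SHA-256 instead of MD5 as the hash; the constants and function names are chosen for this sketch only. It offers no real security, and production systems rely on vetted cryptographic libraries with padding schemes and much larger keys.

```python
import hashlib

# Toy key pair for illustration only (requires Python 3.8+ for the modular inverse).
P = 2**61 - 1                       # Mersenne prime
Q = 2**31 - 1                       # Mersenne prime
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, kept secret by the owner


def digest(data: bytes) -> int:
    """Hash the data and reduce it so that it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Owner: transform the hash with the private key."""
    return pow(digest(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Anyone: undo the transformation with the public key and compare hashes."""
    return pow(signature, E, N) == digest(data)
```

Only the holder of `D` can produce a value that verifies under the public pair (`N`, `E`), and any alteration of the data changes the recalculated hash, so verification fails.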


6.2 Digital Watermarking

Digital watermarking places hidden data, such as a unique ID, in a digital object. As described in section 5.1, the technique embeds visible or invisible copyright notices or other verification messages in digital documents, audio, video or image signals. It does not prevent copying, but ensures that any copies made of the media will be traceable to a particular copy and perhaps to a particular user.



7. Technology for Secured Digital Communication

It is essential to ensure the security of data not only on digital object servers but also during communication between server and client, so that the authenticity and integrity of data are preserved. It is possible for a hacker to eavesdrop on communication between a user's browser and a Web server and steal sensitive information, such as credit card numbers, login IDs and passwords or any other confidential data. The technologies of data encryption and digital certificates that are deployed to establish secure communication between clients and servers are described below:


7.1 Cryptography and Encryption

In cryptography, encryption is the process of converting information (plain text or numbers) from its normal, comprehensible form into an incomprehensible encrypted format, rendering it unreadable except for those who possess special knowledge, usually referred to as a key. Encryption is used in digital rights management to restrict the use of copyrighted material and in software copy protection to protect against reverse engineering and software piracy. Standards and cryptographic software and hardware to perform encryption are widely available. Software used for encryption can also be used to perform decryption, i.e. to make the encrypted information readable again. Encryption has its application in digital certificates described below.
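
The idea that the same software and key perform both encryption and decryption can be shown with a toy symmetric cipher. The sketch below derives a keystream from the key with SHA-256 and combines it with the data using XOR; the construction and function names are assumptions made for illustration, not a standard algorithm, and it must not be used to protect real data.

```python
import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Turn readable bytes into unintelligible ones by XOR with the keystream."""
    stream = _keystream(key, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, stream))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same keystream restores the original bytes."""
    return encrypt(key, ciphertext)
```

Without the key the ciphertext is unreadable, while anyone holding the key recovers the original text exactly. Real applications use standardised ciphers through established cryptographic libraries.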

7.2 Digital Certificates

Digital certificates are electronic files that are used to authenticate web resources, users and organizations over the Internet and to ensure integrity of content. Digital certificates are part of a technology called Public Key Infrastructure (PKI), which includes organizations called Certification Authorities (CAs) (such as Entrust, VeriSign and Baltimore) that issue, manage, and revoke digital certificates, organizations called relying parties who use the certificates as indicators of authentication, and clients who request, manage, and use certificates. A Certification Authority (CA) might create a separate Registration Authority (RA) to handle the task of identifying individuals who apply for certificates. Organizations that use digital certificates to authenticate their users maintain a database or directory, accessed through a directory access protocol called LDAP, that stores information about certificate holders and their certificates.

Digital certificates are based on public-key cryptography, which uses a pair of keys (a private key and a public key) for encryption and decryption. A certificate contains, amongst other things, the holder's name, a serial number, expiration dates, a copy of the certificate holder’s public key and the digital signature of the certificate-issuing authority (CA), so that a recipient can verify that the certificate is genuine. These electronic credentials assure that the keys actually belong to the person or organization specified. Messages can be encrypted with either the public or the private key and then decrypted with the other key.
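
The key-pair property can be tried out with a toy version of RSA, one well-known public-key scheme. RSA is assumed here only as an example; the tiny primes are for readability and offer no security, and the helper name `apply_key` is hypothetical.

```python
# Toy RSA key pair. Real keys use primes hundreds of digits long.
p, q = 61, 53
n = p * q                      # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+)

public_key = (e, n)
private_key = (d, n)


def apply_key(value, key):
    """Transform a number smaller than n with one half of the key pair."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65  # a message already encoded as a number smaller than n

# Encrypt with the public key, decrypt with the private key (confidentiality).
sealed = apply_key(message, public_key)
opened = apply_key(sealed, private_key)

# Transform with the private key, reverse with the public key (signing).
signed = apply_key(message, private_key)
checked = apply_key(signed, public_key)
```

Both `opened` and `checked` come back as the original message, showing that whatever one key does, only the other key undoes.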

The recipient of an encrypted message uses the CA's public key to verify the CA's signature on the digital certificate attached to the message, and then obtains the sender's public key and identification information held within the certificate. With this information, the recipient can send an encrypted reply. Digital certificates form the basis for secure communication and client and server authentication on the Web. Certificates can be used to do the following:

  • Verify the identity of clients and servers on the Web.
  • Encrypt channels to provide secure communication between clients and servers.
  • Encrypt messages for secure Internet e-mail communication.
  • Verify the sender's identity for Internet e-mail messages.
  • Digitally sign executable code that users can download from the Web.
  • Verify the source and integrity of signed executable code that users can download from the Web.

The following illustration shows the basic process of using public and private keys to encrypt and decrypt a message sent over the Internet.

                                      Figure 1 : Use of Cryptography for Secure Information Communication
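
The way a recipient checks a certificate with the CA's public key can be sketched as follows. This is a simplified model under stated assumptions, not the real X.509 format: the certificate is a plain dictionary, the CA key is a toy RSA pair built from two small Mersenne primes, and every name and field value is hypothetical.

```python
import hashlib
import json

# Toy CA key pair (assumption: far too weak and too structured for real use).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # CA public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # CA private exponent (needs Python 3.8+)


def digest(fields: dict) -> int:
    """Hash the certificate fields to an integer smaller than N."""
    encoded = json.dumps(fields, sort_keys=True).encode("utf-8")
    return int.from_bytes(hashlib.sha256(encoded).digest(), "big") % N


def issue_certificate(fields: dict) -> dict:
    """CA side: sign the hash of the fields with the CA's private key."""
    return {"fields": fields, "signature": pow(digest(fields), D, N)}


def verify_certificate(cert: dict) -> bool:
    """Recipient side: check the signature with the CA's public key."""
    return pow(cert["signature"], E, N) == digest(cert["fields"])


cert = issue_certificate({
    "subject": "library.example.org",
    "serial": 1001,
    "expires": "2030-01-01",
    "public_key": "holder-public-key-placeholder",
})

# Someone alters the subject but cannot produce a matching CA signature.
forged = {
    "fields": dict(cert["fields"], subject="evil.example.org"),
    "signature": cert["signature"],
}

genuine_ok = verify_certificate(cert)
forged_ok = verify_certificate(forged)
```

The genuine certificate verifies, while the altered one does not, because only the holder of the CA's private key can produce a signature that matches the changed fields.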

Most web browsers come with several digital certificates preinstalled. Web browsers use these certificates to secure access to web sites without requiring any action from users. Digital certificates not only substantiate the authenticity of a message and its sender but also alert the recipient if the message was altered while in transit. A user can view and manage certificates within Internet Explorer / Windows by selecting “Internet Options” from the “Tools” menu and then choosing “Content”. Selecting “Certificates” there allows the user to manage Trusted Root Certificates as well as personal certificates.
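
Programs other than browsers rely on a preinstalled set of trusted certificates in much the same way. As one example, the sketch below uses Python's standard `ssl` module to build a client context that loads the platform's default trusted CA certificates; it makes no network connection, and the variable names are illustrative.

```python
import ssl

# Build a client-side context that trusts the platform's default CA certificates.
context = ssl.create_default_context()

# With the defaults, a server must present a certificate that validates...
requires_valid_certificate = context.verify_mode == ssl.CERT_REQUIRED
# ...and whose subject matches the host name being contacted.
checks_host_name = context.check_hostname

# Counts of certificates currently loaded into the context's store.
# The numbers depend on the operating system and how it stores its CA bundle.
store_stats = context.cert_store_stats()
```

A connection wrapped with this context would be refused if the server's certificate did not chain to one of the trusted roots, which is the check a browser performs silently on every HTTPS page.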

Types of Digital Certificates

There are different types of digital certificates, each with different functions. Digital certificates can be grouped into the following four major categories:

i) Root or Authority Certificates: These are certificates that create the base (or root) of a certification authority hierarchy. These certificates are not signed by another CA; they are self-signed by the CA that created them.

ii) Institutional Authority Certificates: These certificates are also called campus certificates. These certificates are signed by a third party verifying the authenticity of a campus certification authority. Campuses then use their “authority” to issue client certificates for faculty, staff and students.

iii) Client Certificates: These are also known as end-entity certificates, identity certificates, or personal certificates. Client certificates are generally issued by a campus CA.

iv) Web Server Certificates: These certificates are used to secure communications to and from web servers. The subject name in a server certificate is the DNS name of the server.

Organizations may also run their own certificate authority, particularly if they are responsible for setting up browsers to access their own sites (for example, sites on a company intranet), as they can trivially add their own signing certificate to those shipped with the browser.
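
The hierarchy formed by these certificate types can be pictured as a chain that is followed from a client or server certificate up to a self-signed root. The sketch below models only the issuer links and the list of trusted roots; it checks no signatures, and the class, function and certificate names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    subject: str
    issuer: str

    @property
    def is_self_signed(self) -> bool:
        # Root certificates name themselves as their own issuer.
        return self.subject == self.issuer


def chain_to_root(cert, known, trusted_roots, max_depth=10):
    """Return the subjects from cert up to a trusted root, or None."""
    chain = [cert.subject]
    current = cert
    for _ in range(max_depth):
        if current.is_self_signed:
            return chain if current.subject in trusted_roots else None
        parent = known.get(current.issuer)
        if parent is None:
            return None  # issuer unknown, so the chain cannot be completed
        chain.append(parent.subject)
        current = parent
    return None  # chain too long or circular


root = Certificate("Example Root CA", "Example Root CA")
campus = Certificate("Campus CA", "Example Root CA")
client = Certificate("student@campus.example", "Campus CA")
server = Certificate("library.campus.example", "Campus CA")
intranet_root = Certificate("Intranet CA", "Intranet CA")

known = {c.subject: c for c in (root, campus, client, server, intranet_root)}
shipped_roots = {"Example Root CA"}

client_chain = chain_to_root(client, known, shipped_roots)
before = chain_to_root(intranet_root, known, shipped_roots)
# An organization adds its own signing certificate to the trusted set.
after = chain_to_root(intranet_root, known, shipped_roots | {"Intranet CA"})
```

The client certificate is accepted because its chain ends at a root that is already trusted. The organization's own root is rejected until it is added to the trusted set, which is what an administrator does when configuring browsers for an intranet.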

8. Summary

Access management, variously called access control, terms and conditions, licensing conditions or Digital Rights Management (DRM), refers to the control of access to digital libraries. Digital Rights Management (DRM) is a system of solutions designed to prevent unauthorized access, duplication and illegal distribution of copyrighted digital media. DRM technology was created for publishers as a means to stop illegal reproduction and distribution of their products. In an online environment, the scope of DRM can be leveraged to control access to and usage of digital objects and to impose restrictions on their misuse.

Four distinct aspects of access management are: i) license agreements and policies; ii) user authentication and authorization; iii) accuracy and integrity of digital content; and iv) accessibility, including permissions to operate on digital objects or their metadata. License agreements and policies for providing access to digital libraries are negotiated between the publishers and librarians or information managers. Users are authenticated and authorized to access the content of a digital library as per the terms and conditions of the license agreement. While duly authenticated users are allowed access to information according to their clearances and authority, unauthorized users are blocked from accessing information. Confidentiality is of paramount importance in digital libraries containing national defence information or highly proprietary information. Accuracy or integrity means the continuing integrity of information stored in digital object servers. A digital library must not allow accidental or intentional corruption of the information stored in it by unauthorized users or programs. Accessibility means that a secure computer system must keep information available to its users. The hardware and software of a computer system should keep working efficiently and the system should be able to recover quickly in case of disaster. Moreover, users are given access to digital content with permissions to download (in the case of users) and to add, edit, delete or amend (in the case of editors).

Data must be secured not only on servers and clients but also in transit between them, so that its authenticity and integrity are preserved. A hacker can eavesdrop on communication between a user's browser and a Web server and steal sensitive information, such as a credit card number, login ID and password, or any other confidential data. A hacker could also try to impersonate authorized users in order to get information which is normally not disclosed without authorization. Incidents of hackers gaining access to important Web sites and defacing them are not uncommon. Techniques of data encryption are used for communicating sensitive information such as users' passwords and PIN codes. Encryption renders data unintelligible and unusable even if accessed by an unauthorized person. Digital certificates are deployed to establish secure communication between clients and servers.

The module describes different authentication mechanisms deployed by publishers before allowing access to the digital content hosted in digital libraries. A user duly authenticated to use a digital collection is given “authorization” to access content hosted in a digital library. Authorization determines a user's entitlements for accessing digital material available in a digital library. The chapter briefly describes levels of authorization. The process of access control, which includes a number of copy-protection and access control technologies that either restrict or completely stop unauthorised use of copyrighted digital material, is briefly outlined. Digital signatures and digital watermarking, as technologies used for authenticating the integrity of digital documents, are described briefly. The technologies of cryptography, digital encryption and digital certificates used for secure transmission of digital information over a network are described briefly. Standards, protocols and rights markup languages used for secure communication and digital rights management are described briefly. Firewalls, proxy servers and intrusion detection systems used for the security of servers and digital content are elaborated. Lastly, the chapter describes viruses and the anti-virus software available in the marketplace.




Glossary

Access: Ongoing usability of a digital resource, retaining all qualities of authenticity, accuracy and functionality deemed to be essential for the purposes the digital material was created and / or acquired for.
Archive: i) An organisation whose function is the preservation of resources, either for a specific community of users, or for the general good; ii) The collection of resources so preserved.
Authentication: Authentication is the process of identifying a user. Usernames and passwords are the most common method of authentication.
Authenticity: Authenticity refers to the trustworthiness of the electronic resource as it was originally created.
Authorisation: The process of granting or denying access to a network resource. Users are allowed to access various online resources based on the user's identity and authorization level. 
Bibliographic Databases: A bibliographic database is a database of bibliographic records of books, chapters from books and articles in journals or magazines, with links to their full text. A bibliographic database allows the user to identify publications by author, subject, title, or other search terms. It generally provides a full citation to the item, along with abstracts and assigned subject headings. SciFinder Scholar, COMPENDEX and INSPEC are examples of bibliographic databases.
Biometrics: Authentication techniques that rely on physical characteristics that can be automatically checked (fingerprints, speech, retina, etc.)
CAPTCHA: CAPTCHA is a type of challenge-response test used to determine whether the user is human or a computer program. CAPTCHA is an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”.
Certificate Authority: An internal entity or trusted third party that issues, signs, revokes, and manages digital certificates.
Cipher Text: Data that has been encrypted. Cipher text is unreadable until it has been converted into plain text (decrypted) with a key.
Cookies: A cookie, also known as an HTTP cookie, web cookie, or browser cookie, is a small piece of data sent from a website and stored in a user's web browser while the user is browsing that website.

Copyright: Copyright is a set of exclusive rights granted by the government to the creator of a work for a limited time to protect the particular form, way or manner in which an idea or information is expressed. Copyright may subsist in a wide range of creative or artistic forms or “works”, including literary works, movies, musical works, sound recordings, paintings, photographs, software, and industrial designs. Copyright is a type of intellectual property.
Compression: Image compression is the process of reducing the size of an image by abbreviating repetitive information, such as one or more rows of white bits, to a single code. Compression algorithms may be grouped into two categories, namely lossless compression and lossy compression. The lossless compression process encodes repeated information in a form that can be decompressed into the original image with absolute fidelity, without losing any detail. No information is “lost” or “sacrificed” in the process of compression. Lossless compression is primarily used for bitonal images. The lossy compression process, on the other hand, discards or minimizes details that are least significant or that may not have an appreciable effect on the quality of the image. This kind of compression is called “lossy” because when an image compressed using lossy techniques is decompressed, it will not be an exact replica of the original image. Lossy compression is used with grey-scale / colour scanning.
Decryption: The process of transforming cipher text into readable text.
Encryption: Encryption is the process of using a formula, called an encryption algorithm, to translate plain text into an incomprehensible cipher text for transmission.
Digital Materials: A broad term encompassing digital surrogates created as a result of converting analogue materials to digital form (digitisation), and “born digital” for which there has never been and is never intended to be an analogue equivalent, and digital records.
Digital Object: A digital object is a description of an object that can be represented by a computer. This may include databases, spreadsheets, word processor documents, web pages, video, audio, images, maps, 2-D and 3-D models, etc.
Digital Object Identifiers (DOI): The Digital Object Identifier is a means of persistently identifying a piece of intellectual property (a creation) on a digital network, irrespective of its current location.
Digital Rights Management (DRM): Digital Rights Management (DRM) refers to technological solutions used to control or restrict the use of digital media contents on electronic devices. A Digital Rights Management system protects intellectual property by using a number of methods such as encrypting the data, marking the content with a digital watermark or similar method so that the content cannot be freely distributed.
Digitization: Digitization refers to the process of translating a piece of information such as a book, journal article, sound recording, picture, audio tape or video recording into digital format.
Fair Use: A concept in copyright law that allows limited use of copyright material without requiring permission from the rights holders, e.g. for education, scholarship or review purposes.
Firewall: Software or hardware that creates a barrier between a trusted and an un-trusted network (e.g. the Internet), allowing or forbidding data to cross the barrier based on a set of rules that an administrator has configured.
Full-text Databases: Full-text databases contain the electronic version of entire contents of a document (journal articles, report, paper, etc.) that is available for printing or downloading.
Hacker: An individual, with or without skill, who breaks into security systems.
Hash Function: An algorithm which calculates a value based on a data object, mapping the data object to a smaller data object, which is the hash result.
HTTP: Stands for Hypertext Transfer Protocol, the protocol for moving hypertext files across the Internet. It requires an HTTP client program on one end, and an HTTP server program on the other end. HTTP is the most important protocol used in the World Wide Web (WWW).
HTTPS: Hypertext Transfer Protocol Secure, i.e. HTTP carried over SSL (or its successor, TLS).
Integrity: The assurance that a document is complete and unaltered from the time of its creation.
Intellectual Property Rights (IPR): The right to possess and use intellectual property, conferred by means of patents, trademarks, and copyrights. Although IPR laws are enacted and enforced on a strictly national basis, once a patent or copyright has been granted in one country and disclosure of an invention or creative work has been made, information technology makes it available throughout the world.
Internet Protocol Address: Every host on the Internet is assigned a unique identifier called an IP address or Internet Protocol address. In IPv4, it is a numerical address consisting of four numbers separated by periods, e.g. 128.128.25.3.

LDAP: An acronym for Lightweight Directory Access Protocol, which defines a standard for organizing directory hierarchies and interfacing to directory servers.
License: A license is an agreement between the publisher and the user wherein the publisher transfers the non-exclusive and non-transferable rights to use materials to the user or licensee. The publishers use license agreements as legal method for controlling the use of their e-resources.
Open Access: An alternative method of scholarly publishing wherein the cost of publishing and disseminating scholarly content is borne by the authors, their affiliated institutions or funding agencies instead of libraries or their users.
Password: A series of characters that enables a user to access specific files, computers, or programs. The password helps ensure that unauthorized users do not access the computer. Within Athens, a password together with a username (and the host address depending on the type of account) ensures that unauthorized users do not access the online services.
Private Key: One of two keys used in public key cryptography. The private key is known only to the owner and is used to sign and decrypt messages.
Public Key: One of two keys used in public key cryptography. The public key can be known to anyone and is used to verify signatures and encrypt messages.
Resource: An online service (bibliographic or full-text database) accessible to users.
Single Sign On (SSO): An authentication service wherein a user only needs to login once to access a number of different resources.
Secure Sockets Layer (SSL): A protocol developed by Netscape that enables secure transactions via the Internet. URLs that require an SSL connection start with https: instead of http:. The Transport Layer Security (TLS) protocol, the successor to SSL, now incorporates most of its provisions.

Security Token: A security token (also called an authentication token, hardware token or cryptographic token) is a small hardware device that a user carries to get authorization to access a network service. The device may be in the form of a smart card or may be embedded in a commonly used object such as a key ring. Security tokens provide an extra level of assurance through a method known as two-factor authentication: the user has a personal identification number (PIN), which authorizes them as the owner of that particular device; the device then displays a number which uniquely identifies the user to the service, allowing them to log in. Unlike a password, a security token is a physical object.

Shibboleth: An Internet2 project that has created an architecture and open-source implementation for federated identity-based authentication and authorization infrastructure using SAML.

Steganography: Art and science of writing hidden messages in such a way that no one apart from the intended recipient knows of the existence of the message. It is in contrast to cryptography, where the existence of the message itself is not disguised, but the content is obscured.

Symmetric Cryptography: A branch of cryptography involving algorithms that use the same key for encryption and decryption or signature creation and signature verification.

Web Server: A Web server is either software that manages Web sites or the hardware on which server software is run. A server may be linked to the World Wide Web or it may be an “internal only” server, meaning only certain individuals may have access to it. A web server simply gets a browser request and sends the appropriate web page or data.

XML (Extensible Markup Language): XML is an extremely simple dialect of SGML. The goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.


