Saturday, December 13, 2014

28. Case Studies: DSpace

Suggestions from all of you are cordially invited to help build this blog. Please send your suggestions and entries. Its scope is the entire world knowledge community, and it is intended to make an important contribution to the career building of all competitive-exam candidates. You can send your suggestions to this email address: chandrashekhar.malav@yahoo.com

P- 01. Digital Libraries*

By: Jagdish Arora, Paper Coordinator

Multiple Choice Question

Question 1: Multiple Choice

DSpace is written in?
  • PERL
  • JAVA programming language (correct answer)
  • C++
  • Ruby

Question 2: Multiple Choice

What is DSpace?
  • An Open source software (correct answer)
  • A System Software
  • An Automation software
  • A Commercial software

Question 3: Multiple Choice

Which metadata standard does DSpace support as its native metadata format?
  • MARC
  • Dublin Core (correct answer)
  • AACR
  • IEEE LOM

Question 4: Multiple Choice

Who built DSpace?
  • IBM
  • Microsoft
  • MIT and Hewlett-Packard (correct answer)
  • HP

Question 5: Multiple Choice

Who contributes to DSpace development?
  • Developer community across the world (correct answer)
  • Developer team from MIT and HP
  • Employees of Duraspace foundation.
  • JAVA developers from INFLIBNET Centre.

True / False

Question 1: True or False

DSpace is a free software
True (correct answer)
False

Question 2: True or False

DSpace is a library automation software
True
False (correct answer)

Question 3: True or False

DSpace uses Z39.50 protocol for metadata harvesting
True
False (correct answer)

Case Study : DSpace

Introduction
1.0 Key factors to DSpace’s adoption
2.0 DSpace Information Model
2.1 DSpace System Architecture
3.0 Major Features of DSpace
3.1 Metadata Registry
3.2 File Format Registry
3.3 E-Persons
3.4 Authorization
3.5 Ingestion Process and Work Flow
3.6 Search and Browse
3.7 Handle System
3.8 OAI-PMH Support
3.9 Statistics
3.10 SWORD and OpenURL Support
4.0 Customization in DSpace
5.0 Some Live Examples
References

Introduction

A digital library is a set of services for managing and organizing digital information and retrieving it through a suitable user interface. It also covers the archiving and preservation of digital material, the social issues attached to it, and its application and evaluation in specific focus areas. To achieve all of this, certain things are expected of the software used to build a digital library.
A good digital library solution is primarily expected to meet the following requirements:
  • It should be cost effective in terms of the hardware and software platform to be procured and its management thereafter.
  • It should be technically simple and easy to install and manage, so that a person with a working knowledge of information technology can install and administer it.
  • It should be robust and scalable, so that it can handle large volumes of data seamlessly, and it should have an interoperable, modular, open architecture so that necessary customization can be done easily without depending on software specialists.
  • It should have a user-friendly, multi-user interface so that multiple people can use and administer the software simultaneously.
  • It should preferably be platform independent, so that it can run on any popular software and hardware platform.
  • Last but not least, it must be able to handle all types of digital objects: data sets, documents, multimedia, or any other digital format.
DSpace is one solution that meets all the expectations described above. It is a platform that:
  • captures items in any format and distributes them over the web,
  • indexes digital items so that users can easily search and retrieve them,
  • preserves digital content over the long term.
DSpace is typically used to create a digital library with three major roles. First, it facilitates the capture and ingestion of material with associated metadata. Second, it provides easy access to the material through user-friendly searching and listing mechanisms. Third, it facilitates long-term preservation of digital material.
DSpace was initiated in 2000 as a joint project of the Massachusetts Institute of Technology and Hewlett-Packard. The DSpace project is now handled by DuraSpace, a not-for-profit organization.

Key factors to DSpace’s adoption

DSpace has become quite popular among digital library implementers because it is open source, freely available software, and it is backed by a very large worldwide user community that is ready to help.
DSpace is packaged so that it is easy to use, and it handles content in a number of digital formats. A major advantage is that content in DSpace can be made searchable through search engines such as Google Scholar, so the outreach of a digital library can be increased without much effort.
DSpace can store any material that is available in digital format, for example journal papers, data sets, electronic theses, reports, conference posters, videos and images.
DSpace is open source software available under the Berkeley Software Distribution (BSD) licence, under which one can use and redistribute the source code as well as the binaries or executable programs. DSpace can be obtained from www.dspace.org or from the DSpace project site on SourceForge.
DSpace follows a community-based development model, with a common SVN (Subversion) source code repository that has dedicated committers and contributors. The developer community welcomes everyone to submit bug reports, patches, feature requests and other related contributions. A number of active discussion groups and email lists are available for DSpace support.

DSpace Information Model

The information model of DSpace is broadly divided into four components: Communities, Collections, Items and Bitstreams.
A community reflects a unit of an organization, a collection within a community is a distinct grouping of items, items are logical content objects, and bitstreams are individual files.
The way data is organized in DSpace is intended to reflect the structure of the organization and its digital collection.

Each DSpace site is divided into communities, which can be further divided into sub-communities reflecting the typical university structure of college, department, research centre, or laboratory. Communities contain collections, which are groupings of related content. A collection may appear in more than one community. Each collection is composed of items, which are the basic archival elements of the archive. Every item has one and only one owning collection, although it may additionally appear in other collections.
Items are further subdivided into named bundles of bitstreams. Bitstreams are, as the name suggests, streams of bits, usually ordinary computer files.
As discussed earlier, communities and collections organize digital content, or items, into a hierarchy. Each community and collection also carries a limited set of descriptive metadata such as name, description, licence and availability. A digital library implementer can create communities based on a logical grouping of digital items and then subdivide them into collections.
Items are logical units of content that consist of Dublin Core based metadata as well as other metadata encoded as bitstreams. An item can be an electronic thesis, an e-book, a set of photographs, or a complete web page including the images and style sheets associated with the HTML page. Each item can contain one or more files along with metadata.
DSpace supports all three types of metadata: descriptive, administrative and structural. Descriptive metadata is anything that describes the item; it covers all elements of the Dublin Core metadata set. DSpace also supports non-Dublin Core metadata, but these elements may not be searchable. Administrative metadata associated with an item includes access restrictions, that is, who can access, remove or modify an item; there is no standard format for this type of metadata. Structural metadata describes very basic attributes of an item, for example which bitstreams the item contains, or which community and collection the item belongs to.
Bitstreams are individual digital files with a limited set of descriptive metadata such as the name, size and format of the file. A PDF file, a Word document, a JPEG or BMP picture, or an executable program can each be considered a bitstream.
A bundle is a group of related files. For example, an HTML page may link to other HTML documents, images, Flash objects and so on; to view that page you need all the associated files, so the HTML file together with its associated files makes a bundle. DSpace does not support any metadata for bundles.
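The hierarchy described above can be pictured with a small data-structure sketch. The Python classes below are purely illustrative: the class names, field names and sample data are assumptions made for this example and are not DSpace's actual Java classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bitstream:
    """An individual file with a little descriptive metadata."""
    name: str
    size_bytes: int
    file_format: str


@dataclass
class Bundle:
    """A named group of related bitstreams (carries no metadata of its own)."""
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    """A logical content object: metadata plus bundles of bitstreams."""
    metadata: Dict[str, List[str]]
    bundles: List[Bundle] = field(default_factory=list)


@dataclass
class Collection:
    """A grouping of related items."""
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    """An organizational unit holding sub-communities and collections."""
    name: str
    sub_communities: List["Community"] = field(default_factory=list)
    collections: List[Collection] = field(default_factory=list)


def count_bitstreams(community: Community) -> int:
    """Count every file reachable below a community, walking the hierarchy."""
    total = 0
    for collection in community.collections:
        for item in collection.items:
            for bundle in item.bundles:
                total += len(bundle.bitstreams)
    for sub in community.sub_communities:
        total += count_bitstreams(sub)
    return total


# Hypothetical sample data: one university, one department, one thesis.
thesis = Item(
    metadata={"dc.title": ["A Sample Thesis"], "dc.contributor.author": ["Doe, J."]},
    bundles=[Bundle("ORIGINAL", [Bitstream("thesis.pdf", 1_048_576, "application/pdf")])],
)
theses = Collection("Electronic Theses", [thesis])
department = Community("Department of Physics", collections=[theses])
university = Community("Sample University", sub_communities=[department])

print(count_bitstreams(university))
```

The sketch keeps each item inside a single collection for simplicity; an item that also appears in additional collections would need to be counted only once.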

2.1 DSpace System Architecture
The architecture of DSpace is divided into three major parts: the DSpace public API at the top, the business logic layer in the middle, and the storage API at the bottom.

(Image Courtesy: http://www.dspace.org)

The DSpace public API takes care of user interfacing and services. It contains components for the web user interface, federation services, metadata provider services as defined by the Open Archives Initiative's protocol, and interfaces for web services such as SWORD (Simple Web-service Offering Repository Deposit).
These public API components interact with the business logic layer in the middle, which provides the search and browse components, the handle manager, and the history manager, which takes care of logs and statistics. The business logic layer also contains components to manage the ingestion process and workflow, components to manage e-persons, groups and their authorization, the content management API and the administrative toolkit.
The bottom layer, the storage API, has two major components: a relational database management system (RDBMS) wrapper, which connects to an RDBMS such as Postgres or Oracle through Java Database Connectivity (JDBC), and the bitstream storage manager, which interacts directly with the file system to store bitstreams.

Major Features of DSpace

3.1 Metadata Registry
DSpace provides a facility to create a new metadata registry or manage the existing Dublin Core metadata registry, in which the digital library implementer can manage and customize metadata elements. The metadata registry has three major components: schema, element and qualifier.
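As an illustration of the schema, element and qualifier parts, DSpace metadata fields are commonly written in dotted form, such as dc.title or dc.contributor.author. The Python sketch below splits such a name into its three parts; the function and class names are made up for this example and are not part of DSpace.

```python
from typing import NamedTuple, Optional


class MetadataField(NamedTuple):
    schema: str
    element: str
    qualifier: Optional[str]  # None when the field is unqualified


def parse_field(name: str) -> MetadataField:
    """Split 'schema.element[.qualifier]' into its parts."""
    parts = name.split(".")
    if len(parts) == 2:
        return MetadataField(parts[0], parts[1], None)
    if len(parts) == 3:
        return MetadataField(parts[0], parts[1], parts[2])
    raise ValueError(f"expected schema.element[.qualifier], got {name!r}")


print(parse_field("dc.title"))
print(parse_field("dc.contributor.author"))
```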

3.2 File Format Registry
In addition to the metadata registry, DSpace has a registry for file formats. Each file format is assigned one of three support levels: supported, known or unknown. For each format the DSpace administrator can specify the MIME type, name, long description, support level and file extension.
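A registry of this kind can be thought of as a lookup table keyed by file extension. The following Python sketch assumes a tiny in-memory registry; the entries and their support levels are hypothetical examples, not the contents of a real DSpace installation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FileFormat:
    mime_type: str
    name: str
    description: str
    support_level: str  # "supported", "known" or "unknown"
    extension: str


# Hypothetical registry entries, keyed by lower-case file extension.
REGISTRY: Dict[str, FileFormat] = {
    "pdf": FileFormat("application/pdf", "Adobe PDF",
                      "Adobe Portable Document Format", "supported", "pdf"),
    "psd": FileFormat("image/vnd.adobe.photoshop", "Photoshop",
                      "Adobe Photoshop image", "known", "psd"),
}

# Fallback used when the extension is not in the registry.
UNKNOWN = FileFormat("application/octet-stream", "Unknown",
                     "Unknown data format", "unknown", "")


def lookup_format(filename: str) -> FileFormat:
    """Find the registry entry for a file name, falling back to 'unknown'."""
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return REGISTRY.get(extension, UNKNOWN)


print(lookup_format("thesis.PDF").support_level)
print(lookup_format("README").support_level)
```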

3.3 E-Persons
The persons or users who interact with DSpace are called e-people; these are essentially DSpace user accounts. DSpace allows e-people to log in to the site, sign up to receive notifications of changes to the collections they subscribe to, submit new items to collections, and administer collections, communities or the entire DSpace site. E-people can also be managed by forming groups.

3.4 Authorization
The authorization system in DSpace enables administrators to give e-people the ability to perform add and remove operations, by which an e-person can add or remove a community, collection or item.

As a collection administrator, an e-person can edit an item's metadata, withdraw items, or map items into the collection. Write permission enables an e-person to add or remove bitstreams, whereas read permission enables only the reading of bitstreams.
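One way to picture group-based authorization is as a set of (group, action, object) policies that is checked against the groups an e-person belongs to. The Python sketch below is a simplified model under that assumption; the group names, e-mail addresses and object identifiers are hypothetical, and DSpace's real policy handling is more involved.

```python
from typing import Dict, Set, Tuple

# (group name, action, object identifier); all values are hypothetical.
Policy = Tuple[str, str, str]

POLICIES: Set[Policy] = {
    ("Anonymous", "READ", "bitstream:thesis.pdf"),
    ("Physics Editors", "WRITE", "item:42"),
    ("Physics Editors", "ADD", "collection:theses"),
}

# Which groups each e-person belongs to.
MEMBERSHIP: Dict[str, Set[str]] = {
    "alice@example.org": {"Anonymous", "Physics Editors"},
    "bob@example.org": {"Anonymous"},
}


def is_authorized(eperson: str, action: str, target: str) -> bool:
    """Return True if any of the e-person's groups holds a matching policy."""
    groups = MEMBERSHIP.get(eperson, {"Anonymous"})
    return any((group, action, target) in POLICIES for group in groups)


print(is_authorized("alice@example.org", "WRITE", "item:42"))
print(is_authorized("bob@example.org", "WRITE", "item:42"))
```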

3.5 Ingestion Process and Work Flow
Ingestion is the process of getting content into DSpace. DSpace supports batch import as well as web-based submission.
In batch import, multiple items can be submitted to DSpace in one go; this requires the items to be in a specific format, with the metadata encoded in XML. In web-based submission, only one item can be submitted at a time, and the item has to go through the workflow process defined for that collection. Assume that there are three steps:
Step 1: May reject the submission
Step 2: Edit metadata or reject
Step 3: Edit metadata


(Image Courtesy: http://www.dspace.org)

A collection's workflow can have up to three steps as shown in figure above. Each collection may have an associated e-person group for performing each step; if no group is associated with a certain step, that step is skipped. If a collection has no e-person groups associated with any step, submissions to that collection are ingested straight into the main archive.

In other words, the sequence is this: The collection receives a submission. If the collection has a group assigned for workflow step 1, that step is invoked, and the group is notified. Otherwise, workflow step 1 is skipped. Likewise, workflow steps 2 and 3 are performed if and only if the collection has a group assigned to those steps.

When a step is invoked, the submission is put into the 'task pool' of the step's associated group. One member of that group takes the task from the pool, and it is then removed from the task pool, to avoid the situation where several people in the group may be performing the same task without realizing it.
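The skipping rule and the task pool described above can be sketched in a few lines of Python. This is only a toy model: the function name, the TaskPool class and the group names are assumptions for illustration and do not correspond to DSpace's own code.

```python
from typing import Dict, List, Optional


def steps_to_run(step_groups: Dict[int, Optional[str]]) -> List[int]:
    """Return the workflow steps (1 to 3) that have a group assigned.

    Steps without a group are skipped; an empty result means the
    submission goes straight into the main archive.
    """
    return [step for step in (1, 2, 3) if step_groups.get(step)]


class TaskPool:
    """Pending tasks for one group; a task can be claimed only once."""

    def __init__(self) -> None:
        self._pending: List[str] = []
        self._claimed: Dict[str, str] = {}

    def add(self, submission: str) -> None:
        self._pending.append(submission)

    def claim(self, submission: str, member: str) -> bool:
        """Take a task out of the pool so nobody else works on it."""
        if submission not in self._pending:
            return False
        self._pending.remove(submission)
        self._claimed[submission] = member
        return True


# Hypothetical collection with groups for steps 1 and 3 only.
print(steps_to_run({1: "Reviewers", 2: None, 3: "Editors"}))

pool = TaskPool()
pool.add("submission-1")
print(pool.claim("submission-1", "alice"))
print(pool.claim("submission-1", "bob"))
```

The second claim fails because the task has already left the pool, which is the situation the task pool is meant to prevent.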
3.6 Search and Browse
DSpace allows end users to discover content in a number of ways: via an external reference such as a handle, by searching for one or more keywords in metadata or extracted full text, or by browsing through title, author, date or subject indices, with optional image thumbnails.
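The difference between keyword search and index browsing can be shown with a toy example. The Python sketch below works on a small in-memory list of hypothetical records; it only illustrates the two access styles and says nothing about how DSpace implements its search engine.

```python
from typing import Dict, List

# Hypothetical item records with a handle and a few metadata fields.
ITEMS: List[Dict[str, str]] = [
    {"handle": "123456789/1", "title": "Digital Libraries in Practice",
     "author": "Rao, A.", "date": "2012"},
    {"handle": "123456789/2", "title": "An Introduction to Metadata",
     "author": "Singh, B.", "date": "2010"},
    {"handle": "123456789/3", "title": "Building Digital Repositories",
     "author": "Das, C.", "date": "2014"},
]


def search(keyword: str, items: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return items whose metadata (not the handle) contains the keyword."""
    needle = keyword.lower()
    return [
        item for item in items
        if any(needle in value.lower()
               for name, value in item.items() if name != "handle")
    ]


def browse(index: str, items: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return all items ordered by one index such as title, author or date."""
    return sorted(items, key=lambda item: item[index].lower())


print([item["handle"] for item in search("digital", ITEMS)])
print([item["title"] for item in browse("title", ITEMS)])
```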
3.7 Handle System
On the web today, the Uniform Resource Locator (URL) of a digital object may change because of changes in hardware, software or the network, or because of political change. This can be handled by creating a permanent identifier that is independent of the repository's location. The Handle System in DSpace provides a persistent handle for each item, if configured properly.
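A handle has the form prefix/suffix and is usually cited as a URL on a resolver such as hdl.handle.net, which redirects to wherever the item currently lives. The Python sketch below builds such a URL; the function name and the sample handle are assumptions for illustration.

```python
def handle_url(handle: str, resolver: str = "https://hdl.handle.net") -> str:
    """Build a persistent URL from a 'prefix/suffix' handle."""
    prefix, separator, suffix = handle.partition("/")
    if not (prefix and separator and suffix):
        raise ValueError(f"expected 'prefix/suffix', got {handle!r}")
    return f"{resolver}/{prefix}/{suffix}"


# Hypothetical handle; a real repository uses its own registered prefix.
print(handle_url("123456789/42"))
```

Because the citation points at the resolver rather than at the repository server, the repository can move to a new host without breaking existing links.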

3.8 OAI-PMH Support
The Open Archives Initiative has developed a protocol for metadata harvesting. This allows sites to programmatically retrieve or 'harvest' the metadata from several sources, and offer services using that metadata, such as indexing or linking services. Such a service could allow users to access information from a large number of sites from one place. DSpace exposes the Dublin Core metadata for items that are publicly accessible. Additionally, the collection structure is also exposed via the OAI protocol's 'sets' mechanism.
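A harvester talks to a repository with plain HTTP requests that carry a verb and a few arguments. The Python sketch below builds a ListRecords request for Dublin Core (oai_dc) metadata, optionally restricted to one set; the base URL and the set name are hypothetical, since each repository publishes its own.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Restrict harvesting to one set, for example a single collection.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint and set name.
print(list_records_url("https://repository.example.org/oai/request",
                       set_spec="hypothetical_set"))
```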

3.9 Statistics
DSpace offers system statistics for administrators, as well as usage statistics at the level of items, communities and collections. DSpace also provides a customizable general overview of activities in the archive, by default including:
  • Number of items archived
  • Number of bitstream views
  • Number of item page views
  • Number of collection page views
  • Number of community page views
  • Number of user logins
  • Number of searches performed
  • Number of license rejections
  • Number of OAI Requests

3.10 SWORD and OpenURL Support
SWORD (Simple Web-service Offering Repository Deposit) is a protocol that allows the remote deposit of items into repositories. SWORD was further developed in SWORD version 2 to add the ability to retrieve, update, or delete deposits. DSpace supports the SWORD protocol via the 'sword' web application and SWORD v2 via the 'swordv2' web application.

DSpace supports the OpenURL protocol from SFX, in a rather simple fashion. If your institution has an SFX server, DSpace will display an OpenURL link on every item page, automatically using the Dublin Core metadata. Additionally, DSpace can respond to incoming OpenURLs too.

Customization in DSpace

DSpace provides various flexibility and customization options. The areas that can be customized are:
  • Submission process: the submission steps can be configured to suit the organization.
  • Browse and search terms: the fields and files to index and display in the browse interface can be chosen.
  • Database: one can choose Postgres or Oracle.
  • Integration with other web services: using the Lightweight Network Interface, content can be pushed to or pulled from DSpace.
  • User interface: one can create one's own user interface for interacting with DSpace.

Some Live Examples


References


Self Learning (Interactive / Video Tutorial)

Self Learning Part - I
Self Learning Part - II
Self Learning Part - III

Interesting Facts

Greenstone provides a new way of organizing information and publishing it on the Internet in the form of a fully searchable, metadata-driven digital library.
Greenstone is issued under the terms of the GNU General Public License.
Its developers received the 2004 IFIP Namur award for "contributions to the awareness of social implications of information technology, and the need for an holistic approach in the use of information technology that takes account of social implications".
Greenstone runs on all versions of Windows, Unix and Mac OS X.
Any Greenstone collection can be exported to DSpace, and any DSpace collection can be imported into Greenstone.
Greenstone supports all popular file formats and media.


Timeline

Time Line | Description | Language Support (Multilingual)
Oct / 2013 | UNESCO CD-ROM v2.8 (Greenstone v2.86) | English/French/Spanish/Russian
May / 2006 | UNESCO CD-ROM v2.7 (Greenstone v2.70) | English/French/Spanish/Russian
May / 2005 | UNESCO CD-ROM v2.6 (Greenstone v2.60) | English/French/Spanish/Russian
Mar / 2004 | UNESCO CD-ROM v2.0 (Greenstone v2.50) | English/French/Spanish/Russian
Mar / 2003 | UNESCO CD-ROM v1.1 (Greenstone v2.39) | English/French/Spanish
Jun / 2002 | UNESCO CD-ROM v1.0 (Greenstone v2.38) | English

Glossary

Starting Character
Term
Definition
D
Digital Library
Collection of digital objects (text, audio, video), along with methods for access and retrieval, and for selection, organization, and maintenance
Document
Basic unit from which digital library collections are constructed; it may include text, graphics, sound, video, etc.
Dublin core
The Dublin Core is a metadata element set. It includes all DCMI terms (that is, refinements, encoding schemes, and controlled vocabulary terms) intended to facilitate discovery of resources. The Dublin Core has been in development since 1995 through a series of focused invitational workshops that gather experts from the library world, the networking and digital library research communities, and a variety of content specialties. See the Dublin Core Web Site for additional information.
Dublin Core Metadata Initiative
The Dublin Core Metadata Initiative is the body responsible for the ongoing maintenance of Dublin Core. DCMI is currently hosted by the OCLC Online Computer Library Center, Inc., a not-for-profit international library consortium. The work of DCMI is done by contributors from many institutions in many countries. DCMI is organized into Communities and Task Groups to address particular problems and tasks (see the DCMI Work structure page). Participation in DCMI is open to all interested parties. Instructions for joining can be found at the DCMI web site on the DCMI Contact information page.
G
Greenstone
The name of this digital library software
GSDL
Abbreviation for Greenstone Digital Library
H
Harvester
A harvester is a client application that issues OAI-PMH requests. A harvester is operated by a service provider as a means of collecting metadata from repositories.
HTML
The standard text-formatting language for documents on the World Wide Web. HTML text files contain content that is rendered on a computer screen and markup, or tags, that can be used to tell the computer how to format that content. HTML tags can also be used to encode metadata and to tell the computer how to respond to certain user actions, such as a mouse click. For more information, see http://www.w3.org/MarkUp/.
I
Index
Information structure that is used for searching or browsing a collection
M
Metadata
In general, "data about data;" functionally, "structured data about data." Metadata includes data associated with either an information system or an information object for purposes of description, administration, legal requirements, technical functionality, use and usage, and preservation. In the case of Dublin Core, information that expresses the intellectual content, intellectual property and/or instantiation characteristics of an information resource.
N
Network
A group of physically discrete computers interconnected to allow resources to be shared and data exchanged, usually by means of telecommunication links and client/server architecture.
O
Open Archives Initiative
Develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. The Open Archives Initiative has its roots in an effort to enhance access to e-print archives as a means of increasing the availability of scholarly communication. For more information see http://www.openarchives.org/organization/index.html.
Open Archives Initiative Protocol for Metadata Harvesting
The Protocol "provides an application-independent interoperability framework based on metadata harvesting. There are two classes of participants in the OAI-PMH framework: Data Providers administer systems that support the OAI-PMH as a means of exposing metadata; and Service Providers use metadata harvested via the OAI-PMH as a basis for building value-added services." For more information see http://www.openarchives.org/organization/index.html.
Open Source
A computer program for which the source code is made available without charge by the owner or licenser, usually via the Internet, to encourage the rapid development of a more useful and bug-free product through open peer review. The practice also allows the product to be customized by its users to suit local needs (example: Linux operating system). To be certified "open source" under the Open Source Initiative (OSI), software must meet certain established criteria that include no restrictions on access.
P
Plugin
Code module for handling documents of different formats, used during the importing and building processes
Protocol
Set of conventions by which a receptionist communicates with a collection server
R
RTF
Rich Text Format, a standard format for interchange of text documents
S
Server
A host computer on a network, programmed to answer requests to download data or program files, received from client computers connected to the same network. Also refers to the software that makes serving clients possible over a network. Servers are classified by the functions they perform (application server, database server, faxserver, file server, intranet server, mail server, proxy server, terminal server, Web server, etc.).
Software
A generic term for computer programs and their associated documentation, as opposed to data used as input and generated as output. In computing, data is "processed"--software "runs." A software product consists of a set of instructions written by a programmer, distinct from the manufactured hardware used to run it. The term includes systems programs such as operating systems (OS), database management systems (DBMS), utilities that control the operation of the computer itself, and application programs designed to process data and accomplish specific tasks for the user.
Standard

Subject
The Dublin Core element used to describe the content of the resource. The element may use controlled vocabularies or keywords or phrases that describe the subject or content of the resource. See also "Using Dublin Core".
U
Unicode
A universal encoding scheme designed to allow interchange, processing and display of the world's principal languages, as well as many historic and archaic scripts. Unicode supports and fosters a multilingual computing world community by allowing computers using one language to "talk" to computers using a different language. A registered trademark of Unicode, Inc.
W
Web server
Standard program that computers use to make information accessible over the World Wide Web
Z
Z39.50
A NISO standard for an application layer protocol for information retrieval which is specifically designed to aid retrieval from distributed servers. http://lcweb.loc.gov/z3950/agency

Web links

http://www.greenstone.org/
http://wiki.greenstone.org/wiki/gsdoc/others/Greenstone_history.htm
http://ie.archive.ubuntu.com/disk1/disk1/sourceforge/g/project/gr/greenstone/OldFiles/gsdl-manual-oct2000.pdf
http://drtc.isibang.ac.in/xmlui/bitstream/handle/1849/153/S_gsdltutorial.pdf?sequence=2

http://wiki.greenstone.org/gsdoc/tutorial/gs2-current/en/install_greenstone.htm
http://www.disa.ukzn.ac.za/downloads/presentations/Greenstone%20Digital%20Library%20software.pdf
http://www.cs.waikato.ac.nz/~ihw/papers/05-IHW-DB-CreatingDL.pdf
gndec.ac.in/~librarian/kk/466-752-1-SP%5B1%5D.doc
http://www.publications.drdo.gov.in/ojs/index.php/djlit/article/viewFile/3655/2067

http://eprints.rclis.org/19924/



Points to Ponder



Greenstone has two separate interactive interfaces, the Reader interface and the Librarian interface. End users access the digital library through the Reader interface, which operates within a web browser.
Greenstone is highly interoperable using contemporary standards. It incorporates a server that can serve any collection over the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), and Greenstone can harvest documents over OAI-PMH and include them in a collection.
In GSDL the Librarian interface is a Java-based graphical user interface (also available as an applet) that makes it easy to gather material for a collection (downloading it from the web where necessary), enrich it by adding metadata, design the searching and browsing facilities that the collection will offer the user, and build and serve the collection.
In GSDL "Plug-ins" are used to ingest externally-prepared metadata in different forms, and plug-ins exist for XML, MARC, CDS/ISIS, ProCite, BibTex, Refer, OAI, DSpace, METS.
The reader's interface is available in the following languages: Arabic, Armenian, Bengali, Catalan, Croatian, Czech, Chinese (both simplified and traditional), Dutch, English, Farsi, Finnish, French, Galician, Georgian, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Kannada, Kazakh, Kyrgyz, Latvian, Maori, Mongolian, Portuguese (BR and PT versions), Russian, Serbian, Spanish, Thai, Turkish, Ukrainian, Vietnamese.