Thursday, December 18, 2014

30.Case Study: Eprints

Suggestions from all of you are cordially invited for developing this blog. Please send your suggestions and entries. Its scope is the entire world knowledge community, and it will make an important contribution to the career building of all competitive-exam candidates. You can send your suggestions to this email address: chandrashekhar.malav@yahoo.com



P- 01. Digital Libraries*

By: Jagdish Arora, Paper Coordinator

Multiple Choice Questions

Question 1: Multiple Choice

Encoding used for multilingual support in Eprints is
  • UNICODE (correct answer)
  • ASCII
  • ISCII
  • CJK

Question 2: Multiple Choice

Eprints is written in
  • PHP
  • Python
  • PERL (correct answer)
  • C++

Question 3: Multiple Choice

Eprints software is a product of
  • Univ. of Waikato
  • Univ. of Southampton (correct answer)
  • MIT
  • SPARC

Question 4: Multiple Choice

How many types of full-text access (access control) does Eprints support?
  • 3 (correct answer)
  • 1
  • 2
  • 4

Question 5: Multiple Choice

The export functionality in Eprints uses which of the following technologies?
  • HTML
  • SGML
  • XML (correct answer)
  • JAVA

Question 6: Multiple Choice

Recommended web server software for Eprints
  • IIS
  • Apache (correct answer)
  • IBM HTTP Server
  • Jetty

True or False

Question 1: True or False

Authority files in Eprints help to improve metadata quality and uniformity
  • True (correct answer)
  • False

Question 2: True or False

Customization at archive level is not possible
  • True
  • False (correct answer)

Question 3: True or False

Eprints enables multiple archives on a single installation
  • True (correct answer)
  • False

Question 4: True or False

Eprints is OAI-PMH compliant
  • True (correct answer)
  • False

Question 5: True or False

Eprints supports diverse document types along with user-defined ones
  • True (correct answer)
  • False

1.0 Introduction

The paradigm shift from traditional to digital library services has changed the concept of information services in a short span of time. Digital libraries are organized and managed collections of digital material, with associated services, accessible over a network. Significant contributions in standards, technologies, techniques, and best practices for developing and managing digital libraries have been made in the recent past. Digital libraries encompass several facets, such as content creation and capture, information storage (digital objects and metadata), search, display and access (user interface), access management, interoperability, and preservation and maintenance. Key digital services such as electronic journals, portals and gateways, RSS, and digital archiving have made the LIS professional's role more efficient and effective. Digital archiving is an important component of digital libraries and has evolved along with them; today various technologies and tools are available for digital archiving.

Digital Archives

A digital archive provides a foundation for the preservation of digital collections by storing them and providing seamless access in a secure environment. Digital preservation is the set of management processes that ensure the long-term accessibility of digital content. The process of digital archiving involves digitization and capture of born-digital documents, storage, preservation, metadata assignment, collection policies, an access interface (retrieval), and dissemination of information. Strategic overviews of broader issues such as digital materials, standardization, archiving tools and technologies, and technological obsolescence are important in setting up long-term digital archives.



Need of Digital Archive Software

The explosion of information in digital media and the growth of computer processing power have resulted in many systems where the Producer role (researcher) and the Archive role (librarian) are the responsibility of the same entity (organization). A robust digital archiving system with long-term preservation policies is the need of the hour. Many organizations across the world have come forward to put their scholarly literature in the open domain by establishing digital archives, popularly known as ‘institutional repositories’. These archives help organizations preserve and showcase their intellectual output to the external community by providing copyright-compliant open access to their scholarly literature. Many libraries and research communities benefit from this initiative by getting timely access to scholarly literature free of cost.
Today many universities and research organizations across the world are showing keen interest in establishing institutional repositories to benefit themselves as well as society at large. Digital archiving demands robust tools and technologies that enable long-term preservation and delivery in a secure environment. Various proprietary and open source tools, with large user communities and support, are available for digital archiving. Popular open source digital repository software packages include Eprints, DSpace, and Fedora.

Eprints software

Eprints is open-source software, available under the GNU General Public License, developed at the Electronics and Computer Science (E&CS) department of the University of Southampton. It is written in PERL and recommended for UNIX-like operating systems. Eprints depends on other software: MySQL as the RDBMS, Apache as the web server, XML for import and export options, and IRStats for repository analytics. Operating-system-specific Eprints packages can be downloaded from http://www.eprints.org/software/. It has a strong user community.
The primary objective of the software is to enable institutions and organizations to set up and maintain eprint archives, or institutional repositories, for their scholarly digital content and make them available in the open domain. Eprints supports the ‘Green Road’ channel of open access publishing by helping institutions and organizations establish institutional repositories. It provides a web interface for managing, submitting, discovering, and downloading documents. Eprints emphasizes high metadata quality, with easier data entry and better interoperability.

Hardware

  • Intel P4 processor
  • 512 MB RAM & above
  • 40 GB Hard Disk Space (Depends on collection size)
  • Network Interface
  • It runs on lower H/W configurations also.

Software

  • OS: Linux compatible (Fedora Core Release, RHEL, CentOS, Ubuntu), Windows XP and later (Win32)
  • Web server: Apache 2.0 or later with the mod_perl version 2.0 module (significantly increases the performance of Perl scripts)
  • RDBMS (MySQL 5.0)
  • ImageMagick & tetex-latex (helps in rendering equations)
  • PERL Modules (perl-MIME-Lite, perl-XML-LibXML, perl-XML-Parser, Term::ReadKey)
  • Xpdf & antiword (fulltext indexing)
  • Browser (Mozilla or any other graphical browser) 
  • Mail server (sendmail or any other)

Network

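  • Public IP address for the server and a registered host name (fully qualified domain name), provided by the ISP.
Example: 172.16.0.19 (IP address) & nal-ir.nal.res.in (domain name). Note that 172.16.0.19 lies in the 172.16.0.0/12 range reserved for private networks, so a server reachable from the Internet needs a publicly routable address instead.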
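
Whether a candidate server address is publicly routable can be checked before a host name is registered. The following is a minimal sketch using Python's standard `ipaddress` module; the helper name `is_publicly_routable` is illustrative and is not part of Eprints.

```python
import ipaddress


def is_publicly_routable(address: str) -> bool:
    """Return True if the address is globally routable (not private, loopback, etc.)."""
    return ipaddress.ip_address(address).is_global


# 172.16.0.19 falls in the private 172.16.0.0/12 block (RFC 1918).
print(is_publicly_routable("172.16.0.19"))  # False
print(is_publicly_routable("8.8.8.8"))      # True
```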


Installation manual

A detailed step-by-step installation manual for various operating systems is available at the Eprints documentation website: http://wiki.eprints.org/w/EPrints_Manual



Eprints –Key features

  • Eprints institutional repository software has been freely available on the Internet, with source code, since 2000. New versions have been released from time to time with new features and upgrade packages.
  • It is developed by a dedicated team at the School of Electronics and Computer Science, University of Southampton. All versions of Eprints, along with contributed modules, can be downloaded at http://www.eprints.org/ or http://files.eprints.org/
  • It is developed and distributed under the GNU General Public License. Users may customize the source code to their requirements. The license does not prohibit commercial use, but it requires that distributed derivative works be released under the same license.
  • It can be installed on various Linux-like operating systems, e.g. Fedora, RHEL, Ubuntu, and Debian. It can also be installed on Windows XP and later (Win32). Linux variants are recommended for best results.
  • Eprints is built on several technologies: PERL as the scripting language, MySQL as the backend database, Apache as the web server, and XML for import, export, and display functionality, along with other packages such as a TeX system and ImageMagick for rendering LaTeX equations, and antiword and Xpdf for full-text indexing.
  • It supports multiple archives on a single installation, so users can run any number of archives from one installation.
  • It supports several document retrieval mechanisms, such as simple and advanced search, and browsing by metadata fields like author, title, document type, and year.
  • Web-based administration enables users to administer the repository from anywhere in the world after authentication (user ID and password).
  • Eprints is Unicode-compliant software that accommodates the major languages of the world. Archive managers can build collections in many languages with search functionality. The user interface can also be developed in native languages.
  • It is built on the OAI-PMH framework, which enables cross-repository search: centralized harvesters collect metadata from many repositories, so users can search multiple repositories at a single point and are redirected to the parent repository for the full-text document.
  • It can be customized in various ways, including the home page, browse views, document types, metadata fields, and subject categories.
  • Eprints functionality can be extended by writing plugins in PERL; users can find various third-party extension modules on the Eprints website.
  • It supports Web 2.0 functionality by generating RSS feeds for recent items, browse views, search results, etc., so that users can stay updated on the latest additions to the repository.
  • It supports multi-tier access control for archived documents: “Anyone”, where anybody on the Internet can access the full-text document; “Registered users only”, where only authorized users of the repository (members of an organization or institute) can access it; and “Archive staff only”, where only the depositor and the archive administrator have access to the full-text document.
  • Role-based user types are available to delegate roles and responsibilities in archive management. Registered users can only deposit documents in the archive/repository; Editors/Moderators/Reviewers can deposit documents, review deposited documents, and edit, reject, or send them back to the depositor; Administrators have overall administrative privileges over the repository.
  • Authority files help to avoid duplicate entries and to improve metadata quality and uniformity.
  • It supports various metadata formats such as METS, Dublin Core, and other digital library interoperability formats. Users can add custom metadata fields as the collection requires.
  • It supports bulk import and export to make archiving easier and faster. Popular file formats supported for import and export include BibTeX, PubMed XML, EndNote, and Reference Manager.
  • Good documentation is available at the Eprints website, and a dedicated team answers queries raised across the Eprints community through an email discussion forum. These discussion threads are archived to serve as a ready reference for similar problems.
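
To illustrate the OAI-PMH harvesting mentioned in the feature list, here is a minimal sketch of how a harvester might form a request. `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH protocol itself; the host name and the `/cgi/oai2` endpoint path are assumptions for illustration and should be checked against the actual repository.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical endpoint: both the host and the /cgi/oai2 path are assumptions.
url = build_list_records_url("http://repository.example.org/cgi/oai2")
print(url)
```

A harvester would fetch this URL and parse the returned XML; the sketch only builds the request so that it runs without network access.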


Eprints Workflow

The Eprints workflow can be divided into two blocks for better understanding: the first block for archive managers (brown colored) and the second block for archive users (green colored). The first block has two stages. In the first stage, an authorized user (repository staff, authors, creators, etc.) deposits documents and assigns appropriate metadata, access rights, subject category, etc. In the second stage, a moderator/editor/reviewer checks the authenticity of the deposited document, validates its metadata and access rights, incorporates changes if necessary, and then moves it to the live archive. If a deposited document fails to satisfy archive policies, the moderator can either destroy the document or send it back to the depositor, citing an appropriate reason.
The second block of the workflow covers document retrieval by the end user. Archive users can retrieve documents by browsing or searching, and the full text is available according to the access type granted by the repository managers. Users can subscribe to the archive's RSS feed to stay updated about the latest additions. The following flow chart depicts the workflow of the Eprints software.

[Figure: Eprints workflow flow chart]
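
The two blocks of the workflow can be sketched as a small model: a moderation decision for deposited documents and a full-text access check for end users. This is an illustrative simplification, not Eprints code; all class, function, and parameter names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    LIVE = "live archive"
    RETURNED = "returned to depositor"
    DESTROYED = "destroyed"


class Access(Enum):
    ANYONE = "Anyone"
    REGISTERED = "Registered users only"
    STAFF = "Archive staff only"


def moderate(meets_policy: bool, send_back: bool = True) -> Status:
    """Moderator decision for a deposited document awaiting review."""
    if meets_policy:
        return Status.LIVE
    # A failing document is either sent back with a reason or destroyed.
    return Status.RETURNED if send_back else Status.DESTROYED


def can_read_fulltext(level: Access, logged_in: bool, is_depositor_or_staff: bool) -> bool:
    """Decide full-text access under the three access levels."""
    if level is Access.ANYONE:
        return True
    if level is Access.REGISTERED:
        return logged_in
    return is_depositor_or_staff


print(moderate(meets_policy=False))
print(can_read_fulltext(Access.REGISTERED, logged_in=False, is_depositor_or_staff=False))
```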


Eprints Web Configuration

The Eprints web configuration can be split into the following components: web server, SQL database, PERL scripts for repository activities, and XML configuration files. Users submit requests to the archive through a web browser, which talks to the web server. The web server invokes PERL scripts to perform the required task. Each request is processed by consulting the database, metadata, documents, and various configuration files. The results are then passed back to the web server, which in turn delivers them to the end user through the web browser.



Folder Structure

Folder structure in eprints can be divided into two categories, viz. Global configuration folders and Repository specific folders


1. Global configuration folders

These files are the least likely to be changed. They are shared by multiple repositories and treated as read-only, but can be overridden at the local level. The top-level folders and their roles are:
  • lib: contains many sub folders and files responsible for the global configuration of all repositories on a single installation
  • archives: contains the repository-specific configuration for each repository
  • bin & cgi: these two directories store programs
  • perl_lib: holds all the modules required by the PERL scripting language
  • cfg: holds information about the Apache (web server) configuration
  • var: stores all temporary files used during processing
  • testdata: contains test data for populating a repository

2. Repository specific folders


Each repository has its own set of sub folders and files under the top-level archives folder. These are often changed to customize each repository to the users' needs. All repository-specific sub folders reside at eprints3/archives/*******/ (****** is the folder name of the specific repository).
  • cfg: contains a number of sub folders responsible for the entire configuration of the specific repository
    • cfg.d: contains all configuration files of the repository (all PERL scripts related to the repository)
    • citations: contains citation definitions for the documents of the repository
    • lang: language-specific files for this repository (phrases, static pages and images, site template)
    • namedsets: contains lists of values for named set fields
    • static: contains pages and images of the repository
    • workflows: contains workflow configuration files for eprint and user
    • autocomplete: contains autocomplete data files
  • documents: stores all full-text files in various formats (pdf, doc, ppt)
  • html: contains processed static web pages (index, policies, etc.)
  • var: contains temporary files of the repository 
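
The repository-specific layout above can be expressed as paths built from a repository folder name. The sketch below assumes an installation root of `eprints3` (as in the path quoted above) and uses `nal` as an example folder name; the function `repository_paths` is hypothetical and only composes path strings, without touching the file system.

```python
from pathlib import PurePosixPath

# Sub folders of cfg/ as listed above.
CFG_SUBFOLDERS = ["cfg.d", "citations", "lang", "namedsets",
                  "static", "workflows", "autocomplete"]


def repository_paths(repo_id: str, root: str = "eprints3") -> dict:
    """Map each repository-specific folder name to its path under archives/."""
    base = PurePosixPath(root) / "archives" / repo_id
    paths = {name: base / "cfg" / name for name in CFG_SUBFOLDERS}
    for name in ("documents", "html", "var"):
        paths[name] = base / name
    return paths


paths = repository_paths("nal")
print(paths["cfg.d"])      # eprints3/archives/nal/cfg/cfg.d
print(paths["documents"])  # eprints3/archives/nal/documents
```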




Eprints Customization

Eprints can be customized and localized at different levels. Administrators can change the look and feel (branding) and add new metadata fields, document types, views, and browse and search options. This can be done through the web interface provided for administrators or by editing source files and reloading the entire configuration.
1. Look and Feel (Branding): The simplest way is to log in as an administrator and click 'Home' to view the static home page (the default home page); a context-sensitive tool named 'Edit page' appears in the toolbar. Clicking this button lets the user download index.xpage as an HTML-encapsulated file and make the necessary changes, or make the same changes in the browser itself. The repository logo can be uploaded from the same screen. Once customization is complete, the configuration needs to be reloaded by clicking the 'Reload configuration' button at the top of the page. The same result can be achieved by editing the respective source files.

2. Metadata Customization: New metadata fields can be added by clicking the Admin / Config. Tools / Manage Metadata Fields button and then selecting the Eprints dataset fields. Adding a metadata field has four stages: Type, Properties, Phrases, and Commit. Users can set various properties for the new metadata field, such as whether it is mandatory, its length, and whether null values are allowed.



3. Document Type: The default Eprints installation comes with various document types, namely Article, Book, Book-Section, Conference Item, Monograph, Patent, Thesis, Artefact, Exhibition, Composition, Performance, Image, Video, Audio, Dataset, Experiment, Teaching Resource, and Other. Users can add their own document types by adding them to the namedsets/eprint and lang/en/phrases/local.xml source files. The workflow and citation style for a newly added document type can also be customized in workflows/eprint/default.xml and citations/eprint/default.xml. A new document type is added in the following flow chart for reference.
[Figure: flow chart for adding a new document type]
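
As a rough sketch of the two text fragments involved in adding a document type, the helpers below generate a line for namedsets/eprint and a phrase element for lang/en/phrases/local.xml. The one-value-per-line namedset format, the `eprint_typename_` phrase id pattern, and the `epp:` prefix are assumptions to verify against an existing installation; the function names and the `newsletter` type are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def namedset_entry(type_id: str) -> str:
    """Line to append to namedsets/eprint (assumed format: one value per line)."""
    return type_id + "\n"


def phrase_entry(type_id: str, label: str) -> str:
    """Phrase element giving the new type a display name (id pattern is an assumption)."""
    phrase_id = quoteattr("eprint_typename_" + type_id)
    return f"<epp:phrase id={phrase_id}>{escape(label)}</epp:phrase>"


print(namedset_entry("newsletter"), end="")
print(phrase_entry("newsletter", "Newsletter"))
```

After editing the source files, the configuration still has to be reloaded for the new type to appear.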

4. Subjects: A new subject tree can be added, or an existing one modified, with administration privileges at Admin / Config. Tools / Edit Subjects.



5. Browse Views: Users can customize browse views by referencing the required metadata fields in the repository-specific views configuration file at /eprints3/archives/nal/cfg/cfg.d/views.pl. A new journal view is added in the following flowchart for an Eprints collection; users can build views to their requirements by referencing the appropriate metadata element in the views.pl file.

[Figure: flowchart for adding a new journal browse view]
Beyond the customization choices above, Eprints lets repository administrators customize further aspects such as the workflow, controlled vocabularies, phrases, and the search page.


Eprints Example websites

Eprints digital repository (digital library) software is used to create diverse repositories, such as research, theses, data, project, political, and subject-based repositories. The following table lists one example of each type of repository, each customized for its document types.

[Table: example Eprints repositories by type]

Summary

This module introduces the world of digital archiving, the need for archiving technologies, and Eprints in detail. A large number of repositories have been established using Eprints across the world. Its capabilities, reliability, good documentation, and support from the development team and user community have made it one of the most popular open source software packages for digital archiving. Eprints configuration and workflow were discussed in detail, along with the hardware and software requirements for installation. The unit highlights various features of the Eprints software so that users are fully aware of the capabilities of the tool. Understanding the folder structure in Eprints is essential, and it has been explained in detail in the module. Customizing an archive installation has been explained with screenshots and flowcharts so that it can be achieved easily. Various popular archives of different document types were discussed to give a broader overview of Eprints capabilities.


References

http://www.eprints.org/
http://wiki.eprints.org/w/EPrints_Manual
http://www.eprints.org/software/training/
http://www.dpconline.org/pages/handbook/docs/DPCHandbookDigPres.pdf
http://www.digitalpreservation.gov/documents/ebookpdf_march18.pdf

Did you know?

As the first professional software platform for building high quality OAI-compliant repositories, EPrints is already established as the easiest and fastest way to set up repositories of open access research literature, scientific data, theses, reports and multimedia. 
http://www.eprints.org/software/


EPrints 3 is "a significant milestone towards ideal repository software"
http://www.eprints.org/software/



EPrints 3 is a major leap forward in functionality, giving even more control and flexibility to repository managers, depositors, researchers and technical administrators.
 http://www.eprints.org/software/

EPrints is quick to install, easy to configure, and needs minimal maintenance. Once installed, it simply works without fuss. There simply isn't a contest.                        
 http://www.eprints.org/news/features/worlds_best_practice.php


World's Best Practice for an institution commencing an institutional repository.
http://www.eprints.org/news/features/worlds_best_practice.php


Interesting Facts

EPrints is a free and open-source software package originally developed by researchers at the University of Southampton School of Electronics and Computer Science in 2000.
Eprints (www.eprints.org) is mature and well-supported software, developed at the University of Southampton, to help self-archiving and open access publishing.
EPrints is highly configurable to meet diverse needs.
EPrints is capable of using a controlled vocabulary and authority lists, which can help ensure high metadata quality. It provides native support for Dublin Core with the possibility of exporting to a number of formats (e.g., METS, MODS and DIDL).


Glossary

Starting Character
Term
Definition
E
Eprints 
A preprint in digital format, distributed electronically. The use of e-print servers to provide access to collections of preprints is a comparatively new mode of scholarly communication, developed in the physical sciences to circumvent the delays and high cost of commercial publishing. One of the earliest and best-known e-print repositories was created at the Los Alamos National Laboratory in New Mexico. The Open Archives Initiative (OAI) aims to facilitate the retrieval of scholarly papers from disparate digital archives. Also spelled eprint.

I
Institutional repositories
A set of services offered by a university or group of universities to members of its community for the management and dissemination of scholarly materials in digital format created by the institution and its community members, such as e-prints, technical reports, theses and dissertations, data sets, and teaching materials. Some IRs are also used as electronic presses to publish e-journals and e-books. An institutional repository is distinguished from a subject-based repository by its institutionally defined scope. IRs are part of a growing effort to reform scholarly communication and break the monopoly of journal publishers by reasserting institutional control over the results of scholarship. An IR may also serve as an indicator of the scope and extent of the university's research activities.

M
Metadata
Literally, "data about data." Structured information describing information resources/objects for a variety of purposes. Although AACR2/MARC cataloging is formally metadata, the term is generally used in the library community for nontraditional schemes such as the Dublin Core Metadata Element Set, the VRA Core Categories, and the Encoded Archival Description (EAD). Metadata has been categorized as descriptive, structural, and administrative. Descriptive metadata facilitates indexing, discovery, identification, and selection. Structural metadata describes the internal structure of complex information resources. Administrative metadata aids in the management of resources and may include rights management metadata, preservation metadata, and technical metadata describing the physical characteristics of a resource.
O
A digital archive created and maintained to provide universal and free access to information content in easily read electronic format as a means of facilitating research and scholarship. A prime example is PubMed Central (PMC), a project of the National Center for Biotechnology Information at the U.S. National Library of Medicine, designed to provide open access to the journal literature of the life sciences.
O
Open Archives Initiative (OAI)
An organization funded by the Digital Library Federation, the Coalition for Networked Information, and the National Science Foundation to develop and promote interoperability standards as a means of facilitating the exchange of digital information content. Its program originated in the desire to advance scholarly communication by improving access to distributed repositories of e-prints, known as "archives." The main product of the OAI is a framework for harvesting and aggregating metadata from multiple repositories and a harvesting protocol known as the OAI Protocol for Metadata Harvesting (OAI-PMH).


Web links




www.eprints.org/software/
http://www.eprints.org/software/training/configuration/biodiversity.pdf
http://www.eprints.org/news/features/worlds_best_practice.php
http://circle.ubc.ca/bitstream/handle/2429/44812/Castagne_Michel_LIBR596_IR_comparison_2013.pdf?sequence=1

Points to Ponder

  1. DSpace, EPrints, Digital Commons and Fedora Commons were selected based on their ROAR statistics and overall suitability for a large research library.
  2. EPrints Services offers customization, training and support services. The Eprints-tech mailing list is fairly active, and documentation and training materials are available. The EPrints community seems to be concentrated in Europe, specifically the UK.
  3. It is possible to add files in any format, but customization is required to extend EPrints to support research datasets. Batch importing can be challenging and requires some knowledge of Perl scripting.
