Saturday, December 6, 2014

11. Digitization Part II

Your suggestions are respectfully invited for the creation of this blog. Please send your suggestions and entries. Its entire field of work is the world knowledge community, and it will make an important contribution to the career building of all competitive-exam candidates. You can send your suggestions to this email address:


P-01. Digital Libraries

By: Jagdish Arora, Paper Coordinator


  • To know the different types of digitisation equipment
  • To know the different software used in the digitisation process
  • To learn the digitisation process for audio and video content
  • To learn the digitisation process for images

1.1 Scanners

Digital scanners are used to capture a digital image from an analogue medium, such as a printed page or a microfiche/microfilm, at a predefined resolution and dynamic range (bit depth). There are two types of image scanners: vector scanners and raster scanners. Vector scanners record an image as a complex set of x,y coordinates. Vector images are generally used in geographical information systems (GIS). The display software for a vector image interprets it as a function of the coordinates and other included information to produce an electronic replica of the original drawing or photograph. Because any portion of a vector image can be zoomed in on, it can display minute details of a drawing or a map. Maps, engineering drawings and architectural blueprints are often scanned as vector images.

Raster scanners capture raster images by passing a light (a laser in some cases) down the page and digitally encoding the page row by row. Multiple passes of light may be required to capture the basic colours of a colour image, each recorded as a set of bits known as a bitmap. Raster scanners are used in libraries to convert printed publications into electronic form, and the majority of electronic imaging systems generate raster images. The scanners used for digitizing analogue images come in a variety of shapes and sizes.
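The practical difference between the two representations can be sketched in a few lines of Python. In this minimal illustration, which is an assumption for teaching purposes and not how any particular scanner or GIS package works, a vector drawing is a list of line segments in normalised 0..1 coordinates, and it is rasterised into a bitmap at whatever grid size is requested using Bresenham's line algorithm, a standard rasterisation technique. The function names are hypothetical.

```python
def rasterize_line(x0, y0, x1, y1, grid):
    """Set the pixels of one line segment in a bitmap (Bresenham's algorithm)."""
    height, width = len(grid), len(grid[0])
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        if 0 <= x0 < width and 0 <= y0 < height:
            grid[y0][x0] = 1  # mark this pixel as part of the line
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step horizontally
            err += dy
            x0 += sx
        if e2 <= dx:  # step vertically
            err += dx
            y0 += sy


def render(vector_drawing, size):
    """Rasterise a vector drawing (segments in 0..1 coordinates) at size x size pixels."""
    grid = [[0] * size for _ in range(size)]
    scale = size - 1
    for ax, ay, bx, by in vector_drawing:
        rasterize_line(round(ax * scale), round(ay * scale),
                       round(bx * scale), round(by * scale), grid)
    return grid


# A hypothetical "drawing": one diagonal line and one horizontal line.
drawing = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.5, 1.0, 0.5)]
for row in render(drawing, 5):
    print("".join("#" if bit else "." for bit in row))
```

Because the vector form stores only endpoints, rendering it at 9 x 9 instead of 5 x 5 gives an equally clean line; a raster bitmap, by contrast, is fixed at the resolution at which it was scanned.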

How a Scanner Works

Scanners are equipped with a lamp that moves with the scan head to light up the object being scanned. Most scanners use a cold cathode fluorescent lamp or a xenon lamp. The scan head is made up of mirrors, a lens, a filter and a charge-coupled device (CCD) array. A belt connected to a stepper motor moves the scan head, and a stabilizing bar prevents it from wobbling during the pass. The mirrors reflect the image of the object being scanned into the lens, which focuses it through a filter onto the CCD array. The lens makes three smaller images of the original; each passes through a colour filter (red, green or blue) onto its own section of the CCD array, and the data from the three sections is then combined into a single image.
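The final step, in which the three colour-filtered readings are merged, can be pictured with a small sketch. Here each channel is assumed to be a 2-D list of 8-bit intensity values from the red-, green- and blue-filtered sections of the CCD array; the function name and data layout are illustrative assumptions, not any scanner's actual firmware.

```python
def combine_channels(red, green, blue):
    """Merge three single-channel scans of equal size into one RGB image."""
    if not (len(red) == len(green) == len(blue)):
        raise ValueError("channel heights differ")
    image = []
    for r_row, g_row, b_row in zip(red, green, blue):
        if not (len(r_row) == len(g_row) == len(b_row)):
            raise ValueError("channel widths differ")
        # Each output pixel is an (R, G, B) tuple built from the same position.
        image.append(list(zip(r_row, g_row, b_row)))
    return image


# Hypothetical 2 x 2 readings from the three filtered CCD sections.
red = [[255, 0], [10, 200]]
green = [[0, 255], [10, 200]]
blue = [[0, 0], [10, 50]]
rgb = combine_channels(red, green, blue)
print(rgb[0][0])  # (255, 0, 0): a pure red pixel
```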

When selecting a scanner, one should consider resolution, sharpness and rate of image transfer. Resolution is measured in dots per inch (dpi); the average scanner offers at least 300 x 300 dpi. The number of sensors in a row of the CCD array determines a scanner's optical resolution across the page. Sharpness depends on the brightness of the lamp and the quality of the lens. The rate of image transfer depends on the connection between the scanner and the computer: the parallel port is the slowest, while Universal Serial Bus (USB) scanners are affordable, easy to use and reasonably fast.
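A short worked calculation shows what a resolution figure means for the size of a scan. The sketch assumes a US Letter page (8.5 x 11 inches) and no compression; it is an illustration, not a specification of any particular scanner. Note that the 2,550-pixel width is also the number of CCD sensors a row needs to cover an 8.5-inch-wide page at 300 dpi.

```python
def scan_dimensions(width_in, height_in, dpi):
    """Pixel dimensions of a page scanned at the given optical resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px, height_px, bit_depth):
    """Raw image size before any compression (bits rounded up to whole bytes)."""
    return (width_px * height_px * bit_depth + 7) // 8


w, h = scan_dimensions(8.5, 11, 300)  # US Letter page at 300 dpi
print(w, h)                           # 2550 3300
print(uncompressed_bytes(w, h, 24))   # 25245000 bytes (about 25 MB) in 24-bit colour
print(uncompressed_bytes(w, h, 1))    # 1051875 bytes for a 1-bit black-and-white scan
```

Doubling the resolution to 600 dpi quadruples the pixel count, and hence the raw file size.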

The hardware required to attach a scanner is a connection such as USB, and the software required is a driver, which communicates with the scanner. TWAIN is a widely used standard interface between imaging devices such as scanners and application software: any program that supports TWAIN can acquire an image from a TWAIN-compliant scanner.

The following are the main types of scanners:
1.1.1        Flatbed Scanners – right angle, prism and overhead flatbed
1.1.2        Sheet-Feed Scanners
1.1.3        Drum Scanners
1.1.4        Digital Cameras
1.1.5        Slide Scanners
1.1.6        Microfilm Scanner
1.1.7        Video Frame Grabber
1.1.8        Hand-held scanners

The type of scanner selected for an imaging project will be influenced by the type, size and source of the documents to be scanned. Many scanners can handle only transparent materials, whereas others can handle only reflective materials.
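One way to record this matching of materials to equipment is a simple lookup table. The sketch below paraphrases the descriptions in sections 1.1.1 to 1.1.8; the material categories and function name are illustrative assumptions, not a formal standard, and a real project would also weigh budget, volume and required resolution.

```python
# Candidate scanner types per source material, paraphrasing sections 1.1.1-1.1.8.
# The category names are hypothetical labels chosen for this sketch.
SCANNERS_FOR = {
    "bound volume": ["flatbed (right-angle, prism or overhead)", "digital camera"],
    "loose uniform sheets": ["sheet-feed", "flatbed"],
    "flexible artwork": ["drum", "flatbed"],
    "35mm slide": ["slide"],
    "roll film, fiche or aperture card": ["microfilm"],
    "video": ["video frame grabber"],
    "selected section of a page": ["hand-held"],
}


def suggest_scanners(material):
    """Return candidate scanner types for a source material, or an empty list."""
    return SCANNERS_FOR.get(material.strip().lower(), [])


print(suggest_scanners("35mm slide"))  # ['slide']
```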

1.1.1 Flatbed Scanners

Flatbed scanners resemble a photocopier and are used in much the same way. The source material is placed face down on the platen for scanning. The light source and charge-coupled devices (CCDs) move beneath the platen while the document remains stationary, as in a photocopying machine. Flatbed scanners come in various models, such as right-angle, prism and planetary/overhead, to handle bound volumes and books. A flatbed scanner can typically scan a document at 600 dpi, and many offer higher resolutions.

                         Fig. 1: Flatbed Scanner

1.1.2 Sheet-Feed Scanners

As the name indicates, in a sheet-feed scanner the document is fed over a stationary CCD and light source by a roller, belt, drum or vacuum transport. Unlike flatbed scanners, sheet-feed scanners have an optional attachment that automatically feeds stacks of uniform-sized documents for scanning.

                                   Fig. 2: Sheet-Feed Scanner

1.1.3 Drum Scanners

In a drum scanner, the source material is wrapped around a drum, which is then rotated past a high-intensity light source to capture the image. Drum scanners offer superior image quality, but they require flexible source material of limited size that can be wrapped around the drum. They are targeted specifically at the graphic arts market and offer the highest resolution for greyscale and colour scanning. Drum scanners use photomultiplier (vacuum) tubes (PMTs) instead of CCDs, and these offer a greater bit depth (12 to 16 bits).
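What a greater bit depth means in practice can be shown with a one-line formula: a sample of n bits can record 2 to the power n distinct intensity levels per colour channel. The sketch below simply evaluates that formula for the bit depths mentioned above.

```python
def tonal_levels(bits_per_channel):
    """Number of distinct intensity levels a sample of this bit depth can record."""
    return 2 ** bits_per_channel


for bits in (8, 12, 16):
    print(bits, "bits ->", tonal_levels(bits), "levels per channel")
# 8 bits -> 256, 12 bits -> 4096, 16 bits -> 65536
```

A 12-bit drum scan therefore distinguishes 16 times as many shades per channel as a typical 8-bit scan, which is why such scans preserve subtle gradations in shadows and highlights.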


                                            Fig. 3: Drum Scanners

1.1.4 Digital Cameras

Digital cameras mounted on a copy cradle resemble a microfilming stand. The source material is placed on the stand, and the camera is cranked up or down to bring the material into focus within the field of view. Digital cameras are among the most promising scanning developments for library and archival applications.

                                                             Fig. 4: Digital Camera

1.1.5 Slide Scanner

Slide scanners have a slot in the side to accommodate a 35mm slide.  Inside the box, the light passes through the slide to hit a CCD array behind the slide.  Slide scanners can generally scan only 35mm transparent source materials.

                                            Fig. 5: Slide Scanner

1.1.6 Microfilm Scanner

Microfilm scanners are targeted specifically at library and archival applications; through adapters, a single model can scan roll film, fiche and aperture cards.

                                                   Fig. 6: Microfilm Scanner

1.1.7 Video Frame Grabber or Video Digitizer

Video digitizers are circuit boards placed inside a computer and attached to a standard video camera. Anything that is filmed by the video camera is digitized by the video digitizer.

                                                                Fig. 7: Video Frame Grabber

1.1.8 Hand-held Scanners

Hand-held scanners are used for scanning selected sections of a document. They may require multiple passes to capture a large area, and the user needs a steady hand while moving the scanner over the document.

1.2 Scanning Software

Scanning software is used to scan the image and capture it on the computer. It is usually supplied by the scanner manufacturer and includes a driver, which translates the software's instructions into commands that the scanner understands.

1.3 Image Editing Applications

Image editing applications are used once scanning is complete and the image is available on the computer for further manipulation. Most image editing software offers features such as editing, sharpening, filtering, cropping, colour adjustment, format conversion and resizing, and most can also be used to capture images.
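Three of these operations (cropping, resizing and a simple brightness adjustment) can be sketched on a greyscale image stored as a 2-D list of 8-bit values. Pure Python is used so the example has no dependencies; real work would be done in an image editor or an imaging library, and the function names here are hypothetical.

```python
def crop(img, left, top, right, bottom):
    """Keep the rectangle [left, right) x [top, bottom) of a greyscale image."""
    return [row[left:right] for row in img[top:bottom]]


def resize_nearest(img, new_w, new_h):
    """Nearest-neighbour resize: each output pixel copies the closest input pixel."""
    old_h, old_w = len(img), len(img[0])
    return [[img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def adjust_brightness(img, delta):
    """Add delta to every 8-bit pixel, clamping the result to 0..255."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


page = [[10, 20, 30, 40],
        [50, 60, 70, 80],
        [90, 100, 110, 120],
        [130, 140, 150, 160]]

cropped = crop(page, 1, 1, 3, 3)          # [[60, 70], [100, 110]]
enlarged = resize_nearest(cropped, 4, 4)  # each pixel becomes a 2 x 2 block
brighter = adjust_brightness(page, 120)   # values above 135 clip to 255
```

Nearest-neighbour resizing is the simplest resampling method; editing software usually also offers smoother options such as bilinear or bicubic interpolation.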

2.0 Digitisation of Audio and Video

Songs or speeches that we listen to on a tape recorder or radio are generally in analogue form. Analogue sound tracks can be digitized by connecting an audio player to a computer through an audio capture card, which records the sound onto the system. The audio files can be saved in formats such as .wav, .mp3 and .midi (strictly speaking, MIDI files store musical note instructions rather than recorded sound). The MP3 format is highly compact because it uses lossy compression, which discards some audio information; uncompressed WAV files preserve the full quality of the capture at the cost of much larger files. Audio files can be further processed using noise reduction software.
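The two steps an audio capture card performs, sampling the signal at regular intervals and quantising each sample to a fixed number of bits, can be sketched with Python's standard `wave` module. In this illustration a computed sine wave stands in for the analogue signal (an assumption; real capture hardware samples a voltage), and the result is written as an uncompressed PCM WAV file in memory. The standard library cannot encode MP3, so only the bit-rate comparison is shown for that format.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # samples per second (CD quality)
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit quantisation


def sample_tone(freq_hz, seconds, amplitude=0.5):
    """Sample and quantise a sine wave that stands in for an analogue signal."""
    peak = 2 ** (8 * SAMPLE_WIDTH - 1) - 1  # 32767 for 16-bit samples
    count = int(SAMPLE_RATE * seconds)
    return [round(amplitude * peak * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
            for i in range(count)]


def to_wav_bytes(samples):
    """Encode mono 16-bit samples as an uncompressed PCM WAV file in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


def pcm_bitrate(sample_rate, bits, channels):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate * bits * channels


wav_data = to_wav_bytes(sample_tone(440, 1.0))  # one second of a 440 Hz tone
print(len(wav_data))                            # 88244: 44,100 samples x 2 bytes + 44-byte header
print(pcm_bitrate(44_100, 16, 2))               # 1411200 bits/s for CD-quality stereo
```

At about 1,411 kbit/s, uncompressed CD-quality stereo needs roughly eleven times the data rate of a 128 kbit/s MP3, which illustrates why MP3 files are so much smaller.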

As with audio, video capture requires a video capture card, with input from a video cassette player (VCP/VCR), TV antenna, cable or movie camera. The digitised files can be saved in file formats such as .mov, .avi and .mpg.
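A quick calculation shows why captured video is normally compressed before or while it is saved. The sketch assumes a PAL standard-definition frame (720 x 576 pixels) at 25 frames per second and 24-bit colour with every pixel stored; real capture formats often subsample colour, so actual uncompressed figures may be somewhat lower.

```python
def raw_video_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Data rate of uncompressed video: every pixel of every frame is stored."""
    return width * height * bytes_per_pixel * fps


rate = raw_video_bytes_per_second(720, 576, 3, 25)  # PAL SD frame, 24-bit colour
print(rate)               # 31104000 bytes per second (about 31 MB/s)
print(rate * 3600 / 1e9)  # about 112 GB for one hour of tape
```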

3.0 Organizing Digital Images

A disk full of digital images without any organization, or any browse and search options, may be meaningless to anyone except the person who created it. Scanned images need to be organized in order to be useful, and they need to be linked to the associated metadata to facilitate browsing and searching. The following three steps describe the process of organizing digital images:

3.1 Organize

Organize the scanned image files into a folder hierarchy on disk that logically maps the physical organization of the documents. For example, in a project to scan journals, create a folder for each journal, which in turn may have a folder for each volume scanned. Each volume folder may have a subfolder for each issue, and each issue folder may contain the scanned articles that appeared in that issue, along with a contents page composed in HTML that links to the articles in the issue.
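The journal/volume/issue layout and its HTML contents page can be sketched as follows. The folder names (`v5`, `n1`), the journal name and the page markup are hypothetical choices for illustration; `PurePosixPath` is used so the printed path is the same on every operating system, whereas code that actually creates folders would use `pathlib.Path` and its `mkdir(parents=True, exist_ok=True)` method.

```python
from html import escape
from pathlib import PurePosixPath


def issue_folder(root, journal, volume, issue):
    """Path of the folder for one issue: root/journal/v<volume>/n<issue>."""
    return PurePosixPath(root) / journal / f"v{volume}" / f"n{issue}"


def contents_page(journal, volume, issue, articles):
    """HTML contents page that links each (title, file name) pair in an issue."""
    items = "\n".join(
        f'<li><a href="{escape(filename)}">{escape(title)}</a></li>'
        for title, filename in articles
    )
    heading = f"{escape(journal)}, vol. {volume}, no. {issue}"
    return (f"<html><head><title>{heading}</title></head>\n"
            f"<body><h1>{heading}</h1>\n<ul>\n{items}\n</ul>\n</body></html>")


folder = issue_folder("scans", "example_journal", 5, 1)
print(folder)  # scans/example_journal/v5/n1
page_html = contents_page("Example Journal", 5, 1,
                          [("Digitising rare books", "smithrkv5n1.pdf")])
print(page_html)
```

Because the links are relative file names, the contents page keeps working if the whole journal folder is moved to another disk or web server.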

3.2 Name

Name the scanned image files in a strictly controlled manner that reflects their logical relationships. For example, each article may be named after the surname and initials of its first author, followed by the volume number and the issue number. The file name “smithrkv5n1.pdf”, for instance, conveys that the article is by R.K. Smith and appeared in volume 5, issue 1. The file name of each article therefore reflects the logical and hierarchical organization of the journal.
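Because the naming scheme is strictly controlled, it can be generated and checked by a program rather than typed by hand. The sketch below builds names in the “smithrkv5n1.pdf” pattern and parses them back with a regular expression; the function names and the exact pattern are assumptions for illustration.

```python
import re


def article_filename(surname, initials, volume, issue, ext="pdf"):
    """Build names like 'smithrkv5n1.pdf': surname + initials + volume + issue."""
    letters = "".join(ch for ch in (surname + initials).lower()
                      if ch.isascii() and ch.isalpha())
    return f"{letters}v{volume}n{issue}.{ext}"


NAME_PATTERN = re.compile(r"^([a-z]+)v(\d+)n(\d+)\.([a-z0-9]+)$")


def parse_filename(name):
    """Recover (author key, volume, issue) from a file name, or None if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    author, volume, issue, _ext = match.groups()
    return author, int(volume), int(issue)


print(article_filename("Smith", "R.K.", 5, 1))  # smithrkv5n1.pdf
print(parse_filename("smithrkv5n1.pdf"))        # ('smithrk', 5, 1)
```

Two limitations of this scheme show up as soon as it is coded: the surname and initials cannot be separated again once joined, and two articles by the same first author in one issue would receive the same name, so a project may need to append a sequence number.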

3.3 Describe

Describe the scanned image files internally, using the image header, and externally, using linked descriptive metadata files. The following three types of metadata are associated with digital objects:
i. Descriptive Metadata: Includes a content or bibliographic description consisting of keywords and subject descriptors.
ii. Administrative or Technical Metadata: Incorporates details of the original source, date of creation, version of the digital object, file format, compression technology, object relationships, etc. Administrative metadata may reside within or outside the digital object and is required for long-term collection management, to ensure the longevity of the digital collection.
iii. Structural Metadata: Elements within digital objects that facilitate navigation, e.g. a table of contents, an index at issue or volume level, page turning in an electronic book, etc.
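An external metadata file that keeps the three kinds of metadata side by side might look like the following sketch. JSON is used only for brevity, and the record layout and field names are hypothetical; library projects normally encode such records with established standards such as Dublin Core for description or METS as the wrapper.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ImageMetadata:
    """Hypothetical sidecar record holding the three kinds of metadata."""
    descriptive: dict     # what the object is about
    administrative: dict  # how and when it was produced and stored
    structural: dict = field(default_factory=dict)  # how its parts fit together


record = ImageMetadata(
    descriptive={"title": "Digitising rare books", "creator": "Smith, R.K.",
                 "subjects": ["Digitisation", "Rare books"]},
    administrative={"source": "Print journal, vol. 5, no. 1",
                    "date_created": "2014-12-06",
                    "file_format": "application/pdf",
                    "compression": "none", "version": 1},
    structural={"pages": ["smithrkv5n1-p1.tif", "smithrkv5n1-p2.tif"],
                "part_of": "v5/n1/contents.html"},
)

sidecar = json.dumps(asdict(record), indent=2)  # text to save next to the image file
print(sidecar)
```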
The simplest, though least effective, method of providing access is a table of contents that links each item to its respective object or image. Contents pages for journal issues composed in HTML offer a browsing facility. Full-text search of HTML pages or OCRed pages can be achieved by installing one of the free Internet search engines Google (
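Full-text search engines of this kind generally rest on an inverted index, which maps each word to the pages that contain it. The sketch below is a minimal illustration of that idea, assuming the OCR text of each page has already been extracted; it is not how Google or any specific product is implemented, and the page texts are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case word tokens; real engines also handle stemming and stop words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page identifiers containing it (an inverted index)."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return pages containing every query word (Boolean AND), sorted by identifier."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


pages = {
    "smithrkv5n1.pdf": "Digitisation of rare books using overhead scanners",
    "jonesabv5n1.pdf": "Metadata standards for digital libraries",
    "smithrkv5n2.pdf": "Scanners and OCR for digital libraries",
}
index = build_index(pages)
print(search(index, "digital libraries"))  # ['jonesabv5n1.pdf', 'smithrkv5n2.pdf']
```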

Large scanning projects, however, require a back-end database storing the images, or links to the images, together with their metadata (descriptive and administrative). The back-end databases used by most document management systems provide the functionality required by most web applications. Major document management systems such as FileNet have integrated their databases with HTML conversion tools, and some document management system vendors have also signed up with Adobe to incorporate Acrobat and Acrobat Capture into their web-based systems. These databases accept queries from users through “HTML forms” and generate search results on the fly. Several digital library packages are now available as “open source” or “freeware” and can be used not only for organizing digital objects but also for searching and retrieving them.
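The pattern described here, a database that stores metadata plus links to image files and answers form-driven queries on the fly, can be sketched with Python's built-in `sqlite3` module. The table layout and sample rows are hypothetical, SQLite merely stands in for a document management system's back end, and the HTML form itself is omitted: its fields are passed in as function arguments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # an in-memory database keeps the sketch self-contained
conn.execute("""
    CREATE TABLE items (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        author  TEXT,
        volume  INTEGER,
        issue   INTEGER,
        path    TEXT NOT NULL  -- link to the scanned file, not the image itself
    )""")
conn.executemany(
    "INSERT INTO items (title, author, volume, issue, path) VALUES (?, ?, ?, ?, ?)",
    [("Digitising rare books", "Smith, R.K.", 5, 1, "v5/n1/smithrkv5n1.pdf"),
     ("Metadata for images", "Jones, A.B.", 5, 1, "v5/n1/jonesabv5n1.pdf"),
     ("OCR in practice", "Smith, R.K.", 5, 2, "v5/n2/smithrkv5n2.pdf")])


def search_items(conn, author=None, volume=None):
    """Build a query from optional form fields; placeholders keep user input out of the SQL."""
    clauses, params = [], []
    if author:
        clauses.append("author LIKE ?")
        params.append(f"%{author}%")
    if volume is not None:
        clauses.append("volume = ?")
        params.append(volume)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return conn.execute(
        f"SELECT title, path FROM items{where} ORDER BY volume, issue, title",
        params).fetchall()


print(search_items(conn, author="Smith"))
# [('Digitising rare books', 'v5/n1/smithrkv5n1.pdf'), ('OCR in practice', 'v5/n2/smithrkv5n2.pdf')]
```

Storing only the path keeps the database small and lets the image files be served directly by the web server.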

4.0 Planning Digitisation

Digitisation is the first step towards building a digital library. It is a highly specialized and cost-intensive activity that requires inputs from diverse branches of knowledge. It is therefore important that the objectives, needs and purpose of digitisation are established clearly and beyond doubt. A digitisation proposal should define its goals, scope, benefits, costs, the time required for the developmental phase, feasibility, implementation issues, deliverables and target users. Instead of undertaking a digitisation project, it may be preferable to continue developing the traditional library by i) acquiring collections in digital media; ii) buying access to electronic resources; and iii) developing subject gateways or library portals. These options are discussed in more detail in the module “Digital Library Planning and Implementation”.

5.0 Summary

Digitisation is the process of converting the content of physical media (e.g. periodical articles, books, manuscripts, cards, photographs, vinyl discs) into digital format. In most library applications, digitisation results in documents that are accessible from the library's website and are thus available on the Internet. Optical scanners and digital cameras are used to digitise images by translating them into bitmaps. It is also possible to digitise sound, video, graphics, animations, etc.

An image scanning system may consist of a stand-alone workstation on which most or all of the work is done, or it may be part of a network of workstations among which the imaging work is distributed and shared. A typical set-up includes a scanning station, a server, and one or more editing and retrieval stations.
This module describes scanners and scanning software as the important components of the scanning system.




Glossary
Archival storage
Archival storage is storage for data that may not be actively needed but is kept for possible future use or for record-keeping purposes. Archival storage is often provided using the same system as that used for backup storage. Typically, archival and backup storage can be retrieved using a restore process.
Born Digital
Digital materials which are not intended to have an analogue equivalent, either as the originating source or as a result of conversion to analogue form. This term has been used in the handbook to differentiate them from 1) digital materials which have been created as a result of converting analogue originals, and 2) digital materials which may have originated from a digital source but have been printed to paper, e.g. some electronic records.
Refers to information processing techniques that convert the actual data into binary (or machine language) code for efficient transmission and storage.
Digital Library
Digital libraries are organized and structured access to information contents in a distributed environment and assist users in searching, evaluating and utilizing resources in different digital formats.
Digital Object Production
The process by which the content file(s) and corresponding metadata are united in the digital wrapper, i.e., MoA II; XML DTD, or METS. The process may be accomplished manually, or it may be automated to increasing degrees using spreadsheets and database applications.
Digitization is the process, or series of software programs, used to make a representation of an object, an image or a signal (when dealing with audio) by a discrete set of its points or samples. The result is usually called a digital image for the object and digital form for the signal.
Digital Library Federation (DLF)
The Digital Library Federation is an association of libraries and allied institutions that work together to establish an international network of digital libraries. Its aim is to promote strategies for collection development, to identify best practices and standards for the production of electronic information technologies, and to provide practical initiatives for the preservation of digital collections.
Digital Object
Digital Object is a term used to describe item(s) stored in a digital library. A digital object is a content-independent data structure principally composed of digital material or data as well as metadata (such as policy expressions dictating use).
Electronic Records
Records created digitally in the day-to-day business of the organisation and assigned formal status by the organisation. They may include, for example, word processing documents, emails, databases or intranet web pages.
Information Retrieval System (IRS)
Techniques and processes of storing, searching and retrieving records stored in computerized databases. It includes database design and implementation.
Intellectual property rights
The right of a person or a company to exclusive use of its own plans, ideas or other intangible assets without the worry of competition, at least for a specific period of time.
OCR (optical character recognition)
The recognition of printed or written text characters by a computer. This involves photoscanning the text character by character, analysing the scanned-in image, and then translating the character image into character codes, such as ASCII, commonly used in data processing.
Digital Preservation
Digital preservation is defined very broadly for the purposes of this study and refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological change. Those materials may be records created during the day-to-day business of an organisation; "born-digital" materials created for a specific purpose
The process of reviewing the final draft of a text to ensure that all information is accurate and all surface errors have been corrected.
Quality assessment
An evaluation of the extent to which a trial’s design and management are likely to have prevented systematic errors and biases. Variations in quality often explain differing results in trials asking the same question, when examined under the systematic review (meta-analytical “microscope”). More rigorously designed trials are more likely to yield results that are closer to the “truth”
A place where something is stored, e.g. a site with all publications originating from Wageningen UR
A standardized collection of information in computerized format, searchable by various parameters; in libraries it often refers to online catalogues and bibliographies.
Copying information content from one storage medium to a different storage medium (media reformatting) or converting from one file format to a different file format (file re-formatting).

