Wednesday, February 11, 2015

06. ISAR Systems: Functions and Design P- 06. Information Storage and Retrieval

Suggestions are cordially invited from all of you in creating this blog. Please send your suggestions and entries. Its scope is the entire world knowledge community, and it aims to make an important contribution to the career building of all competitive-exam candidates. You can send your suggestions to this email address: chandrashekhar.malav@yahoo.com


By: Dr P.M Devika, Paper Coordinator



Introduction

An Information Search and Retrieval (ISAR) system allows end users to communicate with it in order to find information. Every user will use an ISAR system in a different way, because each user has different searching skills and a different level of knowledge of the system.

1. User Interface System

A user interface is something that connects two entities. Every computer system that interacts with users has a user interface. It is the combination of hardware and software that enables users to use the system. A good user interface is what makes a system widely used. The user interface forms an important component of an information retrieval system, since it connects the users to the organized information resources.

In ISAR systems, retrieval is very important, but so is the way the user communicates with the system. Hence the user interface plays an important role in any ISAR system. User interfaces perform two major functions: they allow users to search or browse an information collection, and they display the results of a search. They also allow users to perform further tasks, such as sorting, saving and/or printing the search results, modifying the search query, and so on. The user interface is therefore the most important component of an information retrieval system that a user can see and interact with. The success of an information retrieval system depends significantly on the design and usefulness of its user interface. The user interface is the means by which information is transferred between the user and the computer. Well-designed user interfaces allow users to easily find the information that the system provides access to, and to exploit it once found.

Some user interfaces are highly interactive, some expect users to type certain commands, and some use graphic images extensively so that it is easy for the end user to navigate from one option to another.

Having a user interface for running queries supports effective retrieval. Use of an information system typically begins with recognizing or sensing the existence of an anomalous state of knowledge (ASK), or a lack of knowledge. It can take many forms, from an explicit recognition that a specific fact is needed to a general desire to find out what is new or interesting. The following guidelines are proposed for the design of any user interface:

  • Strive for consistency in terminology, layout, instructions, fonts and colour.
  • Provide shortcuts for skilled users.
  • Provide appropriate and informative feedback about the sources and what is being searched for.
  • Design for closure so that users know when they have completed searching the entire collection or have viewed every item in a browser list.
  • Permit reversal of actions so that users can undo or modify actions; for example, they should be able to modify their queries or go back to the previous state in a search session.
  • Support user control, allowing users to monitor the progress of a search and to specify the parameters that control a search.
  • Reduce short-term memory load; the system should keep track of important actions performed by the users and allow them to jump easily to a formerly performed action.
  • Provide simple error-handling facilities that allow users to rectify errors easily; all error messages should be clear and specific.
  • Provide plenty of space for entering text in search boxes.
  • Provide alternative interfaces for expert and novice users.

2. Query Processing System

The database user retrieves data by formulating a query in the data manipulation language provided with the database. The query processor interprets the online user's query and converts it into an efficient series of operations, in a form that can be sent to the data manager for execution. The query processor uses the data dictionary to find the structure of the relevant portion of the database, and uses this information to modify the query and prepare an optimal plan for accessing the database.
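The role of the data dictionary in query processing can be illustrated with a minimal Python sketch. The table, its columns, the `DATA_DICTIONARY` structure and the `plan_query` function are all hypothetical names invented for this illustration; a real query processor handles a full query language and far richer optimisation.

```python
# Hypothetical data dictionary: for each table, its columns and which
# of those columns have an index.
DATA_DICTIONARY = {
    "books": {
        "columns": {"id", "title", "author", "year"},
        "indexed": {"id", "author"},
    },
}


def plan_query(table, field, value):
    """Turn a simple 'field = value' query into an access plan."""
    if table not in DATA_DICTIONARY:
        raise ValueError(f"unknown table: {table}")
    entry = DATA_DICTIONARY[table]
    if field not in entry["columns"]:
        raise ValueError(f"unknown field: {field}")
    # Prefer an index lookup when the data dictionary records an index
    # on the field; otherwise fall back to scanning every record.
    method = "index_lookup" if field in entry["indexed"] else "full_scan"
    return {"table": table, "method": method, "field": field, "value": value}


print(plan_query("books", "author", "Sample Author"))
print(plan_query("books", "year", 2015))
```

The plan returned here stands in for the "series of operations" that would be handed to the data manager for execution.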


3. Database Modeling System

Data modeling is a process by which the data requirements of an organisation or an application area are represented. Data exists as facts, figures and other bits of knowledge. When many data items are put together in a useful form, one gets information. A data model is a schema that represents the real world using information concepts and structures. A model of data is a particular type of structure, or a manner of visualizing a data structure. A data structure is a collection of data elements or objects and the relationships among them. A data model is a plan for building a database. It is a description of the data about entities, events, activities and their associations within an organization. The purpose of a data model is to represent data in an understandable way; only then can the data be used in an application. Modeling considers only the constituent elements and the sequencing or placement of data elements within other elements. A data model in an information retrieval system (IRS) is primarily a descriptive structure of data, not necessarily of its meaning. Before the 1980s, the two most commonly used database models were the hierarchical and network models. Various data models are now in use, such as:
      Linear Sequential Model
      Hierarchical Model
      Network Model
      Entity Relationship (E-R) Model
      Relational Model
      Object Oriented Model


3.1 Linear Sequential Model: This is a very common data structure, also called a flat structure. It is simply a list or table of elements, with no hierarchical structure other than the accumulation of records within the file: a straight line with no branching. An example is the list of students registered for a given course (the class list). There may be no information other than the STUDENT NUMBER and NAME in this type of sequential model.
3.2 The Hierarchical Model: This data model uses tree structures to represent relationships among records. For example, an institute offers a number of programmes, each programme has a number of courses, and each course has a number of students. A hierarchical model consists of a collection of records that are connected to each other through links. The tree data structure is used to represent the hierarchical data model and shows the relationships among parents, children, cousins, etc. The highest level of the tree is known as the root. The tree has one trunk and many branches emerging from it. A point where branches connect is called a node, so a tree is a collection of nodes. One node is designated as the root node, and the remaining nodes form trees or subtrees. In the hierarchical database model, nodes represent entity sets. A hierarchical model maintains one-to-one or one-to-many relationships.
3.3 The Network Model: This model was designed to solve some of the problems of the hierarchical model, to which it is very similar. In the network model, relationships are represented in terms of sets rather than a hierarchy. This allows the network model to support many-to-many relationships and also reduces the problem of data redundancy.
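The difference between the hierarchical and network models can be sketched in a few lines of Python. The institute, programme, course and student names below are invented for illustration, and the dictionaries are only one of many possible ways to represent these structures.

```python
# Hierarchical model: every record has exactly one parent, forming a tree.
institute = {
    "name": "Institute",
    "children": [
        {"name": "MLIS programme", "children": [
            {"name": "Information Retrieval course", "children": [
                {"name": "Student A", "children": []},
                {"name": "Student B", "children": []},
            ]},
        ]},
    ],
}


def count_leaves(node):
    """Count leaf nodes (here, students) below a node by walking the tree."""
    if not node["children"]:
        return 1
    return sum(count_leaves(child) for child in node["children"])


# Network model: a record may be a member of several owner sets, so a
# student enrolled in two courses is stored once and linked twice.
enrolments = {
    "Information Retrieval course": {"Student A", "Student B"},
    "Cataloguing course": {"Student B"},
}


def courses_of(student):
    """Return every course whose member set contains the given student."""
    return sorted(c for c, members in enrolments.items() if student in members)


print(count_leaves(institute))
print(courses_of("Student B"))
```

In the tree, placing Student B under a second course would require a duplicate record; in the set-based representation the same student simply appears in two sets, which is the many-to-many relationship described above.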

3.4 Relational Model: In the relational model, all data is expressed in terms of tables and nothing but tables. All entities and attributes are therefore expressed in rows and columns. The design of the relational data model is based on mathematical set theory. Entities are represented in the form of a table, called a relation. The relational model consists of three parts: 1. the structural part, which defines relations and their inter-relationships; 2. the integrity part, which ensures that each occurrence in a relation is unique; and 3. the manipulative part, which provides operators for processing relations. A relation consists of columns and rows. An entity is represented by a single row (tuple) in a table, and the columns of the relation hold the entity's attributes. To organize the data in tables satisfactorily, a technique called normalization is used. Normalization helps in determining the most appropriate grouping of data items into records, segments or tuples.

Each occurrence of the entity type in the relation must be uniquely identified. An attribute whose value is never repeated across the tuples of the relation can serve as an entity identifier. The attribute chosen to uniquely distinguish each entity in the relation is the primary key. The primary key is usually positioned in the first column of the relation and must have a unique value. The primary key plays a central role in the relational database model. When the primary key of one relation appears in another relation, it is called a foreign key. This key is useful for navigating from one relation to the other.
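Primary and foreign keys can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the `course` and `student` tables and the sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE student ("
    " student_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " course_id INTEGER NOT NULL REFERENCES course(course_id))"
)
conn.execute("INSERT INTO course VALUES (1, 'Information Retrieval')")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Asha", 1), (102, "Ravi", 1)],
)

# Navigate from the student relation to the course relation via the foreign key.
rows = conn.execute(
    "SELECT s.name, c.title FROM student s"
    " JOIN course c ON s.course_id = c.course_id"
    " ORDER BY s.student_no"
).fetchall()
print(rows)


def try_insert(row):
    """Return True if the student row is accepted, False if a key rule rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return False
    return True


print(try_insert((101, "Duplicate", 1)))  # primary key 101 is already in use
print(try_insert((103, "Orphan", 99)))    # course 99 does not exist
```

Both rejected inserts show the integrity part of the model at work: the first violates primary key uniqueness, the second refers to a course that is not present in the `course` relation.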

Entity-Relationship Model

The Entity-Relationship (E-R) model was originally developed by Chen in 1976. It unifies the network and relational database views. It is a conceptual data model that views the real world as entities and relationships. In this model, an Entity-Relationship diagram is used to represent data concepts visually, providing a graphic representation of entities, attributes and relationships. Today this model is commonly used for database design. The entity-relationship diagram uses three features to describe data: entities, relationships and attributes. Entities are concepts, real or abstract, about which information is collected. Relationships are associations between entities. Attributes are properties that describe the entities. An E-R model can easily be transformed into relational tables. It is simple and easy to understand with minimal training, so the database designer can use the model to communicate the design to the end user.

Object-Oriented Model: The object-oriented model facilitates the handling of objects rather than records. In an object-oriented model, an entity is represented as an instance (object) of a class that has a set of properties and a set of operations (methods) applied to the object. A class represents an abstract data type and is a shell from which we can generate as many copies (called instances) as we want. In the object-oriented approach, the behaviour of an object is part of its definition. The behaviour is described by a set of methods. The set of methods that an object offers to others defines the object's interface. A class, and hence an object, may inherit properties and methods from related classes. Objects and classes are dynamic and can be created at any time. Viewing the data as objects instead of records provides more flexibility.
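The ideas of class, instance, method and inheritance can be shown with a short Python sketch. The `Document` and `JournalArticle` classes and the sample titles are hypothetical and serve only to illustrate the terms used above.

```python
class Document:
    """A class: a shell from which any number of instances can be created."""

    def __init__(self, title, year):
        # Properties of the object.
        self.title = title
        self.year = year

    def citation(self):
        """A method: part of the behaviour that the object offers to others."""
        return f"{self.title} ({self.year})"


class JournalArticle(Document):
    """Inherits properties and methods from Document and adds its own."""

    def __init__(self, title, year, journal):
        super().__init__(title, year)
        self.journal = journal

    def citation(self):
        # Reuse the inherited behaviour and extend it.
        return f"{super().citation()}. {self.journal}"


book = Document("Sample Book", 2015)
article = JournalArticle("Sample Article", 2015, "Sample Journal")
print(book.citation())
print(article.citation())
```

Both objects answer the same `citation()` call, but each responds according to its own class, which is the flexibility that records alone do not provide.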

4. Sampling of Information Retrieval Systems (OPAC, Dialog, Google, EBSCO, PubMed)

OPAC

Online Public Access Catalogues (OPACs) were previously called online catalogs. They first came into existence for on-site use in libraries during the 1970s and had reached most western libraries by the mid-1980s. Online catalogs brought significant qualitative improvements in access to library resources. Even though the content and structure of the records changed little from card catalogs, online catalogs provided new searching capabilities such as keyword access, Boolean logic, and limiting a search by date and type of material. By combining circulation, acquisitions and cataloguing data, OPACs began to show, alongside the record of each item, indicators such as who holds the book, whether the book is available or in circulation, and the on-order status of any title.
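The search capabilities listed above (keyword access, Boolean logic, and limits by date and material type) can be sketched in Python. The three catalogue records and the `search` function are invented for illustration and do not reflect any particular OPAC product.

```python
# Made-up catalogue records, each carrying a circulation status.
CATALOGUE = [
    {"title": "Library Cataloguing Basics", "year": 1998, "type": "book", "status": "available"},
    {"title": "Digital Library Systems", "year": 2010, "type": "book", "status": "on loan"},
    {"title": "Library Trends Review", "year": 2012, "type": "serial", "status": "available"},
]


def search(all_of=(), any_of=(), none_of=(), year_from=None, material=None):
    """Keyword search on titles with AND / OR / NOT terms and two limits."""
    hits = []
    for record in CATALOGUE:
        words = set(record["title"].lower().split())
        if not all(term in words for term in all_of):        # AND
            continue
        if any_of and not any(term in words for term in any_of):  # OR
            continue
        if any(term in words for term in none_of):           # NOT
            continue
        if year_from is not None and record["year"] < year_from:  # date limit
            continue
        if material is not None and record["type"] != material:   # type limit
            continue
        hits.append((record["title"], record["status"]))
    return hits


print(search(all_of=["library"], none_of=["digital"]))
print(search(all_of=["library"], year_from=2000, material="book"))
```

Each hit is returned together with its status, mirroring how an OPAC shows availability next to the bibliographic record.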

Currently, OPACs are termed Web OPACs. Web OPACs today offer a wide range of search options. They may incorporate information retrieval techniques such as word stemming, truncation, weighted searching, fuzzy match search logic and natural language processing. Web OPACs provide enriched subject access or enhanced content. Today's Web OPACs provide automatic spelling correction of common terms. They frequently let a reader save searches via email. They also provide self-service features, such as reader-initiated reservations, renewals, document ordering, etc. The interfaces may incorporate extensive search limiting or browsing features. Today's OPACs also offer access to the catalog via the Web. Web OPACs have obvious major advantages: the user is offered access via the browser, integrating the OPAC with other information sources; using USMARC field 856, it is possible to include URLs within the bibliographic database, creating live links to digital objects or enabling the association of print and digital sources within a bibliographic record; and search types can be customized.

Dialog

Knight-Ridder Information Inc.'s Dialog is one of the oldest online retrieval systems. It was completed in 1966. According to its literature, it was “the world's first online information retrieval system to be used globally with materially significant databases”. Dialog provides over 500 databases, ranging across most disciplines and across bibliographic, abstract and full-text formats. It has the most comprehensive content collection and the most powerful search language available. From concept testing, to clinical trials, to product launches, to patent protection, Dialog delivers accurate, relevant results with excellent speed.

Dialog is unique in the vast array of information it offers, covering virtually every subject. More than 30,000 separate serial publications are indexed on Dialog, and more than 8,000 of these are in full text. In addition, Dialog includes the full text of many reference works and specialty publications from around the world, such as market research and brokerage house reports, patents and trademark registrations, and chemical directories and drug pipeline monitoring services, to name a few. Archival data is available for many sources back to the 1960s and 1950s, and for some even back to the 18th century. Information on Dialog is organized into separate databases, each like a “mini library” of specific information or publications. Learning to identify the best database for a search is a skill.

Although menu-based searching is available in Dialog, the primary search style is Boolean, offering truncation, field searching, limits by categories such as language or document type, and the MAP and RANK commands.

For many years, the Dialog system offered users searching based on only one query logic, namely Boolean searching. Its primary users were initially library community members and other professional searchers. Over the years, Dialog has provided variations of its services for different classes of users with different skill sets. Dialog's basic language allows the use of six commands: BEGIN, EXPAND, SELECT, DISPLAY, PRINT and LOGOFF. Dialog also allows users to enter a “natural language” query. The natural language query is converted into a Boolean expression, with common words omitted and the remaining words ORed together. The retrieved set is then ranked according to the number of query terms found.
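The natural language conversion described above (drop common words, OR the rest, rank by the number of matching terms) can be sketched in Python. This is not Dialog's actual implementation: the stop-word list, the sample documents and the function names are assumptions made for the illustration.

```python
# A small, made-up list of common words to omit from the query.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "what", "are"}


def to_or_terms(query):
    """Drop common words; the remaining terms are treated as ORed together."""
    terms = []
    for word in query.lower().split():
        word = word.strip(".,?!")
        if word and word not in STOP_WORDS and word not in terms:
            terms.append(word)
    return terms


def rank(query, documents):
    """Retrieve documents matching any term, ranked by number of terms found."""
    terms = to_or_terms(query)
    scored = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        matches = sum(1 for term in terms if term in words)
        if matches > 0:  # OR logic: one matching term is enough
            scored.append((matches, doc_id))
    # Most matching terms first; ties broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


DOCS = {
    "d1": "patent law for software",
    "d2": "software patent search in europe",
    "d3": "cooking recipes",
}
print(to_or_terms("What is the patent search for software?"))
print(rank("What is the patent search for software?", DOCS))
```

Document `d2` contains all three remaining terms and is ranked first, `d1` contains two and follows, and `d3` matches none and is not retrieved.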

Dialog has also introduced a web version of its service. DialogWeb offers Internet access to the regular Dialog search system, with its sophisticated search engine and authoritative databases. DialogWeb provides easy access to the full content (over 500 databases), power and precision of Dialog through a Web browser. Special features include:

  • A flexible and easy to use Guided Search mode that does not require knowledge of the Dialog command language
  • A robust Command Search mode that uses the powerful Dialog command language that experienced searchers can easily use
  • Database selection tools to help pinpoint the right database for your search
  • Integrated database descriptions, pricing information, and other search assistance
  • Easy to use forms to create and modify Alerts (current awareness updates)
  • Search results available in HTML or text formats
  • A choice of displaying records or sending search results via email, fax or postal delivery

Guided Search Mode

Guided Search is designed for novice to intermediate searchers who want easy access to Dialog’s authoritative business, legal, scientific, intellectual property, and technical information. Guided Search is the default search option for all new DialogWeb customers. Use Guided Search when you do not know Dialog commands and/or databases, or when you are trying out a new subject area and are not familiar with the databases or search terminology. It is also useful if you need an answer to a frequently asked question.

The first page of the Dialog Guided Search is a display of broad subject categories. Clicking on one of these categories leads to a more detailed display.

Command Search Mode

Command Search is designed for intermediate to experienced Dialog searchers. It provides complete command based access to the extensive collection of Dialog databases. You need to be familiar with Dialog commands when using Command Search.

Google

Google, developed in 1998, is the most popular web search engine. Google navigates the web by crawling, that is, by following links from page to page. Pages are then sorted by their content and other factors. When a query is entered, algorithms look for clues to better understand what the user means; based on those clues, relevant documents are pulled from the index, and the results are ranked using over 200 factors. These algorithms are constantly changing. Data in the Google database is updated every 1/8th of a second.

Google's opening page is very simple, consisting of just a query box and a link to the advanced search section. Google allows users to set search preferences for images, maps, videos and books, and to restrict searches by country, time and place. Google indexes, word by word, the web pages available on the Internet. While searching, Google carries out spell checking and offers suggestions to the end user. Google also performs stemming on the entered terms. It supports basic search features such as AND, OR and NOT, as well as advanced search features based on phrases, wildcards, stemming, fuzzy searching, proximity searching, must-include and must-exclude terms, range searching, etc.
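Word-by-word indexing and AND, OR and NOT searching are commonly explained with an inverted index, which maps each word to the set of pages that contain it. The Python sketch below uses three made-up pages and is a textbook simplification, not a description of how Google itself is built.

```python
from collections import defaultdict

# Made-up page contents keyed by a page identifier.
PAGES = {
    "page1": "library catalogue search",
    "page2": "web search engine",
    "page3": "library web portal",
}


def build_index(pages):
    """Map every word to the set of pages in which it occurs."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


INDEX = build_index(PAGES)


def lookup(word):
    """Return the pages containing a word (an empty set if there are none)."""
    return INDEX.get(word.lower(), set())


print(sorted(lookup("library") & lookup("web")))     # library AND web
print(sorted(lookup("library") | lookup("search")))  # library OR search
print(sorted(lookup("search") - lookup("web")))      # search NOT web
```

Because each word's page list is a set, the three Boolean operators reduce to set intersection, union and difference.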

While creating an index of every word on the web pages, Google also takes a snapshot of every page it has indexed. This snapshot, kept as a backup, is called the “cache” page. If the end user clicks on the “Cached” link, Google shows the web page as it looked when it was indexed.

Google Advanced Search

Clicking on advanced search takes one to the advanced search page at http://www.google.com/advanced_search . The Google advanced search page offers several search boxes, and it retrieves pages that match the words entered into these boxes.

EBSCO

EBSCOhost is an online system that provides access to several periodical indexes or databases. It is a powerful online reference system accessible via the Internet or a direct connection. It offers a variety of proprietary full-text databases and popular databases from leading information providers. It provides a complete and optimized research solution comprising research databases, e-books and e-journals, all combined with a powerful discovery service and management resources, to support the information and collection development needs of libraries and other institutions and to maximize the search experience for researchers and other end users. Currently EBSCO offers more than 375 full-text and secondary research databases and over 515,000 e-books, plus subscription management services for 360,000 e-journals, e-journal packages and print journals.

Hospitality & Tourism Complete Database covers scholarly research and industry news relating to all areas of hospitality and tourism. This collection contains full text for more than 500 publications, including periodicals, company & country reports, and books.

Business Source Premier provides articles from over 2500 journals, magazines and newspapers on business and management topics.

EBSCO serves the content needs of all researchers, whether they access EBSCO resources via academic institutions, schools, public libraries, hospitals and medical institutions, corporations, associations, government institutions, etc. It has a powerful web-based retrieval system. As soon as a user logs in to the system, the user can select a database and then click on the search option.
The initial search screen presents a toolbar with functions that are available at all times during a search session. These include buttons for “new search”, which returns to the initial default search screen; “view folder”, which lets the user view a personal folder that is cleared at the end of the session unless the user signs in to the system to establish a permanent file; “preferences”, which permits a change in the Result List Format and the number of results per page; “help”, which opens an online help manual; and an “exit” or “home library graphic”, which closes EBSCOhost and returns to the library's home page.
The basic search screen supplies a Find box, in which terms may be entered; these are automatically checked for commonly misspelled words, and alternate spellings are suggested. Keywords are the assumed default. EBSCOhost allows the use of both basic and advanced search options.

PubMed

PubMed was first released in January 1996 as an experimental database under the Entrez retrieval system with full access to MEDLINE®. The word "experimental" was dropped from the Web site in April 1997, and on June 26, 1997, a Capitol Hill press conference officially announced free MEDLINE access via PubMed. PubMed searches numbered approximately two million for the month of June 1997, while current usage typically exceeds three million searches per day.

PubMed is a free-access, web-based retrieval system developed by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine. It contains bibliographic information drawn primarily from the life sciences literature, including over 10 million citations in MEDLINE and PreMEDLINE. The database provides links to full-text articles. PubMed allows both basic and advanced searches. The first search page of PubMed is very simple and easy to use. To search PubMed, one enters search terms in the query box. Users who need more control can switch from the basic search to the advanced search feature.
