Friday, February 13, 2015

14. Advanced Course in Information Storage and Retrieval III : Semantic Web P- 06. Information Storage and Retrieval

Your suggestions are cordially invited in creating this blog. Please send your suggestions and entries. Its entire scope is the world knowledge community, and it will make an important contribution to the career building of all candidates. You can send your suggestions to this email address: chandrashekhar.malav@yahoo.com


By: Dr P.M Devika, Paper Coordinator


1. Introduction


The Web is a success story in terms of both the availability of information and the growing number of users [4]. Today people use the Web for many purposes, including knowledge acquisition, sharing thoughts, business and entertainment. However, the results retrieved from the Web of documents by present-day search engines such as Google and Yahoo are often irrelevant and noisy. The problem is that most of the information available on the Web is meant for human consumption and interpretation, not for machines to consume, interpret and process. Search engines do not really understand the meaning of a query. They treat the query as a set of strings, match those strings against their indices and retrieve results accordingly.

The Semantic Web, an extension of the present Web, is characterized by the association of machine-accessible formal semantics with Web content. The motivation behind the Semantic Web is to automate the processing of Web information and to improve interoperability among Web-based information systems. The goal is to retrieve meaningful and relevant information for users.

In this module we discuss Semantic Web (SW) techniques and technologies. We also discuss the potential uses of Natural Language Processing (NLP) in Semantic Web (see module no. 12 for discussion on NLP).

2. Semantic Web

Tim Berners-Lee, the inventor of the World Wide Web (WWW), first envisioned the Semantic Web (SW), which provides automated information access based on machine-processable semantics of data. The SW is an extension of the current Web in which information is given well-defined meaning to enable computers and people to work in cooperation [2]. Antoniou et al. [3] described the SW as a vision of the next-generation Web, one that enables Web applications to automatically collect Web documents from diverse sources, integrate and process information, and interoperate with other applications in order to execute sophisticated tasks for humans. The aim of the SW is to develop languages for expressing information in a machine-processable way. The explicit representation of the semantics of data, accompanied by domain theories (i.e. ontologies), will enable the Web to provide a qualitatively new level of service [5].

Furthermore, semantic technologies and techniques allow machines to automatically process logically connected data on the Web and infer new information. Through a rich knowledge representation model such as the Resource Description Framework (RDF), the Semantic Web provides highly structured data. Application developers can now share their rich structured data on the Web, and software agents can infer knowledge from the different kinds of structured and logically connected data available there. It is important to mention that RDF is built on an elementary pointer mechanism, the Uniform Resource Identifier (URI) (discussed in detail in the following sections). In the traditional Web, URIs are mainly used to refer to documents and their parts through the hypertext mechanism. The emerging Semantic Web uses URIs to name anything: abstract concepts (color, test, dream, etc.), physical objects (person, location, mountain, etc.) and electronic objects, also known as information objects (e.g. the home page of an institution). RDF is also used to name the relationships between objects as well as the objects themselves [8].

In the following sections we discuss semantic techniques and technologies.  

3. Semantic Web Components


Figure 1 shows the Semantic Web technology stack, which describes the Semantic Web design and vision. It is built as a layered structure so that the Semantic Web vision can be implemented step by step. The pragmatic justification is that it is easier to achieve consensus on small steps, whereas it is much harder to get everyone on board if too much is attempted [3] [9]. In addition, achieving the vision of the Semantic Web does not require implementing the entire technology stack. Instead, the decision about which technologies to implement is guided by the overall system objective.

Figure 1: Semantic Web technology stack
In building the semantic Web in a layered manner, two principles should be followed:  
  1. Downward Compatibility: agents (pieces of software that work autonomously and proactively [3]) that are fully aware of one layer should also be able to interpret and use information written at lower levels. E.g. agents aware of the semantics of OWL can take full advantage of information written in RDF and RDF Schema.  
  2. Upward Partial Understanding: agents fully aware of one layer should also be able to take at least partial advantage of information at higher levels. E.g. an agent aware of only RDF and RDF Schema semantics can interpret partial knowledge written in OWL, by disregarding those elements that go beyond RDF and RDF Schema.
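
The second principle can be sketched as a simple filter over statements. The following is a minimal illustration only, assuming that an RDF/RDF Schema agent simply disregards statements whose predicate comes from the OWL vocabulary; the sample statements and the function name `rdfs_view` are hypothetical.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical statements that mix RDF Schema and OWL vocabulary.
statements = [
    ("#Tree", RDFS + "subClassOf", "#Plant"),
    ("#Plant", OWL + "disjointWith", "#Animal"),
    ("#Lion", RDFS + "subClassOf", "#Carnivore"),
]

def rdfs_view(triples):
    """Return what an RDF/RDFS-only agent keeps: everything except
    statements whose predicate belongs to the OWL vocabulary."""
    return [t for t in triples if not t[1].startswith(OWL)]

partial = rdfs_view(statements)
for subject, predicate, obj in partial:
    print(subject, predicate, obj)
```

The agent still obtains the two subclass statements, which is the "partial advantage" the principle asks for, while the disjointness statement is skipped rather than causing a failure.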

3.1.1  Extensible Markup Language (XML)
As Figure 1 shows, XML (eXtensible Markup Language) and XML Schema sit at the bottom of the Semantic Web stack. XML is a subset of the Standard Generalized Markup Language (SGML). XML lets everyone create their own tags, such as those used to annotate Web pages or sections of text on a page. But it says nothing about what the structures mean. XML is particularly suitable for sending documents across the Web.

3.1.1.1  Salient Features of XML
Some of the salient features of XML are [3]:  
  1. Extensible: tags can be defined and extended to many different applications.
  2. Machine accessibility: an XML document is more easily accessible to machines because every piece of information is described. Moreover, relations are defined through the nesting structure. For example, the <author> tags appear within the <book> tags, so they describe properties of the particular book. A machine processing the XML document would be able to deduce that the author element refers to the enclosing book element, rather than having to infer this fact from proximity considerations, as in HTML.
  3. Separates content from formatting: the same information can be displayed in different ways, without requiring multiple copies of the same content; moreover, the content may be used for purposes other than display.
  4. A meta-language for markup: it does not have a fixed set of tags but allows users to define tags of their own.

 

3.1.1.2  Issues with XML

XML is a universal meta-language for defining markup. It provides a uniform framework, and a set of tools such as parsers, for the interchange of data and metadata between applications. But it also has some limitations [7]:

  1. XML does not ensure a standard vocabulary, so element names are subject to interpretation. For example, one author can name an element ‘Author’ while another names it ‘Writer’. Humans can make out that both are the same, but a machine has no way to decide. This creates confusion when machines try to share data with each other.
  2. The nesting of tags does not have a standard meaning. It is up to each application to interpret the nesting. For example, consider the sentence: David John is a lecturer of Thermodynamics. There are various ways of representing this sentence in XML. At least two possibilities are:

<course name="Thermodynamics">
    <lecturer>David John</lecturer>
</course>

<lecturer name="David John">
    <teaches>Thermodynamics</teaches>
</lecturer>

The above two formalizations use essentially opposite nestings although they represent the same information. In the first case, the course is treated as the primary element, and the lecturer element is nested within it. In the second case, the lecturer is treated as the primary element, and the nested element teaches refers to the course name. So there is no standard way of assigning meaning to tag nesting.
  3. Domain-specific markup languages: Since users are free to define their own tags, many domain-specific markup languages have been developed, for example, MathML [10] and CML (Chemical Markup Language) [11]. The problem with the various domain-specific markup languages is the lack of standardization in describing resources on the Web. At the same time, preventing this kind of flexibility and extensibility would result in inadequate resource description. Hence, there should be a common model/framework that can bridge the gap between these various schemas. This is where RDF comes into the picture; it is also the next layer in the Semantic Web pyramid of Figure 1.
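
The nesting problem described in the second limitation can be made concrete with a short sketch. It assumes a hypothetical application rule, `lecturer_of`, written only for the course-centred nesting; the two documents are the ones shown above.

```python
import xml.etree.ElementTree as ET

# The two nestings from the text, both encoding the same information.
by_course = ET.fromstring(
    '<course name="Thermodynamics"><lecturer>David John</lecturer></course>'
)
by_lecturer = ET.fromstring(
    '<lecturer name="David John"><teaches>Thermodynamics</teaches></lecturer>'
)

def lecturer_of(root):
    """Hypothetical application rule that only understands the
    course-centred nesting: <course><lecturer>...</lecturer></course>."""
    if root.tag != "course":
        return None
    return root.findtext("lecturer")

first = lecturer_of(by_course)
second = lecturer_of(by_lecturer)
print(first, second)
```

The rule finds the lecturer in the first document and nothing in the second, even though a human reader sees the same fact in both. Each application has to hard-code its own reading of the nesting.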

3.1.2  Resource Description Framework (RDF)
RDF is a basic data model, not a language. The RDF model describes Web documents (in other words, attaches metadata to the documents) in a natural manner so that the metadata can be shared across different applications. RDF expresses meaning encoded in sets of triplets (resource/subject, predicate/property and object/value), each triplet being rather like the subject, verb and object of an elementary sentence. These triplets can be written using XML tags.

RDF Triplets

A simple RDF model has three parts [12]:  
  1. Subject/Resource: Any entity that has to be described is known as a resource, also known as a subject. For instance, it can be a webpage on the Internet or a person in a society.  
  2. Predicate/Property: Any characteristic or attribute of a resource that is used to describe it is known as a property or predicate. For example, a webpage can be recognized by its Title and a person can be recognized by his or her Name. Here, both are attributes used to recognize the resources Webpage and Person.
  3. Object/Value: A value of a property is termed an object. For example, the title of the DRTC webpage is Documentation Research and Training Centre, and the name of a Person is S. R. Ranganathan. Here, Documentation Research and Training Centre and S. R. Ranganathan are the values of the properties title and name respectively.
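
The three parts listed above can be modeled as a small data structure. This is only a sketch: the class name `Triple`, the field names and the helper `values_of` are illustrative choices, and the plain strings stand in for the URIs a real RDF store would use.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property used to describe it
    obj: str        # the value of that property

# The two examples from the text, written as triples.
statements = [
    Triple("DRTC webpage", "title", "Documentation Research and Training Centre"),
    Triple("Person", "name", "S. R. Ranganathan"),
]

def values_of(triples, predicate):
    """Collect the objects of every triple that uses the given predicate."""
    return [t.obj for t in triples if t.predicate == predicate]

titles = values_of(statements, "title")
print(titles)
```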

The combination of subject, predicate and object is called a statement. For example, consider the statement: David John is the author of the webpage http://drtc.isibang.ac.in/~David. This statement can be represented diagrammatically as shown in Figure 2 [4]:
Figure 2: Diagram of the RDF statement



The XML representation of the above statement is:
<?xml version="1.0" encoding="UTF-16"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:mydomain="http://mydomain.org/schema/">

    <rdf:Description rdf:about="http://drtc.isibang.ac.in/~David">
        <mydomain:author>David John</mydomain:author>
    </rdf:Description>
</rdf:RDF>



The first line specifies that we are using XML version 1.0. The attribute xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" declares an XML namespace. An XML namespace is a collection of names, identified by a URI reference [RFC2396], which are used in XML documents as element types and attribute names. The syntax for declaring an XML namespace is: xmlns:namespace-prefix="namespace". The rdf:Description element makes a statement about the resource http://drtc.isibang.ac.in/~David. Within the description, the property is used as a tag, and the content is the value of the property.
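
To show how a program could read such a description back as a triplet, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. It is not a full RDF/XML parser: it assumes only the simple rdf:Description pattern used above, the function name `extract_triples` is hypothetical, and the XML declaration is omitted so that the text can be parsed directly from a Python string.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:mydomain="http://mydomain.org/schema/">
  <rdf:Description rdf:about="http://drtc.isibang.ac.in/~David">
    <mydomain:author>David John</mydomain:author>
  </rdf:Description>
</rdf:RDF>"""

def extract_triples(document):
    """Read (subject, predicate, object) from simple rdf:Description blocks."""
    root = ET.fromstring(document)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes qualified names as "{namespace}local";
            # joining the two parts gives the full property URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            triples.append((subject, predicate, prop.text))
    return triples

triples = extract_triples(rdf_xml)
for triple in triples:
    print(triple)
```

The namespace prefix mydomain disappears in the result; what identifies the property is the full URI formed from the namespace and the tag name.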

The most important feature of RDF is that it is designed to be domain-independent. It is very general in nature and is not restricted to any particular domain, so it can be used to describe information about any domain. The RDF model imitates the class system of object-oriented programming. A collection of classes (as defined for a specific purpose or domain) is called a schema in RDF. These classes are extensible through subclass refinement [12]. Thus, various related schemas can be built from a base schema. RDF also supports metadata reuse by allowing sharing between various schemas.

3.1.2.1  RDF vs. RDF Schema
The different layers involved in RDF and RDFS [12] can be illustrated with the statement: Networking is taught by David John. The schema for this statement may contain classes such as lecturers, academic staff members, staff members and courses, and properties such as is taught by, involves, etc. In Figure 3, rectangles are properties, ellipses above the dashed line are classes, and ellipses below the dashed line are instances.

Figure 3: RDF and RDFS layers
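
The interplay between the schema layer and the instance layer can be sketched in a few lines. The example assumes a hierarchy in which lecturer is a subclass of academic staff member, which in turn is a subclass of staff member; this ordering is an assumption, since the figure itself is not reproduced here. The function `rdfs_closure` is a hypothetical helper that applies only two RDFS-style inference patterns, not the full RDFS semantics.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

triples = {
    # Schema layer (classes), assumed hierarchy
    ("lecturer", SUBCLASS, "academic staff member"),
    ("academic staff member", SUBCLASS, "staff member"),
    # Instance layer
    ("David John", TYPE, "lecturer"),
    ("Networking", TYPE, "course"),
    ("Networking", "is taught by", "David John"),
}

def rdfs_closure(graph):
    """Repeat two inference patterns until nothing new is derived:
    subclass transitivity, and instances inheriting superclasses."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        new = set()
        for (a, p1, b) in inferred:
            for (c, p2, d) in inferred:
                if b != c or p2 != SUBCLASS:
                    continue
                if p1 == SUBCLASS:
                    new.add((a, SUBCLASS, d))
                elif p1 == TYPE:
                    new.add((a, TYPE, d))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

closure = rdfs_closure(triples)
print(("David John", TYPE, "staff member") in closure)
```

The statement that David John is a staff member is never written down; it follows from the schema, which is the kind of knowledge the RDFS layer adds on top of plain RDF statements.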


3.1.2.2  Issues with RDF Schema
RDF and RDFS allow the representation of some ontological knowledge. The main modeling primitives of RDF/RDFS concern the organization of vocabularies in typed hierarchies: subclass and subproperty relationships, domain and range restrictions, and instances of classes. However, a number of other features are missing, as noted in [18]:  
  1. Local scope of properties: rdfs:range defines the range of a property for all the classes. Hence, in RDF Schema we cannot declare range restrictions that apply only to some classes, and not all. For example, we cannot say that Cows eat only Plants, while other Animals may eat Meat, too.  
  2. Disjointness of classes: Sometimes we wish to say that classes are disjoint. For example, Male and Female are disjoint. But in RDF Schema we cannot do this.  
  3. Boolean combinations of classes: Sometimes we wish to build new classes by combining other classes using union, intersection, and complement. For example, we may wish to define the class Person to be the disjoint union of the classes Male and Female. RDFS does not allow such descriptions.  
  4. Cardinality restrictions: Sometimes we wish to place restrictions on how many distinct values a property may or must take. For example, we would like to say that a Person has exactly two Parents, or that a Course is taught by at least one Lecturer. Again, such restrictions are not possible to express in RDFS.  
  5. Special characteristics of properties: Sometimes it is useful to say that a property is transitive, unique, or the inverse of another property (e.g., eats and is eaten by).

Thus we need an ontology language that is richer than RDF Schema, a language that offers the above features and more. In designing such a language one should be aware of the trade-off between expressive power and efficient reasoning support. Generally speaking, the richer the language is, the more inefficient the reasoning support becomes, often crossing the border of non-computability. Thus we need a compromise, a language that can be supported by reasonably efficient reasoners, while being sufficiently expressive to express large classes of ontologies and knowledge.

3.1.3  Ontology
The concept of ontology originated more than two thousand years ago in philosophy, more specifically in Aristotle’s theory of categories [19]. The original purpose was to provide a categorization of all existing things in the world. Ontologies have since been adopted in several other fields, such as Library and Information Science (LIS), Artificial Intelligence (AI) and, more recently, Computer Science (CS), as the main means for describing how classes of objects are correlated, or for categorizing document resources [46]. Many definitions of ontology have been proposed. Gruber defined an ontology as “an explicit specification of a conceptualization” [21]. Later, Studer et al. [22] extended the definition to "a formal, explicit specification of a shared conceptualisation". Studer’s definition adds the ideas that the conceptualization is shared and that the relations among the concepts are formal. The explicit, formal representation of a shared conceptualization involves a perspective on a specific reality, and is constituted in the conceptual structure of a knowledge base.

The ultimate objective of an ontology is to share the knowledge it represents [1]. An ontology defines the terms and their formal relations within a given knowledge area. The main features of ontologies are [3]:

  1. Ontologies provide a shared understanding of domains;
  2. Ontologies are useful for representing and facilitating the sharing of domain knowledge between human and automatic agents;
  3. Ontologies are useful for the organization and navigation of websites;
  4. Ontologies are useful for improving the accuracy of Web searches. Web searches can exploit the generalization and/or specialization of information.

3.1.4  Logic and Ontology Language

Logic plays an important role in representing knowledge, and it enhances ontology languages further. It helps to establish the consistency and correctness of data sets and to infer conclusions that are not explicitly stated but are required by or consistent with a known set of data. Some of the important features of logic are as follows [3, 8]:

  1. Language: logic provides a high-level language in which knowledge can be expressed in a transparent way and will have a high expressive power.  
  2. Formal semantics: it has a well-understood formal semantics, which assigns an unambiguous meaning to logical statements.  
  3. Reasoning: automated reasoners can deduce (i.e., infer) conclusions from the given knowledge, thus making implicit knowledge explicit. For example, given
       i. X is a Cat
       ii. a Cat is a Mammal
       iii. a Mammal gives birth to young ones
     a reasoner can conclude that X gives birth to young ones.
  4. Inferred knowledge explanation: with the proof systems, it is possible to trace the proof that leads to a logical consequence. In this sense, the logic can provide explanations for answers.
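
The reasoning and explanation features can be sketched together with a tiny forward-chaining loop over the cat example. This is an illustrative toy, not how production reasoners work; the rule encoding and the names `forward_chain` and `explain` are assumptions made for this sketch.

```python
facts = {("X", "is a", "Cat")}

# Each rule: if a subject has the first (predicate, object) pair,
# it also has the second one.
rules = [
    (("is a", "Cat"), ("is a", "Mammal")),
    (("is a", "Mammal"), ("gives birth to", "young ones")),
]

def forward_chain(facts, rules):
    """Apply the rules until no new fact appears; remember where each
    derived fact came from so that the proof can be traced later."""
    known = set(facts)
    proof = {}  # derived fact -> the fact it was derived from
    changed = True
    while changed:
        changed = False
        for (subject, pred, obj) in list(known):
            for (if_pred, if_obj), (then_pred, then_obj) in rules:
                derived = (subject, then_pred, then_obj)
                if (pred, obj) == (if_pred, if_obj) and derived not in known:
                    known.add(derived)
                    proof[derived] = (subject, pred, obj)
                    changed = True
    return known, proof

def explain(fact, proof):
    """Walk back from a conclusion to the stated fact it rests on."""
    chain = [fact]
    while chain[-1] in proof:
        chain.append(proof[chain[-1]])
    return list(reversed(chain))

known, proof = forward_chain(facts, rules)
goal = ("X", "gives birth to", "young ones")
steps = explain(goal, proof)
for step in steps:
    print(step)
```

The `proof` dictionary plays the role of the proof trace: it lets the program answer not only whether the conclusion holds but also why.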

However, adding logic to the Web needs care: the Web has several characteristics that can lead to problems when existing logics are used [8]. Adding logic to the Web presupposes the use of rules to make inferences, choose courses of action, etc. The logic deployed must be powerful enough to describe complex objects, but at the same time it must not be so complex and inflexible that software agents run into contradictions while inferring knowledge.

A number of different knowledge representation paradigms have emerged to provide languages for representing ontologies, in particular description logics (discussed below) and frame logics. The Web Ontology Language (OWL) is one such language, based upon Description Logics (DL). Other logic-based knowledge representation languages include the Knowledge Interchange Format (KIF) [24] and Simple Common Logic (SCL) [13], which are based on first-order logic rather than on description logics.

3.1.5  Description Logics
Description Logics (DL) are closely related to First Order Logic (FOL) and Modal Logic (ML). Research on DL was motivated by the observation that reasoning in different fragments of FOL leads to computational problems of differing complexity. The research first appeared under the label terminological systems, to emphasize that the representation language was used to establish the basic terminology adopted in the modeled domain [14], and later under the label concept languages. DL has now become a cornerstone of the Semantic Web through its use in designing ontologies.

DL became popular as the focus moved towards the properties of the underlying logical systems. Research on DL has covered the theoretical foundations as well as the implementation of knowledge representation systems and the development of applications in several fields, for example, reasoning about database conceptual models, schema representation in information integration systems, metadata management, and the logical foundation of ontology languages [14].

Description logics are formal logics with well-defined semantics. The semantics of DL is defined model-theoretically, formulating the relationships between the language syntax and the models of a domain. In designing DL, emphasis is placed on the decidability of key reasoning problems and on the provision of sound and complete reasoning algorithms. A key feature of DL is the ability to represent relationships beyond the is-a relationships that can hold between concepts [14].

In DL, the important notions of a domain are described by concept descriptions, which are built from concepts (i.e., unary predicates) and roles (i.e., binary predicates) by the use of various concept and role constructors. In addition, it is possible to state facts about the domain in the form of axioms, which act as constraints on the interpretations of a DL knowledge base [15].

A DL knowledge base has two main components: the TBox (Terminological Box) and the ABox (Assertional Box). The TBox contains intensional knowledge in the form of a terminology and is built through declarations that describe general properties of concepts. In other words, it contains sentences describing concept hierarchies, i.e. relations between concepts. The ABox contains extensional (or assertional) knowledge that is specific to the individuals of the domain of discourse [14].
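
As a hedged illustration of the TBox/ABox distinction, the following block writes a few axioms in standard DL notation. The concepts are borrowed from the African wildlife ontology of Table 3; the individuals leo and zara are hypothetical and do not appear in the source.

```latex
\begin{align*}
\text{TBox:}\quad
  & \mathit{Tree} \sqsubseteq \mathit{Plant} \\
  & \mathit{Plant} \sqsubseteq \lnot \mathit{Animal} \\
  & \mathit{Carnivore} \equiv \mathit{Animal} \sqcap \exists\, \mathit{eats}.\mathit{Animal} \\
  & \mathit{Lion} \sqsubseteq \mathit{Carnivore} \\[4pt]
\text{ABox:}\quad
  & \mathit{Lion}(\mathit{leo}) \\
  & \mathit{eats}(\mathit{leo}, \mathit{zara})
\end{align*}
```

From these axioms a reasoner could derive Carnivore(leo) and Animal(leo), facts that are asserted nowhere in the ABox.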

3.1.5.1  Web Ontology Language (OWL) and its Family Members
The Web Ontology Working Group of the W3C identified a number of characteristic use cases for the Semantic Web that would require much more expressiveness than RDF and RDF Schema offer. Researchers in the United States (US) and in Europe identified the need for a more powerful language to build ontologies. In Europe, the ontology language OIL (Ontology Inference Layer) was developed. In the US, DARPA (Defense Advanced Research Projects Agency) initiated a similar project called DAML (DARPA Agent Markup Language). Later, these two were merged into a single ontology language, DAML+OIL.

DAML+OIL in turn was taken as the starting point by the W3C Web Ontology Working Group in defining OWL (Web Ontology Language), which is intended to be the standardized and broadly accepted ontology language of the Semantic Web. DL is the logical foundation of the OWL ontology language. OWL is built on top of RDF and RDF Schema. OWL adds more vocabulary for describing properties and classes. It also adds relations between classes (e.g. disjointness), cardinality (e.g. exactly one), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes [16].

The intent of the OWL language is to provide additional machine-processable semantics for resources, that is, to make the machine representations of resources more closely resemble their intended real-world counterparts [17]. To add the capabilities listed below to ontologies, OWL uses both URIs for naming and the description framework for the Web provided by RDF [16]. The added capabilities are:

  1. Ability to be distributed across many systems
  2. Scalability to Web needs
  3. Compatibility with Web standards for accessibility and internationalization
  4. Openness and extensibility

The OWL 1.0 ontology language consists of three sub-languages, namely OWL Full, OWL DL and OWL Lite. These sub-languages differ in their expressive power. Each OWL species is discussed below along with its advantages and disadvantages.

OWL Full: It is the complete language and it uses all the OWL language primitives. It allows the combination of these primitives in arbitrary ways with RDF and RDF Schema. 

Advantage: It is fully upward-compatible with RDF, both syntactically and semantically. Any legal RDF document is also a legal OWL Full document, and any valid RDF/RDF Schema conclusion is also a valid OWL Full conclusion. 

Disadvantage: Due to its greater expressive power, it is undecidable and therefore impractical for applications that require complete and efficient reasoning support. A more expressive knowledge base makes reasoning more complex. Software agents will need more time (where the time growth rate is exponential) to process a query.

OWL DL: Supports users who want maximum expressiveness while retaining computational completeness and decidability: all conclusions are guaranteed to be computable, and all computations will finish in finite time. OWL DL includes all OWL language constructs, but they can be used only under certain restrictions (for example, while a class may be a subclass of many classes, a class cannot be an instance of another class). OWL DL corresponds to the SHOIN(D) description logic [14], which is somewhat less expressive than OWL Full.  

Advantage: It supports efficient reasoning. 

Disadvantage: We lose full compatibility with RDF. An RDF document will in general have to be extended in some ways and restricted in others before it is a legal OWL DL document. Conversely, every legal OWL DL document is a legal RDF document.

OWL Lite: OWL Lite is OWL DL with more restrictions. It corresponds to the less expressive SHIF(D) description logic. For example, OWL Lite excludes enumerated classes, disjointness statements, and arbitrary cardinality. The idea is to make it easy to start with and easy to implement processors for, so that people can begin using OWL Lite easily and later graduate to more complicated uses.


Advantage: It is easier to grasp (for users) and easier to implement (for tool builders).

Disadvantage: The expressiveness is more restricted.

Table 3 shows part of the OWL DL ontology code (expressed in RDF/XML) for the African wildlife ontology drawn in Figure 4 [4].
Figure 4: Classes and subclasses of the African wildlife ontology
 
Table 3: OWL Ontology

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="xml:base"/>
  <owl:Class rdf:ID="Animal">
    <rdfs:comment>Animals form a class.</rdfs:comment>
  </owl:Class>
  <owl:Class rdf:ID="Plant">
    <rdfs:comment>Plants form a class disjoint from animals.</rdfs:comment>
    <owl:disjointWith rdf:resource="#Animal"/>
  </owl:Class>
  <owl:Class rdf:ID="Tree">
    <rdfs:comment>Trees are a type of plant.</rdfs:comment>
    <rdfs:subClassOf rdf:resource="#Plant"/>
  </owl:Class>

  <owl:Class rdf:ID="Herbivore">
    <rdfs:comment>Herbivores are exactly those animals that eat only plants or parts of plants.</rdfs:comment>
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#Animal"/>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#eats"/>
        <owl:allValuesFrom>
          <owl:Class>
            <owl:unionOf rdf:parseType="Collection">
              <owl:Class rdf:about="#Plant"/>
              <owl:Restriction>
                <owl:onProperty rdf:resource="#is_part_of"/>
                <owl:allValuesFrom rdf:resource="#Plant"/>
              </owl:Restriction>
            </owl:unionOf>
          </owl:Class>
        </owl:allValuesFrom>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
  <owl:Class rdf:ID="Carnivore">
    <rdfs:comment>Carnivores are exactly those animals that eat animals.</rdfs:comment>
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#Animal"/>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#eats"/>
        <owl:someValuesFrom rdf:resource="#Animal"/>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
  <owl:Class rdf:ID="Lion">
    <rdfs:comment>Lions are animals that eat only herbivores.</rdfs:comment>
    <rdfs:subClassOf rdf:resource="#Carnivore"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#eats"/>
        <owl:allValuesFrom rdf:resource="#Herbivore"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
……………………
……………………
</rdf:RDF>
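
The difference between owl:allValuesFrom (Herbivore) and owl:someValuesFrom (Carnivore) in Table 3 can be sketched over a small set of facts. Everything below is an assumption made for illustration: the individuals, the `eats` and `part_of` data, and above all the closed-world reading, in which facts not listed are taken to be false. OWL itself uses an open-world assumption, so a real reasoner would draw fewer conclusions from the same data. The sketch also simplifies the is_part_of restriction to "is recorded as part of a plant".

```python
# Hypothetical individuals and facts, read under a closed-world assumption.
animals = {"zebra", "lion", "hyena"}
plants = {"grass", "acacia"}
part_of = {"leaf": "acacia"}  # a leaf is part of a plant
eats = {
    "zebra": {"grass", "leaf"},
    "lion": {"zebra"},
    "hyena": {"zebra", "grass"},
}

def is_plant_or_part(thing):
    """Plant, or something recorded as being part of a plant."""
    return thing in plants or part_of.get(thing) in plants

def is_herbivore(x):
    # allValuesFrom: EVERY eaten thing must be a plant or part of a plant.
    return x in animals and all(is_plant_or_part(f) for f in eats.get(x, ()))

def is_carnivore(x):
    # someValuesFrom: AT LEAST ONE eaten thing must be an animal.
    return x in animals and any(f in animals for f in eats.get(x, ()))

for animal in sorted(animals):
    print(animal, is_herbivore(animal), is_carnivore(animal))
```

The hyena eats both grass and a zebra: it satisfies the "some" restriction and is a carnivore, but fails the "all" restriction and is not a herbivore.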



3.1.6  Trust Layer
The trust layer sits at the top of the pyramid. Trust is a high-level and crucial concept. The Web will achieve its full potential only when users have trust in its operation (security) and in the quality of the information provided. The trust layer can emerge through the use of digital signatures and other kinds of information, for instance, ratings and recommendations made by trusted agents, certification agencies and/or consumer bodies [18].

In summary, each layer in the Semantic Web layer cake is built on the layer below. Each layer is progressively more specialized and also tends to be more complex than the layers below it. The layers can be developed and made operational relatively independently. 

4. Semantic Web and Natural Language Processing

In this section we explore the possibility of using NLP in Semantic Web.

As stated in [25], it is entirely appropriate, indeed highly desirable, to apply NLP methods to the foundations of the Semantic Web. If this happens, the Semantic Web vision may be realized sooner.
Dini [26] stated that NLP can help the Semantic Web in two phases: the acquisition phase (i.e., at the time of building the Semantic Web) and the retrieval phase (i.e., at the time of accessing the Semantic Web). In the acquisition phase, building the Semantic Web requires very accurate tagging algorithms. In the retrieval phase, NLP could help by providing semantic resources with simple but smart search interfaces.

A number of recent papers focus on the possibility of automatically tagging Web pages with RDF descriptions. Tagging has always been one of the most popular tasks in NLP experiments, and it is tempting to assume that a completely tagged Web could be achieved simply by applying tagging algorithms [26]. Automatic classification has reached a satisfying degree of accuracy and may be a valuable help in extracting RDF descriptions, but it is definitely not enough. In the Semantic Web perspective it is not sufficient to say that a certain Web page is about an institution. The resource (e.g., Organization) described in that page needs to be qualified: the year of establishment, the courses offered, the place where the institute is located, etc. To do that, a tagging application should also be able to gather missing information from different websites and create links to different resources [26].
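
To make the idea of qualifying a resource concrete, the following toy sketch turns sentences into triplets with a few hand-written patterns. It is far simpler than real NLP tagging and is only meant to show the shape of the output. The patterns, the predicate names (yearOfEstablishment, location, courseOffered), the function `tag_sentence` and the sample sentences about "Example Institute" are all hypothetical.

```python
import re

# Hand-written patterns: (regular expression, predicate name).
PATTERNS = [
    (re.compile(r"^(?P<s>.+?) was established in (?P<o>\d{4})\.?$"),
     "yearOfEstablishment"),
    (re.compile(r"^(?P<s>.+?) is located in (?P<o>.+?)\.?$"), "location"),
    (re.compile(r"^(?P<s>.+?) offers (?P<o>.+?)\.?$"), "courseOffered"),
]

def tag_sentence(sentence):
    """Return a (subject, predicate, object) triplet, or None when no
    pattern applies to the sentence."""
    for pattern, predicate in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            return (match.group("s"), predicate, match.group("o"))
    return None

sentences = [
    "Example Institute was established in 1962.",
    "Example Institute is located in Bangalore.",
    "This page is about an institution.",
]
results = [tag_sentence(s) for s in sentences]
for result in results:
    print(result)
```

The last sentence yields nothing, which mirrors the point made above: knowing that a page is about an institution is not yet a description of that institution.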

In summary, NLP can be applied in the Semantic Web to build knowledge bases, to construct ontologies and to support ontology learning. Note that research on the use of natural language processing in the Semantic Web is at an early stage, and a lot of research is currently going on in this area.

5. Conclusion

In this module we discussed various semantic techniques and technologies, such as, RDF, RDFS, OWL, ontology, logic, etc. The semantic techniques and technologies, as discussed, are essential for organizing, representing and retrieving the information meaningfully. The semantic representation of information allows us to infer new knowledge from the existing knowledge in the knowledge base. In this module we also discussed some of the applications of natural language processing in Semantic Web. 

6. References

  1. Dutta, B., Chatterjee, U. and Madalli, Devika P. (2013). From Application Ontology to Core Ontology. In the Proceedings of International Conference on Knowledge Modelling and Knowledge Management (ICKM 2013), Bangalore, India. ISBN: 978-93-5137-765-8. 
  2. Semantic Web Made Easy. http://www.w3.org/RDF/Metalog/docs/sw-easy
  3. Antoniou, Grigoris and Harmelen, Frank van. A semantic web primer. London: MIT Press, 2004.
  4. Dutta, B. and Prasad, A. R. D. Semantic e-learning system: theory, implementation and applications. Germany: LAP, 2013, pp. 216, ISBN 978-3-659-18318-8.
  5. Berners-Lee, T., Hendler, J. and Lassila, O. (2001). The Semantic Web: a new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American. http://www.scientificamerican.com/article.cfm?id=the-semantic-web
  6. Dutta, B. (2006). Semantic Web Technology: Towards Meaningful Retrieval. SRELS Journal of Information Management, 43 (2), pp. 149-154.
  7. Dutta, B. (2008). Semantic Web Services: A Study of Existing Technologies, Tools and Projects. DESIDOC Journal of Library and Information Technology, 28 (3), pp. 47-55.
  8. Berners-Lee, T., Connolly, D., Kagal, L., Scharf, Y. and Hendler, J. (2006). N3Logic: a logical framework for the World Wide Web. http://www.dig.csail.mit.edu/2006/Papers/TPLP/n3logic-tplp.pdf
  9. Davis, J., Fensel, D. and Harmelen, Frank van. Towards the semantic web. West Sussex: John Wiley, 2003.
  10. MathML. http://www.w3.org/Math/ 
  11. Chemical Markup Language (CML). http://cml.sourceforge.net/
  12. Resource Description Framework (RDF) Model and Syntax Specification: W3C Recommendation, 22 Feb. 1999. http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/#intro
  13. Altheim, M., Anderson, B., Hayes, P., Menzel, C., Sowa, J. F., and Tammet, T. SCL: Simple Common Logic. http://www.ihmc.us/users/phayes/CL/SCL2004.html  
  14. Description Logic Handbook: Theory, Implementation and Applications. Ed. by F. Baader, D. Calvanese, D.L. McGuinness, D. Nardi, P.F. Patel-Schneider. Cambridge University Press, 2003.
  15. Agarwal, S. (2007). Formal Description of Web Services for Expressive Matchmaking. Doctoral thesis. http://www.digbib.ubka.uni-karlsruhe.de/volltexte/documents/2531.
  16. Web Ontology Language. http://www.w3.org/2004/OWL/
  17. RDF primer, 2004. http://www.w3.org/TR/REC-rdf-syntax/#richerschemas
  18. Lassila, O. Towards the semantic web. http://www.w3c.rl.ac.uk/pastevents/TowardsTheSemanticWeb.pdf
  19. Aristotle's Categories, 2007. http://plato.stanford.edu/entries/aristotle-categories/
  20. Giunchiglia, F., Dutta, B. and Maltese, V. (2009). Faceted lightweight ontologies. Conceptual Modeling: Foundations and Applications, Alex Borgida, Vinay Chaudhri, Paolo Giorgini and Eric Yu (Eds.), LNCS 5600 Springer.
  21. Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), pp. 199–220.
  22. Studer, R., Benjamins, V. R. and Fensel, D. (1998). Knowledge engineering: principles and methods. http://www.das.ufsc.br/~gb/pg-ia/KnowledgeEngineering-PrinciplesAndMethods.pdf
  23. Wilks, Yorick  and Brewster, Christopher (2009). Natural Language Processing as a Foundation of the Semantic Web. Foundations and Trends in Web Science, 1(3–4), 199‐327. doi: http://dx.doi.org/10.1561/1800000002
  24. Garnham, A. Artificial intelligence: an introduction. London: Routledge & Kegan Paul, 1988.
  25. Wilks, Yorick and Brewster, Christopher (2009). Natural Language Processing as a Foundation of the Semantic Web. Foundations and Trends in Web Science, 1(3–4), 199‐327. doi: http://dx.doi.org/10.1561/1800000002
  26. Dini, Luca (2004). NLP technologies and the semantic web: risks, opportunities and challenges. Intelligenza Artificiale 1(1), pp. 67-71. 
