Wednesday, November 27, 2013

11. Science Indicators P- 07. Informetrics & Scientometrics

Suggestions from all readers are respectfully invited to help build this blog; please send your suggestions and entries. Its scope is the entire global knowledge community, and it is meant to make an important contribution to the career building of all candidates. You can send your suggestions to this email address: chandrashekhar.malav@yahoo.com


11. Science Indicators


P- 07. Informetrics & Scientometrics *

By: I K Ravichandra Rao, Paper Coordinator


1. MORE LEARNING IN AUDIO FORMS

http://epgp.inflibnet.ac.in/vt/ls/infsci/science_indicators/Science%20Indicators_SL.mp4
http://epgp.inflibnet.ac.in/vt/ls/infsci/science_indicators/Science%20Indicators_etext.mp4


Objectives


  • Understand the meaning of an indicator
  • Understand the role of science and technology indicators
  • Develop a conceptual understanding of scientometrics/bibliometrics
  • Illustrate some key indicators derived from bibliometrics and show their applications

Summary

The purpose of indicators and the factors to be considered in constructing them are discussed. The three basic laws of bibliometrics are discussed in the context of indicators.

Introduction

Science policy cannot be made in isolation. It must centre on S&T trends and their relation to current public policy developments. S&T policy must be based on evidence: authentic data collected and analyzed on a regular basis.
For policy input, it is important to derive the salient trends and indications from R&D statistics. In spite of some major inherent limitations, the approach of analyzing scientific activity through S&T indicators has come to stay. The debate has now shifted to how one can construct better indicators and collect better statistics: what the proxies for measuring the various parameters should be, and how to create composite indicators.
Indicators should help to describe the science and technology system, enabling better understanding of its structure, of the impact of policies and programs on it, and of the impact of science and technology on society and the economy.

It should be kept in mind that every indicator sends a signal. It conveys information about the particular element or sub-element that it represents, but not about the system as a whole. An ideal indicator should be representative: it should cover the most important aspects of the elements concerned. It should be reliable: it should directly reflect how far the objective concerned is met, and be well founded, accurate, and measured in a standardized way. And it should be feasible: the data should be readily available at reasonable cost.
Indicators are different from statistics:
  • they measure the dimensions of a phenomenon;
  • they are based on statistics that can be recurrent;
  • they usually appear as a collection of statistics.
Price (1978) formulated the same requirement in the following way:
"To be meaningful, a statistic must be somehow anticipatable from its internal structure or its relation to data. It means the establishment of a set of simple and fundamental laws."


Construction of Indicators: Key Considerations

Basic steps:
  • Start from the concept and subdivide it according to several sets of dimensions
  • Create indicators for measuring each of the dimensions
  • Create composite indicators
Indicator construction is based on direct measures and indirect measures.
Two issues are of primary importance: reliability (different people can use the indicators and get consistent results) and validity (the indicators are based on identifiable criteria and measure what they are intended to measure).
Common Indicators of R&D & Innovation: Strengths and Weaknesses

| Measure | Strengths | Weaknesses |
| --- | --- | --- |
| R&D activities | Regular and recognised data on the main source of technology | Lacks detail; underestimates small firms, design, production engineering |
| Research papers | Good proxy to assess scientific research | Tacit and strategic knowledge not captured |
| Patents | Regular, detailed and long-term data | Uneven propensity to patent |
| Standards | Adoption indication | Uneven propensity; long, complex documents |
| Significant innovations | Direct measure of output; measure of significance | Cost of collection; misses incremental changes |
| Innovation surveys | Direct measure of output; comprehensive coverage | Variable definition of innovation; cost |
| Expert judgments | Direct use of expertise | Finding independent expertise; judgements beyond expertise |
| Product announcements | Close to commercialization | Misses in-house process innovations; misses incremental product improvements |

What is Scientometrics?


The quantitative approach to characterizing scientific activity emerged as a new strand of research within science and technology studies in the 1960s. Within this approach, which came to be called ‘scientometrics’, a research community became very active that was largely concerned with measuring the communication process of science. This research activity is called ‘bibliometrics’ and largely overlaps with scientometrics.
The term scientometrics originated as a Russian term for the application of quantitative methods to the history of science. Its scope and objectives have since widened considerably.
Scientometrics is a wide-ranging field with vague boundaries. It is a generic term for a system of knowledge which endeavours to study the scientific (and technological) system, using a variety of approaches within the area of Science and Technology Studies (STS).
Scientometrics has followed the trajectory of econometrics in its use of quantitative data, concepts and models, and in its extensive use of mathematical and statistical techniques of modelling and data analysis. This vision also implies the use of scientometrics for decision-making in science policy, just as econometrics is used for decision-making in economic policy.
Bibliometrics, especially evaluative bibliometrics, uses counts of publications, patents, citations and other potentially informative items to develop science and technology performance indicators.
Implicit assumptions (propositions) underlie the use and validity of bibliometric analysis:
  • Activity measurement: counts of patents and papers provide valid indicators of R&D activity in the subject areas of those patents and papers, and at the institutions from which they originate.
  • Impact measurement: the number of times those patents and papers are cited in subsequent patents or papers provides a valid indicator of the impact or importance of the cited patents and papers.
  • Linkage measurement: citations from papers to papers, from patents to patents, and from patents to papers provide indicators of intellectual linkages among the organisations that produce the patents and papers, and of knowledge linkages among their subject areas.

Bibliometric analysis can be applied at four levels:

(a) Evaluation of national or regional technical performance (policy level);
(b) Evaluation of the scientific performance of universities or the technological performance of companies (strategic level);
(c) Tracing and tracking R&D activity in specific scientific and technological areas or problems, science-technology linkages, etc. (tactical level); and
(d) Identifying specific activities and specific people engaged in R&D (conventional level).
Elements, Units and Levels of Aggregation in Bibliometrics
The elements of bibliometric analysis are publications and co-authors; units are specific aggregates such as journals, subject categories, institutions and countries to which papers can be assigned. References (citations) are specific elementary links between papers. When dealing with patents, inventors and assignees are the relevant elements.
The distinction between three levels of aggregation is important, because each level requires its own methodological and technological approach:
  • Micro level: research output of individuals and research groups
  • Meso level: research output of institutions and scientific journals
  • Macro level: research output of regions and countries
Scientometric Techniques
In terms of methodology, scientometric techniques can be classified into two categories:
  • One-dimensional (or scalar) techniques
  • Two-dimensional (or relational) techniques
One-dimensional techniques are based on direct counts (occurrences) and graphical representation of specific bibliometric entities (e.g., publications and patents) or of particular data elements in these items, such as citations, keywords or addresses. They are used to generate scalar indicators for monitoring the state of the S&T system. Scalar indicators are increasingly being exploited for science policy purposes, both as descriptive and as diagnostic tools.
Two-dimensional techniques are based on co-occurrences of specific data elements, such as the number of times keywords, classification codes, citations or addresses are mentioned together.
Laws in Bibliometrics

| Law | Definition | Example |
| --- | --- | --- |
| Lotka's law of scientific productivity | Describes the frequency of publication by authors in a given field. "Basically this means that in a given field a very large percentage of authors produce only one paper, fewer authors produce two papers, and so forth. Only a small number of authors produce a substantial number of publications". | Egghe and Rao, in an article in the August 2002 issue of JASIST, applied Lotka's law to cases where a single journal article has multiple authors. They referred to their analysis as fractional frequency distributions. |
| Bradford's law of scatter | Serves as a general guideline to librarians in determining the number of core journals in any given field. "The references are scattered throughout all periodicals with a frequency approximately related inversely to the scope." | |
| Zipf's law of word occurrence | Predicts the frequency of words within a text: for a given text, the rank of a word multiplied by its frequency is a constant. Works well for high-frequency words, not so well for low-frequency ones, hence a number of modifications. | |
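
As a rough illustration of Zipf's law, the following Python sketch tabulates rank, frequency and their product for the words of a text. The function name `rank_frequency` and the sample sentence are made up for this illustration. On such a tiny text the product is far from constant; the law is only expected to hold approximately, and on much larger texts.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending frequency, then alphabetically so that ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]


# Hypothetical sample text, far too short for the law to show up clearly.
sample = "the cat and the dog and the bird"
rows = rank_frequency(sample)
for row in rows:
    print(row)
```

On a real corpus one would look at whether the last column stays roughly stable across the high-frequency words.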


Science and Technology Performance Indicators
Scientific Papers
Indicators that show quantitative changes (measures of productivity):

| Indicator | Further Description | Advantage | Disadvantage | Example |
| --- | --- | --- | --- | --- |
| Number of papers | | Easy to retrieve | Does not say anything about the impact | Ex. 1 |
| Share of the number of papers | | | | Ex. 1 |
| Comparison of research output over the years | International comparison of countries by "the degree of contribution to the production of papers in the world"; evolution of research output in different years | | | Ex. 2 |
| Activity in different fields | | | | Ex. 3 |
| Co-authoring[1] | International collaboration / national collaboration / department collaboration | Shows to what extent an analyzed unit cooperates with other units in the production of papers | | Ex. 4 |

Some examples to illustrate the above-mentioned indicators:


| Research Area | Papers (2000-11) | Share (%) |
| --- | --- | --- |
| Engineering | 2424670 | 24.3 |
| Chemistry | 1621156 | 16.2 |
| Physics | 1604621 | 16.1 |
| Computer Science | 1274468 | 12.8 |
| Materials Science | 973841 | 9.7 |
| Biochemistry Molecular Biology | 916902 | 9.2 |


Example 4: Multi-authorship pattern of Indian publication activity in nanotechnology

| Year | Single author: publications (% share) | Two authors: publications (% share) | Multiple authors: publications (% share) |
| --- | --- | --- | --- |
| 2000 | 13 (5.28) | 60 (24.39) | 173 (70.33) |
| 2005 | 51 (4.55) | 225 (20.05) | 846 (75.40) |
| 2009 | 103 (2.98) | 718 (20.78) | 2634 (76.24) |

Description: Method: count the number of articles published by the analyzed unit during the analyzed time span, and check how many of them were co-authored with a selected other unit. Divide the second figure by the first to get the share of articles co-authored between the units:

$$p_x = \frac{P_x}{P}$$

where
p_x = share of publications co-authored with the selected unit
P_x = number of publications co-authored with the selected unit
P = total number of publications produced by the analyzed unit during the analyzed time.
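
The division described above is simple enough to sketch in a few lines of Python. The function name `coauthor_share` is an assumption made for this illustration. As a check, the same kind of ratio is applied to the year 2000 row of Example 4, where 173 of 246 publications (13 + 60 + 173) had multiple authors; note that Example 4 counts authors per paper rather than co-authorship with one selected unit, so it only serves to exercise the arithmetic.

```python
def coauthor_share(coauthored, total):
    """Share of a unit's publications that were co-authored with a selected unit.

    coauthored: number of publications co-authored with the selected unit
    total: total number of publications of the analyzed unit in the period
    """
    if total <= 0:
        raise ValueError("total number of publications must be positive")
    if not 0 <= coauthored <= total:
        raise ValueError("co-authored count must lie between 0 and total")
    return coauthored / total


# Year 2000 row of Example 4: 173 multi-author papers out of 13 + 60 + 173.
total_2000 = 13 + 60 + 173
share_2000 = coauthor_share(173, total_2000)
print(f"{100 * share_2000:.2f}%")
```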


[1] For a detailed description, refer to Example 4.

Indicators that show qualitative changes (measures of scientific impact):

| Indicator | Further Description | Advantage | Disadvantage | Example |
| --- | --- | --- | --- | --- |
| Number of citations | | Gives an indication of the scientific impact | Does not take into account that older articles usually are more cited and that citation rates vary between document types and subject areas | Ex. 5 |
| Citations per publication (CPP) | | Gives an indication of the average scientific impact | Citation rates vary between document types and subject areas | Ex. 5 |
| Citations received in the year of publication | How fast a paper made an impact on the international community | | | Ex. 5 |
| Uncited papers | The number of papers which did not receive even one citation during the time period considered | | | Ex. 5 |
| Highly cited papers[1] | Number of papers that received the maximum citations during the research period | | Requires data from a comprehensive citation database; a high normalized citation score can be due to a few highly cited articles, which is not considered | Ex. 6 |
| Journal Impact Factor (IF)[2] | Used to measure the impact of the scientific journals where a paper is published | | | Ex. 7 |
| Number of papers in top ranked journals | Select journals according to a suitable criterion, such as the impact factor of the journal | Does reflect the potential impact of a paper | Does not take the size of the analyzed time duration into account | Ex. 8 |
Some examples to illustrate the above-mentioned indicators are:
Example 5: Publications from India: Nanotechnology Scenario

| Year | Publications | Citations | Citations per paper (in the year of publication) | Citations received in the year of publication [uncited papers in the year of publication; % uncited] | Uncited papers (% uncited)* |
| --- | --- | --- | --- | --- | --- |
| 2005 | 1072 | 15985 | 14.9 (0.3) | 295 [777; 72%] | 127 (12%) |
| 2009 | 3086 | 14559 | 4.7 (0.4) | 1364 [1869; 61%] | 762 (25%) |
| 2011 | 5020 | 5260 | 1.0 (0.4) | 2241 [3806; 76%] | 2674 (53%) |

[1] For details, refer to Example 6.
[2] For further details, refer to Example 7.
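
The derived columns of Example 5 can be recomputed from the raw counts. The short Python sketch below does this for citations per paper (CPP) and the share of uncited papers; the function and variable names are chosen for this illustration only, and the figures are the ones given in Example 5.

```python
# (year, publications, citations, uncited papers), taken from Example 5.
rows = [
    (2005, 1072, 15985, 127),
    (2009, 3086, 14559, 762),
    (2011, 5020, 5260, 2674),
]


def citations_per_paper(citations, publications):
    """Average number of citations received per publication (CPP)."""
    return citations / publications


def uncited_share(uncited, publications):
    """Percentage of publications that received no citation at all."""
    return 100 * uncited / publications


# Map each year to (CPP rounded to one decimal, % uncited rounded to an integer).
summary = {
    year: (round(citations_per_paper(cites, pubs), 1),
           round(uncited_share(uncited, pubs)))
    for year, pubs, cites, uncited in rows
}
for year, (cpp, pct_uncited) in summary.items():
    print(year, cpp, f"{pct_uncited}%")
```

The falling CPP for recent years also illustrates the disadvantage noted in the indicator table: older papers have had more time to collect citations.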

Example 6: Trends in Highly Cited Papers (2011)

| Country | Total papers (rank) | Top 1% highly cited papers (rank) |
| --- | --- | --- |
| USA | 455541 (1) | 9308 (1) |
| Japan | 98890 (5) | 1098 (9) |
| Germany | 118598 (3) | 2626 (2) |
| UK | 102754 (4) | 2551 (3) |
| France | 82293 (6) | 1555 (5) |
| China | 235639 (2) | 1943 (4) |
| India | 55389 (10) | 319 (20) |
| S. Korea | 53601 (11) | 533 (15) |

Note: In this example, the world's top 1% most highly cited papers of 2011 are taken, and each country's presence among them is shown by its number of papers and its corresponding rank.
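
One possible way to read the Example 6 figures, offered here as an assumption rather than as part of the source, is to divide each country's count of top 1% papers by its total number of papers. Since the selected papers are the world's top 1%, a share above 1% would suggest that a country is over-represented among highly cited papers. The Python sketch below does this for four of the countries listed; the names used are illustrative.

```python
# (total papers, top 1% highly cited papers), taken from Example 6.
counts = {
    "USA": (455541, 9308),
    "Germany": (118598, 2626),
    "China": (235639, 1943),
    "India": (55389, 319),
}


def highly_cited_share(total, top):
    """Percentage of a country's papers that fall in the world's top 1% by citations."""
    return 100 * top / total


shares = {country: highly_cited_share(total, top)
          for country, (total, top) in counts.items()}
for country, share in shares.items():
    # The 1% reference line is an interpretation, not a threshold stated in the source.
    position = "above" if share > 1 else "below"
    print(f"{country}: {share:.2f}% ({position} the 1% reference)")
```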
Example 7: Journal Impact Factor
The 2005 impact factor of the journal Nature is obtained by counting the citations made in 2005 to publications that appeared in Nature in 2003-2004, and dividing this count by the total number of citable publications in Nature in 2003-2004.
Description:

$$I = \frac{C}{P}$$

where:
I = the impact factor for journal J in year Y
C = the number of citations from publications in year Y to publications in journal J published in years Y-2 and Y-1
P = total number of citable publications in journal J in years Y-2 and Y-1
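
With C and P as defined above, the impact factor is simply their ratio. A minimal Python sketch follows; the function name `impact_factor` and the numbers in the usage lines are hypothetical and are not actual figures for any journal.

```python
def impact_factor(citations, citable_items):
    """Impact factor I = C / P for journal J in year Y.

    citations: citations in year Y to items the journal published in Y-2 and Y-1 (C)
    citable_items: citable items the journal published in Y-2 and Y-1 (P)
    """
    if citable_items <= 0:
        raise ValueError("number of citable items must be positive")
    return citations / citable_items


# Hypothetical journal: 5000 citations in year Y to 2000 citable items from Y-2 and Y-1.
example_if = impact_factor(5000, 2000)
print(example_if)
```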
Example 8: Publication activity in high IF journals

| High impact journal (IF) | USA (rank) | Germany (rank) | France (rank) | China (rank) | S. Korea (rank) | India (rank) |
| --- | --- | --- | --- | --- | --- | --- |
| CA: A Cancer Journal for Clinicians (101.78) | 61.43 (1) | 11.98 (3) | 4.73 (5) | 1.01 (18) | 0.50 (20) | 0.30 (24) |
| Annual Review of Immunology (52.761) | 67.19 (1) | 3.56 (3) | 3.29 (5) | 0.00 (18) | 0.26 (20) | 0.00 |
| Reviews of Modern Physics (43.933) | 57.50 (1) | 16.13 (2) | 8.17 (3) | 0.62 (27) | 0.41 (31) | 1.34 (18) |
| Chemical Reviews (40.197) | 48.63 (1) | 9.90 (2) | 6.72 (3) | 2.74 (9) | 0.79 (20) | 2.33 (10) |

Identifying Conceptual Connections among Documents
This helps to identify papers that address key themes or concepts. More advanced techniques such as co-citation analysis help to identify the research front, and co-word analysis helps to show connections among concepts.
  • Common approach: common keywords among documents
  • More sophisticated approaches: bibliographic coupling and co-citation analysis

Similarity through Matching References (Bibliographic Coupling)
A reference in an article reflects one or more concepts upon which the article draws. Two articles that share a common reference (bibliographic coupling) would therefore have some linkage through the shared concept(s), even though the articles themselves might have vastly different terminology. So, searching for linkages among two or more articles through shared references offers a way to identify linking mechanisms.
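
A minimal Python sketch of this idea counts, for every pair of articles, how many references they share. The function name `coupling_strength`, the article labels and the reference identifiers are all invented for this illustration; real data would need reference strings to be cleaned and matched first.

```python
from itertools import combinations


def coupling_strength(references_by_paper):
    """Map each pair of papers to the number of references the two papers share."""
    strengths = {}
    for a, b in combinations(sorted(references_by_paper), 2):
        shared = set(references_by_paper[a]) & set(references_by_paper[b])
        if shared:
            # Only pairs with at least one shared reference are coupled.
            strengths[(a, b)] = len(shared)
    return strengths


# Hypothetical articles and their reference lists.
papers = {
    "A": ["R1", "R2", "R3"],
    "B": ["R2", "R3", "R4"],
    "C": ["R5"],
}
coupling = coupling_strength(papers)
print(coupling)
```

Here articles A and B are coupled through two shared references, while C is coupled with neither.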
Similarity through Identifying Jointly Cited Papers (Co-Citation)
Co-citation analysis involves tracking pairs of papers that are cited together in the source articles. When the same pairs of papers are co-cited with other papers by many authors, clusters of research begin to form. The co-cited or “core” papers in these clusters tend to share some common theme, theoretical or methodological or both.
Method:
  • The references in each document are identified.
  • The relatedness between these references is calculated (how many times two references occur in the same document).
  • The references are clustered using a transform of the co-occurrence matrix.
  • Finally, the original documents are assigned to these reference clusters.
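
The first two steps of this method can be sketched in Python as follows. The sketch only counts how often two references appear together in the same citing document; the clustering and the assignment of documents to clusters are left out. The function name `cocitation_counts` and the reference identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for each pair of references, the number of documents citing both."""
    counts = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so that each pair is counted once per document
        # and always appears in the same order.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists of three citing documents.
citing_documents = [
    ["R1", "R2", "R3"],
    ["R1", "R2"],
    ["R2", "R3"],
]
cocited = cocitation_counts(citing_documents)
for pair, n in sorted(cocited.items()):
    print(pair, n)
```

The resulting pair counts form the co-occurrence matrix that a clustering step would take as its input.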
Co-word analysis
Co-word analysis is a content analysis technique that uses patterns of co-occurrence of pairs of items (i.e., words or noun phrases) in texts to identify the relationships between ideas within the subject areas presented in the texts, and the strength of those relationships.
Co-word analysis is very similar to co-citation analysis. The only difference is that co-word analysis focuses on the words in a document rather than on its references.
Method: The important words or phrases are identified and the relatedness between words is calculated (based on co-occurrence). Finally, the words are clustered and documents are assigned to these word clusters.
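
The co-occurrence step of co-word analysis can be sketched in the same spirit. In the Python example below, the list of important terms is assumed to have been chosen beforehand by the analyst, and tokenization is a plain whitespace split; both are simplifications, and the function name `coword_counts` and the sample titles are invented.

```python
from collections import Counter
from itertools import combinations


def coword_counts(documents, terms):
    """Count in how many documents each pair of the chosen terms occurs together."""
    counts = Counter()
    for text in documents:
        tokens = set(text.lower().split())
        present = sorted(term for term in terms if term in tokens)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Hypothetical document titles and analyst-chosen terms.
titles = [
    "citation analysis of patent data",
    "patent citation networks",
    "keyword analysis of abstracts",
]
important_terms = ["analysis", "citation", "patent"]
coword = coword_counts(titles, important_terms)
for pair, n in sorted(coword.items()):
    print(pair, n)
```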
What Can Be Done with Publication Analysis: Summary Table

| Variables | Indicators that can be constructed |
| --- | --- |
| Authors | Number in a subject, field, institution, country; growth; correlation with productivity; collaboration (co-authorship, associated networks); author in a subject |
| Origin | Rates of production, size, growth by country, institution, language, subject; correlation with economic and other indicators |
| Sources | Journals: growth, dynamics, numbers; life cycles; quantity/yield distribution; various distributions by subject, language, country |
| Contents | Analysis of texts: distribution of words and phrases in various parts; subject analysis; co-word analysis |
| Citations | Citation indexes, impact factors, co-citation studies, etc.; other analyses such as number of references in articles, number of citations to articles, bibliographic coupling; co-citations (author connections, subject structure, networks, maps, etc.); validation of papers with qualitative methods, and impact |

Note: Adapted from a study by Tefko Saracevic (Rutgers University).

Methodological Problems of Bibliometric-Based Indicators
Many of the problems in constructing bibliometric indicators can be addressed if one understands the principles behind the construction of indicators. This applies to S&T indicators in general, including bibliometric indicators.
S&T indicators often have little relationship with what they attempt to measure, with how those measurements might be carried out and used in policy design, and with how the policy instruments that they create would influence the working of the economic system.
Some limitations apply primarily to bibliometric-based indicators. For publication-based indicators, the following limitations are the most visible:
  • They indicate the quantity of output, not its quality.
  • Non-journal methods of communication are ignored.
  • Publication practices vary across fields, journals and employing institutions.
  • The choice of a suitable, inclusive database is problematic.
  • There are undesirable publishing practices (artificially inflated numbers of co-authors; shorter papers).
  • Papers represent only one output of laboratory-based activity.
In particular, the fact that a paper is infrequently cited, or (still) uncited several years after its publication, gives information about its reception by colleagues but does not reveal anything about the quality or standing of its author(s). A high degree of citation may indicate that its content has been integrated into the body of knowledge of the respective subject field. Low or no citations may indicate that the results involved do not contribute essentially to the contemporary scientific paradigm of the subject field in question. Similarly, the major concerns in using citations have been identified as:
  • An intellectual link between the citing source and the referenced article may not always exist.
  • Incorrect work can be highly cited.
  • Methodological papers are among the most highly cited.
  • Citations are lost in automated searches due to spelling differences and inconsistencies.
  • As with publication practices, citation practices vary across fields, journals and employing institutions.
  • The set of sources covered by SCI and Scopus, from which citations are available, changes over time.
  • SCI and Scopus are biased towards English-language journals.
  • Works of great importance rapidly become part of common knowledge and are thus referred to in the literature without citation.
  • Citations may be critical rather than positive; however, it has been argued that even contested results make a contribution to knowledge.
  • The various scientific fields are cultivated by groups of varying size, and thus the probability of being cited varies from sector to sector.
  • The number of citations does not grow at a linear rate over time.
  • The value of scientific work is not always acknowledged by contemporaries.
It is important to understand the limitations of indicators based on publication and citation counts. There is a tendency to make claims that are questionable.
Bibliometrics is continuously improving: normalization and comparability of indicators; treatment of self-citations and fractional counting; data standardization; coverage of the different outputs; monitoring of deficiencies and manipulation.

