Tuesday, November 26, 2013

08. Growth of Literature P- 07. Informetrics & Scientometrics

Your suggestions for building this blog are respectfully invited; please send your suggestions and contributions. Its scope is the entire global knowledge community, and it aims to contribute significantly to the career building of all aspirants. You can send your suggestions to this email address: chandrashekhar.malav@yahoo.com


By: I K Ravichandra Rao, Paper Coordinator







Objectives


The objectives of the Unit are to discuss:
  • an overview of the growth of literature
  • computational aspects of doubling time and growth rates
  • different growth models: exponential, logistic, power, Gompertz, etc.
  • the selection of a trend type
  • selected articles on the growth of literature
  • the relation between the growth of sources and items, between growth rates and obsolescence rates, etc.


Summary


Price in 1961 argued that scientific literature grows exponentially and computed its growth rate as about 5% per year over the past two centuries. Since then many researchers have worked in this area and come up with different models to explain the phenomenon of the growth of literature. It has also been found that the growth rates of literature are influenced by many factors; these are discussed in this Unit. For example, the growth rates of sources are influenced by the growth rates of items, and vice versa!

Introduction

The number of scientific journals, including abstracting periodicals, is a simple indicator of scientific growth. Price in 1961 argued that scientific literature grows exponentially and computed its growth rate as about 5% per year over the past two centuries. He further observed that the literature doubles approximately once every fifteen years. Neelameghan (1963) analyzed the documents on the history of medicine in India for the period 1954-61, during which the Indian contribution was 65% and the foreign contribution 30%. He studied the growth of Indian medical societies and medical periodicals between 1780 and 1920. He also studied the coverage of Indian medical literature in Index Medicus and Excerpta Medica, and found that they covered only 38% and 13.5% of the Indian literature, respectively. Since then a number of articles have been published on this topic, particularly on the growth of literature in different subjects and on various growth models. The number and growth characteristics of articles, journals, scientists, discoveries, etc. have been matters of debate for a considerable time. For instance, Price (1963) argued that:

  • Once in fifty years the number of universities, labor force, population, etc. double
  • Once in twenty years GNP, discoveries, scientists, college entrants/1000 population double
  • Once in fifteen years the number of scientific journals doubles
  • Once in ten years the number of articles / literature in a field (particularly in science) doubles.
  • Once in two years the number of web sites doubles

As observed by many (Price (1963), Michael Mabe (2003) and others), the growth of publications passes through the following four stages:
  1. The preliminary period of growth, in which the absolute increments are small although the rate of increase is large
  2. The period of exponential growth, when the number of publications in a field doubles at regular intervals as a result of a high rate of growth
  3. The period when the rate of growth declines but the annual increments still remain approximately constant, and
  4. The final period, when both the rate of increase and the absolute increments decline and eventually approach zero.
The growth curve that describes these four stages is also referred to as an S-shaped curve. A typical S-shaped curve is given in Figure 1.


Figure 1. A typical S-shaped curve.

The growth generally follows an S-shaped pattern and, in the logistic case, is symmetrical about its point of inflection. Price in 1963 estimated that the number of scholarly periodical titles being published at the end of the twentieth century would exceed one million. Meadows (1967), on the other hand, observed that estimates of the number of journals varied from 10,000 in 1951 to 70,000 in 1987. Price argued that the logistic growth of knowledge over a period of time is a result of a number of applications of intellectual innovations. He fitted the logistic curve to the cumulative number of new publications appearing every year in science. He also studied the growth of the literature covered by Physics Abstracts during 1900-1950 and observed that, except for the interruptions during the two world wars, the literature had been increasing at an exponential rate with a doubling time of about twelve years. Michael Mabe, in his study, also observed that journal growth rates have been remarkably consistent over time, with an average annual rate of 3.46% since 1800. In particular, he observed:
  • From 1900 to 1940, the number of active journal titles grew at an annual rate of 3.23%, a doubling time of twenty-two years.
  • From 1945 to 1976, the number of journals grew at an annual rate of 4.35%, representing a doubling time of sixteen years.
  • Since 1977, the number of journals has grown at an annual rate of 3.26%.
Growth rates were thus highest from the end of the Second World War until the mid-1970s. Mabe pointed out that the slower growth after the mid-1970s was due to:
  1. The oil crisis of the 1970s
  2. The increasing public awareness of potential ecological disaster, and
  3. The turning away from nuclear technology in the 1950s.
These factors certainly led to a slowdown in government support for research.
Yamazaki (1987) studied the number and rate of growth of scientific journals under the following three assumptions:
  1. The number of journals in a given subject is growing exponentially in time,
  2. Concurrently, each journal is also augmenting the number of papers on the subject exponentially in time,
  3. The rate of growth of articles in individual journals is the same for all journals.
His review critically assesses studies based on the 'ecological approach' to journal publishing growth. He concluded that the annual rate of growth of scientific journals has been 1.85% since the end of the eighteenth century. Naranan (1970, 1971) has shown that the frequency distribution J(p) of the number of journals with p articles is of the form

$$J(p) \propto p^{-\alpha}$$

With α ≈ 2, this model reproduces the salient features of Bradford's law. The mathematical concepts of growth have become popular through the widely circulated report of the Club of Rome (Donella Meadows and others, 1972).

The Exponential Model


If the growth rate is measured relative to the size of the population, it is generally referred to as the relative growth rate. It is also called the exponential growth rate or the continuous growth rate. The exponential model is associated with the name of Thomas Robert Malthus (1766-1834), who first realized that any species could potentially increase in numbers according to a geometric series. Exponential growth represents an increase by a fixed proportion of the total population in each unit of time. For example, if a species has non-overlapping generations (e.g., annual plants) and each organism produces b offspring, then the population Y_t in generation t = 0, 1, 2, ..., starting from an initial population a, is equal to:

$$Y_t = a \cdot b^{t}$$
In this case, the growth rate is (b - 1) × 100 percent per generation. This model is also popularly known as the log-linear model, because taking logarithms gives

$$\ln Y_t = \ln a + t \ln b$$

Here ln b is the slope of the line; it measures the proportional change in Y for a unit change in t. The model provides not only the rate of growth (the exponential parameter) but also the time in which the size of the population doubles. Exponential growth has also been linked to and compared with compound interest. The exponential function has a convex shape when plotted. In exponential growth, the increase is proportional to population size; that is, if the population is y at time t, then

$$\frac{dy}{dt} = \beta y$$

where β is the Malthusian parameter of the population. In terms of differential equations, if y is the population and dy/dt its growth rate, then its relative growth rate is (1/y)(dy/dt). If the relative growth rate is a constant β, it is not difficult to verify that the solution of this equation is y(t) = a e^{βt}, which coincides with Y_t = a b^t when β = ln b. When calculating or discussing the relative growth rate, it is important to pay attention to the units of time being considered.
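The log-linear form suggests a simple way to estimate the growth parameters from a series of counts: regress ln y on t by ordinary least squares, then read off b = e^slope, the percentage growth rate (b - 1) × 100 and the doubling time ln 2 / ln b. The sketch below is a minimal illustration under stated assumptions (equally weighted, strictly positive counts, time measured from t = 0); the function name fit_exponential and the synthetic series are illustrative and not taken from any study cited in this Unit.

```python
import math


def fit_exponential(times, counts):
    """Fit y = a * b**t by least squares on ln y = ln a + t * ln b.

    Returns (a, b, growth_rate_percent, doubling_time); all counts must be > 0.
    """
    n = len(times)
    logs = [math.log(y) for y in counts]
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    slope = sxy / sxx                      # estimate of ln b
    intercept = l_mean - slope * t_mean    # estimate of ln a
    a, b = math.exp(intercept), math.exp(slope)
    growth_rate = (b - 1) * 100            # percent per unit of time
    doubling_time = math.log(2) / slope    # ln 2 / ln b
    return a, b, growth_rate, doubling_time


# Synthetic series growing at exactly 5% per period (t measured from 0).
times = list(range(11))
counts = [100 * 1.05 ** t for t in times]
a, b, rate, dt = fit_exponential(times, counts)
print(f"a = {a:.1f}, b = {b:.4f}, growth = {rate:.2f}%, doubling time = {dt:.2f}")
```

For this synthetic series the fit recovers b = 1.05, and the doubling time agrees with the 5% entry in the table of doubling times. With real data a least-squares fit on logarithms gives relatively more weight to the early, small counts than a fit on the original scale would.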


Figure 2. Total number of scientific journals and abstract journals as a function of date (source: Little Science, Big Science).

 

The exponential function can be applied to both growth and decay processes. Examples of growth processes are given below (Croxton and Cowden, 1966):
  • Information production process
  • Microbiology (growth of bacteria)
  • Conservation biology (restoration of disturbed populations)
  • Insect rearing (prediction of yield)
  • Plant or insect quarantine (population growth of introduced species)
  • Fisheries (prediction of fish dynamics)
A typical exponential curve is shown below:
The doubling time is the period of time required for a quantity to double in size or value. It is applied to population growth, library collections, the number of universities, colleges or students, and many other quantities that tend to grow over time. When the relative growth rate (not the absolute growth rate) is constant, the quantity undergoes exponential growth and has a constant doubling time, which can be calculated directly from the growth rate: divide the natural logarithm of 2 by the exponent of growth, ln(1 + r/100), or, approximately, divide 70 by the percentage growth rate r; that is:

$$D_t = \frac{\ln 2}{\ln(1 + r/100)} \approx \frac{70}{r}$$

The doubling time conveys the long-term impact of growth more clearly than the percentage growth rate alone. The doubling time is a characteristic unit (a natural unit of scale) for the exponential growth equation, and its counterpart for exponential decay is the half-life. For example, with an annual growth rate of 4.8%, the doubling time is 14.78 years, and a doubling time of 10 years corresponds to a growth rate between 7% and 7.5% (actually about 7.18%). Some doubling times calculated with this formula are shown in the following table:
Doubling times Dt for a constant growth rate of r% per period

r%     Dt        r%     Dt       r%     Dt       r%      Dt
0.1    693.49    3.0    23.45    6.0    11.90    9.0     8.04
0.5    138.98    3.5    20.15    6.5    11.01    9.5     7.64
1.0    69.66     4.0    17.67    7.0    10.24    10.0    7.27
1.5    46.56     4.5    15.75    7.5    9.58     15.0    4.96
2.0    35.00     5.0    14.21    8.0    9.01     20.0    3.80
2.5    28.07     5.5    12.95    8.5    8.50
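The table above can be regenerated, and the quality of the 70/r approximation checked, with a few lines of code. This is a minimal sketch; the helper names are illustrative, and the third function simply inverts the exact formula to give the rate needed for a chosen doubling time.

```python
import math


def doubling_time(r):
    """Exact doubling time for a constant growth rate of r percent per period."""
    return math.log(2) / math.log(1 + r / 100)


def rule_of_70(r):
    """Rule-of-thumb approximation 70 / r."""
    return 70 / r


def growth_rate_for_doubling(d):
    """Percentage growth rate that doubles a quantity in d periods."""
    return (2 ** (1 / d) - 1) * 100


for r in (0.1, 1.0, 5.0, 10.0, 20.0):
    print(f"r = {r:5.1f}%   Dt = {doubling_time(r):7.2f}   70/r = {rule_of_70(r):7.2f}")

print(round(doubling_time(4.8), 2))            # worked example in the text
print(round(growth_rate_for_doubling(10), 2))  # rate giving a 10-period doubling time
```

The approximation is close for small r but drifts as r grows: at 20% it gives 3.50 instead of 3.80.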

Given two measurements of a growing quantity, q1 at time t1 and q2 at time t2, and assuming a constant growth rate, you can calculate the doubling time as

$$D_t = (t_2 - t_1)\,\frac{\ln 2}{\ln(q_2/q_1)}$$

The equivalent concept to doubling time for a quantity undergoing a constant negative relative growth rate, or exponential decay, is the half-life.
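The two-measurement formula is easy to apply directly. The sketch below (the function name is hypothetical) uses the Library A figures from the percentage-growth example in the next section; a negative result signals decline, and its magnitude is then the half-life.

```python
import math


def doubling_time_from_two_points(t1, q1, t2, q2):
    """D_t = (t2 - t1) * ln 2 / ln(q2 / q1), assuming a constant relative growth rate.

    A negative result means the quantity is shrinking; its absolute value is then
    the half-life.
    """
    if q1 <= 0 or q2 <= 0 or q1 == q2:
        raise ValueError("need two positive, unequal measurements")
    return (t2 - t1) * math.log(2) / math.log(q2 / q1)


# Library A: 250,000 documents in 1980 and 280,000 in 1990.
print(round(doubling_time_from_two_points(1980, 250_000, 1990, 280_000), 1))

# A quantity that halves in 15 years returns approximately -15 (half-life of 15 years).
print(doubling_time_from_two_points(0, 100, 15, 50))
```

For Library A this gives a doubling time of roughly 61 years, which makes the modest growth of the collection more tangible than the bare percentage.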

The Calculation of Simple Percentage Growth rate


The percent change from one period to another is calculated from the formula:

$$GR(\%) = \frac{y_{t+1} - y_t}{y_t} \times 100$$

Where:
GR (%)   =  Percent growth rate
y_{t+1}  =  Value at time (t+1)
y_t      =  Value at time t
The annual percentage growth rate is simply the percent growth divided by N, the number of years.
Example
In 1980, the number of documents in Library A was 250,000. This grew to 280,000 in 1990. What is the annual percentage growth rate of the library collection in Library A?

y_t = 250,000,   y_{t+10} = 280,000   and   N = 10

$$GR(\%) = \frac{280{,}000 - 250{,}000}{250{,}000} \times 100 = 12\%, \qquad \frac{GR(\%)}{N} = \frac{12\%}{10} = 1.2\%$$

The library collection grew 12 percent between 1980 and 1990, or at an average rate of 1.2 percent per year.
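The worked example can be checked in code. For comparison only, the sketch also computes the constant compound rate that turns 250,000 into 280,000 in ten years; this compound rate is an illustrative addition and is not part of the simple method described above. Function names are hypothetical.

```python
def percent_growth(y_start, y_end):
    """GR(%) = (y_end - y_start) / y_start * 100."""
    return (y_end - y_start) * 100 / y_start


def simple_annual_rate(y_start, y_end, n_years):
    """Average annual rate as described in the text: total percent growth / N."""
    return percent_growth(y_start, y_end) / n_years


def compound_annual_rate(y_start, y_end, n_years):
    """For comparison only: yearly rate r with y_end = y_start * (1 + r/100)**N."""
    return ((y_end / y_start) ** (1 / n_years) - 1) * 100


total = percent_growth(250_000, 280_000)
simple = simple_annual_rate(250_000, 280_000, 10)
compound = compound_annual_rate(250_000, 280_000, 10)
print(f"total growth {total:.1f}%, simple annual {simple:.2f}%, compound annual {compound:.2f}%")
```

The compound rate (about 1.14% a year) is slightly lower than the simple average of 1.2%, because each year's growth also applies to earlier additions; it is the rate consistent with the exponential model.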



The Logistic Model

The Belgian mathematician Pierre Verhulst (1838) developed the logistic model. A typical logistic curve is shown in Figure 3. He suggested that the rate of population increase might depend on population density. A common three-parameter form of the logistic function, consistent with the parameters a, b and c discussed below, is

$$y_t = \frac{a}{1 + b\,c^{t}}$$

where a is the limiting (upper) value.

The algebra of the logistic family is something of a hybrid: it mixes the behaviors of both exponentials and powers (proportions, like rational functions). The three parameters of the logistic family work together to produce its characteristic behavior, and they are best understood in combination. The parameters b and c are simply the y-intercept and the base of the component exponential function b c^t. The rate at which a logistic function falls from or rises to its limiting value is completely determined by the exponential function in the denominator, that is, by the parameters b and c.
As with the exponential function, the growth rate of the logistic function depends on the population itself:

$$\frac{dy}{dt} = (-\ln c)\, y \left(1 - \frac{y}{a}\right)$$

so growth slows as y approaches its limit a. Pearl and Reed used the curve to describe the growth of the albino rat and of the tadpole's tail, the number of yeast cells and, most interesting of all, the number of human beings in a geographical area. In 1920, Raymond Pearl and Lowell J. Reed developed the curve independently, and it is therefore also known as the Pearl-Reed curve. Like the S-shaped curve in Figure 1, the logistic curve depicts three phases:
  1. Slow growth in the early stage
  2. An intermediate period of rapid growth
  3. An approach to maturity.


Figure 3. A typical logistic curve.
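To make the link between the curve and its growth rate concrete, the sketch below assumes the three-parameter form y_t = a / (1 + b c^t), with a the upper limit and 0 < c < 1 (consistent with the parameters b and c discussed above). It compares a numerical derivative with (-ln c) y (1 - y/a) and locates the point of inflection, where b c^t = 1 and y = a/2. The parameter values are hypothetical.

```python
import math


def logistic(t, a, b, c):
    """y_t = a / (1 + b * c**t); with 0 < c < 1 the curve rises towards a."""
    return a / (1 + b * c ** t)


def logistic_rate(y, a, c):
    """Growth rate implied by the same curve: dy/dt = (-ln c) * y * (1 - y/a)."""
    return -math.log(c) * y * (1 - y / a)


def inflection_time(b, c):
    """Time at which b * c**t = 1, so y = a/2 and growth is fastest."""
    return -math.log(b) / math.log(c)


a, b, c = 1000.0, 99.0, 0.7   # hypothetical parameters
h = 1e-5                      # step for a central-difference derivative
for t in (0, 5, 10, 20):
    y = logistic(t, a, b, c)
    numeric = (logistic(t + h, a, b, c) - logistic(t - h, a, b, c)) / (2 * h)
    print(f"t = {t:2d}  y = {y:7.2f}  numeric dy/dt = {numeric:7.3f}  "
          f"formula = {logistic_rate(y, a, c):7.3f}")

t_star = inflection_time(b, c)
print(f"inflection at t = {t_star:.2f}, y = {logistic(t_star, a, b, c):.1f}")
```

The numerical and analytic rates agree closely at every t, and the growth rate peaks at half the limiting value, which is what makes this curve symmetrical about its point of inflection.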

The Power Function

It is a log-log or double-log model. It is mathematically represented as:

$$y_t = a\,t^{b}, \quad \text{i.e.,} \quad \log y = \log a + b \log t$$

Sometimes, the power model is also represented as

$$y_t = a + b\,t^{\gamma}$$

where a, b > 0. For 0 < γ < 1, the function y takes a concave shape, but without an upper limit. For γ = 1, the function y is linear. For γ > 1, the function y takes a convex shape. A typical power model is shown in Figure 4.
Figure 4. A typical power model.
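Because the pure power form y = a t^b becomes the straight line log y = log a + b log t, it can be fitted by ordinary least squares on the logarithms. The sketch below assumes t > 0 (so t starts at 1) and uses a synthetic series that is exactly a power law; the function name fit_power is illustrative.

```python
import math


def fit_power(times, values):
    """Least-squares fit of log y = log a + b log t, i.e. y = a * t**b (t > 0)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(v) for v in values]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                          # slope on the log-log scale
    a = math.exp(y_mean - b * x_mean)      # back-transformed intercept
    return a, b


times = list(range(1, 21))
values = [2.0 * t ** 1.5 for t in times]   # synthetic, exactly a power law
a, b = fit_power(times, values)
print(round(a, 4), round(b, 4))
```

This fits only the pure form a t^b. The shifted form a + b t^γ does not become linear under a simple transformation and would need a nonlinear least-squares routine.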

The Gompertz Model

The Gompertz (1825) model describes a trend in which the growth increments of the logarithms decline by a constant percentage. Thus, the natural values of the trend show a declining ratio of increment, but this ratio does not decrease by either a constant amount or a constant percentage. The equation for the Gompertz curve is

$$y_t = a\,b^{c^{t}}, \quad \text{i.e.,} \quad \log y = \log a + (\log b)\,c^{t}$$

The Gompertz and logistic curves are similar in that both can describe an increasing series that is increasing by a decreasing percentage of growth, or a decreasing series that is decreasing by a decreasing percentage of decline. They differ in that the Gompertz curve involves a constant ratio of successive first differences of the log y values, while the logistic curve entails a constant ratio of successive first differences of the 1/y values. Another equally well-known model is Ware's model. It is represented as

$$y = d\,(1 - f^{-t}), \qquad d, f > 1$$

Figure 5. A typical Ware's model.
Selecting a trend type
There are many trend types, and it is often difficult to decide which one to use. The following simple guidelines help in selecting an appropriate curve (Croxton and Cowden):
  1. If the approximate trend, when plotted on semi-logarithmic paper, is a straight line, use an exponential curve.
  2. If the first differences resemble a skewed frequency curve, use a Gompertz curve.
  3. If the first differences resemble a normal frequency curve, use a logistic curve.
  4. If the first differences of the logarithms are constant, use an exponential curve.
  5. If the first differences of the logarithms are changing by a constant percentage, use a Gompertz curve.
  6. If the first differences of the reciprocals are changing by a constant percentage, use a logistic curve.
  7. If the approximate trend values (or the original data), when expressed as percentages of a selected asymptote, appear linear on arithmetic probability paper, use a logistic curve.
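Guidelines 4 to 6 are purely numerical, so they can be checked directly on equally spaced data; the graphical guidelines (1 to 3 and 7) are not covered here. The sketch below is a hypothetical helper, not a published procedure: it treats "changing by a constant percentage" as a constant ratio of successive differences, and its tight default tolerance only suits exact synthetic data. With real, noisy counts the tolerance would have to be relaxed considerably.

```python
import math


def first_differences(seq):
    return [y2 - y1 for y1, y2 in zip(seq, seq[1:])]


def successive_ratios(seq):
    return [y2 / y1 for y1, y2 in zip(seq, seq[1:])]


def is_constant(seq, rel_tol):
    """True if all values lie within rel_tol (relative to their mean size)."""
    mean = sum(seq) / len(seq)
    scale = abs(mean) if mean != 0 else 1.0
    return max(seq) - min(seq) <= rel_tol * scale


def suggest_trend(y, rel_tol=1e-6):
    """Apply guidelines 4-6 to equally spaced, positive, strictly monotonic data."""
    if len(y) < 4 or any(v <= 0 for v in y):
        raise ValueError("need at least four positive observations")
    d_log = first_differences([math.log(v) for v in y])
    if is_constant(d_log, rel_tol):
        return "exponential"            # guideline 4
    if any(d == 0 for d in d_log):
        return "undetermined"           # ratios below would divide by zero
    if is_constant(successive_ratios(d_log), rel_tol):
        return "gompertz"               # guideline 5
    d_recip = first_differences([1 / v for v in y])
    if is_constant(successive_ratios(d_recip), rel_tol):
        return "logistic"               # guideline 6
    return "undetermined"


print(suggest_trend([3 * 1.1 ** t for t in range(10)]))
print(suggest_trend([100 * 0.1 ** (0.8 ** t) for t in range(10)]))
print(suggest_trend([100 / (1 + 9 * 0.7 ** t) for t in range(10)]))
```

The three synthetic series are an exponential, a Gompertz curve y = a b^(c^t) and a logistic curve y = a / (1 + b c^t), so the checks mirror the distinction between the Gompertz and logistic curves described earlier.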
Egghe and Rao (1992) have suggested an innovative methodology for identifying and classifying growth models. They classify growth models using two growth rate functions, α1 and α2, of the growth function C(t) (theoretical or observed data). These growth rate functions are defined as:

$$\alpha_1(t) = \frac{C(t+1)}{C(t)}, \qquad \alpha_2(t) = \frac{C(2t)}{C(t)}, \qquad t = 1, 2, 3, \ldots$$

α1 is called the first growth rate function and α2 the second growth rate function. The basic idea is that the graphs of α1 and α2 differ much more clearly from one growth model to another than the graphs of the growth models themselves do. Since C(2t)/C(t) is the product of the successive ratios C(s+1)/C(s) for s = t, ..., 2t-1, the theoretical relationship between α1 and α2 is:

$$\alpha_2(t) = \alpha_1(2t-1)\,\alpha_1(2t-2)\cdots\alpha_1(t)$$

According to them, determining the growth model should go back to the intrinsic growth rate properties of the data, for a good understanding of what is really going on. The authors have also suggested that, as a simple clue for selecting the best model, the two growth rate functions can be plotted and inspected for different mathematical models (namely, exponential, power, linear, logistic and Gompertz). These graphs can be classified as Type 1: increasing; Type 2: constant; Type 3: decreasing; and Type 4: increasing and then decreasing, as shown in Table 1.
Table 1. Classification of growth models using growth rate functions

Type of model                          Growth rate function α1    Growth rate function α2
(1)                                    (2)                        (3)
Exponential                            Type 2                     Type 1
Logistic or Gompertz (0 < b, c < 1)    Type 3                     Type 4
Gompertz (b, c > 1)                    Type 1                     Type 1
Power (α > 0, 0 < γ ≤ 1)               Type 3                     Type 1
Power (α > 0, γ > 1)                   Type 4                     Type 1
Power (α > 0)                          Type 3                     Type 2
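The following sketch computes α1 and α2 for two illustrative growth functions and classifies the resulting sequences into the four types of Table 1. The function names, the tolerance and the parameter values are assumptions, and the pure power law C(t) = 3 t^1.5 is taken to correspond to the last Power row of Table 1. It also checks the product relationship between α2 and α1 numerically.

```python
import math


def alpha1(C, t):
    """First growth rate function: C(t + 1) / C(t)."""
    return C(t + 1) / C(t)


def alpha2(C, t):
    """Second growth rate function: C(2t) / C(t)."""
    return C(2 * t) / C(t)


def trend_type(values, tol=1e-9):
    """Type 1 (increasing), 2 (constant), 3 (decreasing),
    4 (increasing then decreasing), or None if none of these fits."""
    diffs = [y2 - y1 for y1, y2 in zip(values, values[1:])]
    if all(abs(d) <= tol for d in diffs):
        return 2
    if all(d > tol for d in diffs):
        return 1
    if all(d < -tol for d in diffs):
        return 3
    k = 0                                   # length of the initial rising run
    while k < len(diffs) and diffs[k] > tol:
        k += 1
    if 0 < k < len(diffs) and all(d < -tol for d in diffs[k:]):
        return 4
    return None


def exponential(t):
    return 5.0 * math.exp(0.1 * t)          # hypothetical parameters


def power(t):
    return 3.0 * t ** 1.5                   # hypothetical parameters


ts = range(1, 15)
for name, C in (("exponential", exponential), ("power", power)):
    a1 = [alpha1(C, t) for t in ts]
    a2 = [alpha2(C, t) for t in ts]
    print(name, trend_type(a1), trend_type(a2))

# alpha2(t) equals the product alpha1(t) * ... * alpha1(2t - 1)
print(alpha2(power, 5), math.prod(alpha1(power, s) for s in range(5, 10)))
```

With these settings the exponential function gives Type 2 for α1 and Type 1 for α2, and the pure power function gives Type 3 and Type 2, in line with Table 1. Observed counts are noisy and available only at integer t, so the tolerance would need to be relaxed for real data.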


Other Related works

Crane (1972) found a linear growth pattern in the literature of two subfields, “invariant theory (1887-1941)” and “reading research (1881-1957)”. Tague et al. (1981) explored the cumulative growth of literature as reflected in Chemical Abstracts (1907-79), Science Abstracts (1960-79) and Biological Abstracts (1960-70). They found that the linear growth pattern fits best for most of the literature covered, while the exponential growth model fits best for the literature covered by Chemical Abstracts (1907-1979).
May (1966) studied the growth of mathematical literature since 1986 to 1965. He observed that this growth is exponential, with only a few deviations during the world wars. Menard (1974) examined the literature in various subfields of the earth sciences. He found that the literature of vertebrate paleontology, which started as a discipline in the sixteenth century, grew slowly until the end of the eighteenth century and then began to grow exponentially. Wolfram et al. (1990) fitted linear, exponential, logistic and power models to the growth of publications over a period of 20 years, as reflected in databases covering science, technology, the social sciences and the humanities. They found that, in most cases, the model that provided the best fit to the observed data was a power model, rather than an exponential, logistic or linear model, and they concluded, “The breakdown in exponential growth is well underway.” The power model performed particularly well because it can model the growth behavior of both the linear and the exponential models.
Egghe and Rao (1992) clarified the formal distinctions between the four models that Wolfram et al. examined, pointing out that any linear model should more properly be recognized as a special kind of power model, and introducing two other comparable models, the Gompertz and Ware functions, that Wolfram et al. did not consider. Revisiting the data collected in the earlier study, Egghe and Rao observed that an exponential model was never the best fit. Indeed, they showed that such a model could never have been expected to provide the best fit, given that the rate of growth in every database declined steadily over the years studied. They also found that a power model fitted best in cases of convex growth and that a Gompertz model generally fitted best in cases of S-shaped growth. Egghe and Rao's findings suggested that, in modeling the growth of literature, the choice between an exponential and a logistic function may always have been a false one, and that we should instead be asking whether growth is best described by a power law or a Gompertz function. Gupta, Kumar, Sangam and Karisiddappa (1999) found best fits with either power or logistic models, and called into question the utility of the Gompertz function for modeling S-shaped growth in the social sciences. Gupta et al. also applied this methodology, fitting selected growth models to the growth of world and Indian physics literature during 1898-1950. They observed that the growth of Indian physics literature follows a logistic growth model, while the growth of world physics literature can be explained by a combination of logistic and power models. Gupta and Karisiddappa (2000) introduced different approaches for studying the growth of scientific knowledge, as reflected through publications and authors. They applied selected growth models to the cumulative growth of publications and authors in theoretical population genetics from 1907 to 1980, and concluded that, among the models studied, the power model best explains the cumulative growth of publication and author counts in theoretical population genetics.
Sharma et al. (2002) examined the growth of world literature as reflected in three data sets, namely Physics Abstracts, Chemical Abstracts and Electrical and Electronics Abstracts, from 1907 to 1994. They found that the power model best describes the growth of literature in all three data sets.
Tsay and Yang (2005) studied the literature on randomized controlled trials (RCTs), which plays a fundamental role in developing evidence-based medicine (EBM). They found that this literature grew steadily from 1965 to 2001, following an exponential model. In an earlier paper, Tsay and Young (2003) studied the growth pattern, journal characteristics and author productivity of the subject indexing literature from 1977 to 2000, based on a subject search of the descriptor field in the Library and Information Science Abstracts (LISA) database. The growth of the subject indexing literature from 1977 to 2000 could be fitted well by the logistic curve.
Matia et al. (2005) analyzed three databases at different levels of aggregation: (a) a database of approximately 10^6 publications published from 1980 to 2001, (b) a database of 508 academic institutions from the European Union (EU) and 408 institutes from the United States for the 11-year period 1991-2001, and (c) a database of 2,330 Flemish authors who published in the period 1980-2000. At all levels of aggregation they found that the mean annual growth rate of publications is independent of the number of publications of the units involved. They also found that the standard deviation of the distribution of annual growth rates decays with the number of publications as a power law with an exponent of approximately 0.3.
Jing and Kang (2000) analyzed three classical models for the growth of scientific literature: the exponential, logistic and linear growth models. They also discussed the limits and scope of the logistic model. Ramakrishna and Pangannaya (1999) examined “Derwent Biotechnology Abstracts” and the journal “Animal Cell Biotechnology” to study the relative growth rate (RGR) and doubling time of the animal cell culture technology literature between 1983 and 1993. They found a decreasing trend in RGR and an increasing doubling time, which indicates that the growth is neither exponential nor linear. The size of the literature was calculated by applying the logistic growth formula.
Maheswarappa and Ningoji (1993) studied the growth of literature in the field of applied sciences in India based on Indian Science Abstracts from 1965 to 1989. They observed that the relative growth rate had declined and was heading towards saturation, while the doubling time of the literature consistently increased. The growth of literature in the applied sciences in India does not fit the modified exponential, logistic or linear curves. Parvathamma, Gunjal and Nijagunappa (1993) determined the growth rate of Indian earth science literature by calculating relative growth rates and doubling times for the period 1978-88. Results showed that the mean relative growth rate declined while the mean doubling time increased. They suggested that Indian earth science literature follows a logistic pattern of growth.
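Several of these studies report a declining relative growth rate alongside a rising doubling time. This Unit does not define RGR; the sketch below assumes the definition commonly used in such studies, RGR = (ln W2 - ln W1)/(T2 - T1) for cumulative counts W, with Dt = ln 2 / RGR, which agrees with the two-measurement doubling-time formula given earlier. The function name and the counts are hypothetical, not data from the studies above.

```python
import math


def rgr_and_doubling(cumulative, step=1.0):
    """Relative growth rate and doubling time between successive observations.

    RGR = (ln W2 - ln W1) / (T2 - T1) and Dt = ln 2 / RGR, where W is the
    cumulative number of publications and T2 - T1 = step.
    """
    results = []
    for w1, w2 in zip(cumulative, cumulative[1:]):
        rgr = (math.log(w2) - math.log(w1)) / step
        dt = math.log(2) / rgr if rgr > 0 else math.inf
        results.append((rgr, dt))
    return results


# Hypothetical cumulative counts at yearly intervals.
cumulative = [120, 260, 410, 560, 700, 830, 950, 1060]
for i, (rgr, dt) in enumerate(rgr_and_doubling(cumulative), start=1):
    print(f"interval {i}: RGR = {rgr:.3f}, Dt = {dt:.2f}")
```

A constant RGR would indicate exponential growth; here the RGR falls and the doubling time rises in every interval, the pattern these studies interpret as a departure from exponential growth.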

Relationship between Growth of Sources (Journals) and Items (Articles)

To examine the belief that scholarly and scientific journal literature is growing exponentially, Archibald and Line (1990) studied a sample of 190 journals that started before 1950: 20 in each of 9 subject fields and 10 in literature. The number of articles in each journal was also counted for the period 1950-80 and for 1987. The analysis showed rapid growth in most subjects up to 1970, much slower growth between 1976 and 1980, and slow growth or decline between 1980 and 1987. In general science and in physical science and technology, growth rates declined from 1980 to 1987. Although the total number of journals is increasing, the overall rate of growth of the total number of journal articles is slow. Persson, Glänzel and Danell (2003) presented interesting “universal” data (i.e., based on all papers indexed in the 1980-2000 volumes of the Science Citation Index® (SCI)). They observed that, between 1980 and 1998, the number of articles increased by roughly one third, whereas the number of citations received by them increased by three quarters.
Egghe (2005) observed that the growth rates of sources (journals) usually differ from the growth rates of items (articles); further, he argued, “The references in publications grow with a rate that is different (usually higher) from the growth rate of the publications themselves.” His study showed that Naranan's model (an exponential model, y = a c^t) hardly fits the empirical data. He showed that the "simple" two-dimensional informetric models of source-item relations are not able to explain this, and that a linear three-dimensional informetrics (i.e., adding a new source set) is capable of modeling disproportionate growth. The explanation consists of “defining” a set of “super sources” which produce the original sources but which also attach the items to the original sources. In this way, the disproportionate growth of references versus articles can be explained by looking at authors. In the same way, the disproportionate growth of articles versus journals (using a new dataset compiled from the EconLit database) can be explained by considering journal publishers. Formulae for these different growth rates are derived using Lotkaian informetrics, and new and existing datasets are interpreted in terms of the linear three-dimensional model.
Sahoo (2006), in his thesis, compared the growth of journals with that of articles in the area of software studies. He observed that, for Indian literature, the correlation coefficient (r) between the number of journals and the number of articles is 0.9811; that is, about 96% (r² ≈ 0.963) of the variation in y (articles) is explained by the variation in x (journals): the higher the number of journals, the higher the number of articles. For world literature, however, he observed that the correlation coefficient between the number of journals and the number of articles published is only 0.8517. Unlike for India, the correlation is not so high for world literature. Figures 6 and 7 show the growth rate curves for journals and for the articles published by them, for Indian and world literature.




Figures 6 and 7. Growth rates of journals vs. articles for Indian and world literature, respectively, from 1990 to 2003.

It was observed that the growth rate of journals decreased as the growth rate of articles decreased. There was negative growth during 1990, 1992, 1995, 1998 and 2001 for India, and during 1991, 1992 and 1998 for world literature. In some years, such as 1995 and 2001 for India, negative growth in journals was not accompanied by negative growth in articles. From these observations, Sahoo concluded that the growth of journals affects the growth of the number of articles only in some cases. The growth rates of journals and articles are not always the same: for Indian literature the average growth rate is 9% for both journals and articles, whereas for world literature the average growth rate of journals is 3% and that of articles is 9%.

In another study, Ravichandra Rao and Divya (2010) studied the growth of literature in malaria research, and the relation among journals, articles and authors. Their study suggests that the numbers of journals, articles and authors increase approximately exponentially. Between 55-65 and 96-05, the number of articles increased from 3,996 to 57,627 and the number of journals from 503 to 3,072. The R² values of the trends for journals, articles and authors are 0.9502, 0.9475 and 0.9651, respectively; these relatively low R² values are perhaps due to the small number of data points, and Figures 8-10 are much more convincing that the data on journals, articles and authors increase exponentially. Under the assumption that the data conform to the exponential model, the growth rates of journals, articles and authors were computed as 5.31%, 7.38% and 10.06%, respectively. The most important observation is that the number of least productive journals has increased from 463 to 2,951. This is perhaps due to:

  • Interdisciplinary nature of research in Malaria and related topics
  • A possibly incomplete bibliography
  • High growth rates (exponential in nature!) of journals and articles

Figures 8-10. Growth of journals, articles and authors in malaria research.
An attempt was also made to study the relation among the three variables – journals, articles and authors. A multiple regression analysis was carried out to study and understand the relation. The following three regression equations were identified:

1) Articles = -39.2771 + 3.61719*Journals + 0.085882*Authors (R² = 99.1616%)
2) Journals = 84.7129 + 0.136999*Articles + 0.003053*Authors (R² = 98.6554%)
3) Authors = -2873.51 + 4.4767*Articles + 4.20186*Journals (R² = 98.3593%)

With regard to the first equation, the p-value of the independent variables (journals and authors) is 0.0000. Since it is less than 0.01, the highest-order term is statistically significant at the 99% confidence level. Consequently, there is no need to remove any variables from the model.

With regard to the second equation, the p-value of the independent variable authors is 0.4335. Since it is greater than 0.10, the term is not statistically significant at the 90% or higher confidence level. Consequently, the variable authors may be removed from the model; that is, the equation Journals = 84.7129 + 0.136999*Articles is good enough to use as a linear model.

With regard to the third equation, the p-value of the independent variable journals is 0.4335. Since it is greater than 0.10, the term is not statistically significant at the 90% or higher confidence level. Consequently, the variable journals may be removed from the model; that is, the equation Authors = -2873.51 + 4.4767*Articles is good enough to use as a linear model.
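Coefficients of the kind reported above come from ordinary least squares. As a dependency-free sketch (not the software used in the study), the hypothetical function ols below solves the normal equations by Gaussian elimination and reports R². The data are hypothetical and generated exactly from Articles = 5 + 3*Journals + 0.5*Authors, so the fit should recover those coefficients with R² = 1. p-values are not computed here; they would normally come from a statistics package.

```python
def ols(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 + ...

    X is a list of rows of predictor values (no intercept column).
    Solves the normal equations (X'X) beta = X'y by Gaussian elimination.
    Returns (beta, r_squared).
    """
    rows = [[1.0] + list(r) for r in X]          # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    m = [row + [v] for row, v in zip(xtx, xty)]  # augmented matrix
    for col in range(k):                         # forward elimination
        pivot = max(range(col, k), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, k):
            f = m[i][col] / m[col][col]
            for j in range(col, k + 1):
                m[i][j] -= f * m[col][j]
    beta = [0.0] * k
    for i in reversed(range(k)):                 # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (m[i][k] - s) / m[i][i]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Hypothetical counts, not the malaria data of the study.
journals = [10, 14, 19, 25, 33, 41, 52, 60]
authors = [30, 35, 60, 70, 110, 120, 180, 175]
articles = [5 + 3 * j + 0.5 * a for j, a in zip(journals, authors)]

beta, r2 = ols(list(zip(journals, authors)), articles)
print("Articles = {:.4f} + {:.4f}*Journals + {:.4f}*Authors (R2 = {:.4f})".format(*beta, r2))
```

Dropping a predictor, as done for the second and third equations, means re-running the fit with the remaining variable.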
An attempt was also made to analyze the chemical literature based on data from Chemical Abstracts. Figure 11 shows the growth trend with its R². The growth rate is 5.98%.


Figure 11. Growth trend of the chemical literature (Chemical Abstracts) with its R².

Obsolescence of literature

‘Obsolete’ generally means out of date or no longer in use, and the process of becoming obsolete is known as obsolescence. It is also often referred to as the ‘phenomenon of replacement.’ The term obsolescence was used for the first time by Gross and Gross in 1927. They analyzed the references in the 1926 volume of the Journal of the American Chemical Society and observed that the number of references falls to one-half in fifteen years. Obsolescence is thus a characteristic of scientific and technical literature. Burton and Kebler (1960) were the first to use the term ‘half-life’, which they defined as ‘the time during which one-half of all the currently active literature was published.’ It is the period of time needed to account for one-half of all the citations received by a group of publications. The concept of half-life is often discussed in the context of diachronous studies. More precisely, Line and Sandison (1974) define diachronous studies as those that follow the use of particular items through successive observations at different points in time, whereas synchronous studies are concerned with plotting the age distribution of material used at one point in time.
However, there is no reason to suppose that the half-life for a subject is the same as the median citation age in that subject. In the context of synchronous data, half-life is referred to as the median age of the citations or references. Use of literature may decline much faster when it carries data of ephemeral relevance, when it is in the form of reports, theses, advance communications or preprints, or when it belongs to a field of advancing technology. On the other hand, use may decline slowly when the literature is descriptive (e.g., taxonomic botany), critical (e.g., literary criticism) or conceptual (e.g., philosophy).
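
As an illustration of the synchronous computation, the sketch below finds the median age of a set of references. It assumes the reference counts are grouped into one-year age classes and that ages are spread evenly within each class, so the median is located by linear interpolation; this convention and the function name are assumptions, not something the unit prescribes.

```python
def median_citation_age(counts_by_age):
    """Median age of references in a synchronous sample (the 'half-life' in that sense).

    counts_by_age[k] is the number of references aged k to k+1 years at the
    time of observation. Ages are assumed to be spread evenly within each
    one-year class, so the median is located by linear interpolation.
    """
    if any(c < 0 for c in counts_by_age):
        raise ValueError("counts must be non-negative")
    total = sum(counts_by_age)
    if total <= 0:
        raise ValueError("need at least one reference")
    half = total / 2.0
    cumulative = 0.0
    for age, count in enumerate(counts_by_age):
        # The median falls in the first class that lifts the running total to half or more.
        if count > 0 and cumulative + count >= half:
            return age + (half - cumulative) / count
        cumulative += count
    raise AssertionError("not reached for valid input")


# Hypothetical reference counts for ages 0, 1, 2, 3 and 4 years.
print(median_citation_age([10, 30, 20, 20, 20]))  # prints 2.5
```
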
Brookes (1970) argues that if the growth rates of literature and of contributors are equal, then the obsolescence rate remains constant. In this sense, growth and obsolescence are related. Ravichandra Rao and Meera (1991) have studied the relation between growth and obsolescence of literature, particularly in mathematics. Gupta (1998) studied the relationship between growth rates, obsolescence rates and half-life in the literature of theoretical population genetics, and explored the application of the lognormal distribution to the age distribution of citations over a period of time.
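In the analysis of obsolescence, Brookes argued that the geometric distribution expresses the idea that, when a reference is made to a particular periodical, the probability that the cited item is of age $t$ years is $(1-a)\,a^{t-1}$, where $a$ $(0 < a < 1)$ is a parameter, the annual aging factor; it is assumed to be constant over all values of $t$. Let

$$U = 1 + a + a^2 + a^3 + \cdots + a^t + \cdots = \frac{1}{1-a}.$$

Similarly, if

$$U(t) = a^t + a^{t+1} + a^{t+2} + \cdots = a^t\,U(0),$$

where $U(0) = U$, then $U(t)/U(0) = a^t$. Using this relation, the half-life as well as $a$ can be computed by a graphical method: since $\ln[U(t)/U(0)] = t \ln a$, a plot of $\ln[U(t)/U(0)]$ against $t$ is a straight line with slope $\ln a$, and the half-life is the value of $t$ at which $U(t)/U(0) = 1/2$, i.e., $t_{1/2} = -\ln 2 / \ln a$.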
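
The graphical method can be mimicked numerically. Assuming the observed values of U(t)/U(0) are available for t = 0, 1, 2 and so on, the sketch below fits ln[U(t)/U(0)] = t·ln a by least squares through the origin (a stand-in for fitting the line by eye) and then converts a into a half-life with t = −ln 2 / ln a. The function names and sample ratios are hypothetical.

```python
import math


def estimate_aging_factor(ratios):
    """Estimate the annual aging factor a from observed U(t)/U(0) values.

    ratios[t] is the observed fraction U(t)/U(0) for t = 0, 1, 2, ...
    Under the geometric model ln(U(t)/U(0)) = t * ln(a), so ln(a) is the slope
    of a least-squares line through the origin.
    """
    num = 0.0
    den = 0.0
    for t, r in enumerate(ratios):
        if t == 0:
            continue  # U(0)/U(0) = 1 carries no information about a
        if r <= 0:
            raise ValueError("ratios must be positive")
        num += t * math.log(r)
        den += t * t
    if den == 0:
        raise ValueError("need at least one ratio for t >= 1")
    return math.exp(num / den)


def half_life_from_aging_factor(a):
    """Half-life t such that a**t = 1/2, i.e. t = -ln 2 / ln a (requires 0 < a < 1)."""
    if not 0 < a < 1:
        raise ValueError("aging factor must lie strictly between 0 and 1")
    return -math.log(2) / math.log(a)


if __name__ == "__main__":
    observed = [1.0, 0.91, 0.80, 0.74, 0.65, 0.60]  # hypothetical U(t)/U(0) values
    a = estimate_aging_factor(observed)
    print(f"a = {a:.3f}, half-life = {half_life_from_aging_factor(a):.2f} years")
```
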
If we assume that the literature is growing exponentially at an annual rate $g$, then $R(T) = R(0)\,e^{gT}$, where $R(T)$ is the number of references made to the literature during year $T$. We also have

$$U(0) = \frac{R(0)}{1-a_0} \quad\text{and}\quad U(T) = \frac{R(T)}{1-a_T},$$

where $a_0$ and $a_T$ are the annual aging factors corresponding to years $0$ and $T$ respectively. Under the assumption that utility remains constant, $U(0) = U(T)$, we have

$$\frac{R(0)}{1-a_0} = \frac{R(T)}{1-a_T}.$$

Substituting $R(T) = R(0)\,e^{gT}$, we thus obtain a relation between growth and obsolescence:

$$e^{gT} = \frac{1-a_T}{1-a_0}.$$
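
Rearranging the last relation gives the aging factor implied by growth: a_T = 1 − (1 − a_0)·e^(gT). The sketch below simply evaluates this under the same assumptions (exponential growth at rate g and constant utility); the function name and parameter values are hypothetical.

```python
import math


def aging_factor_after_growth(a0, g, years):
    """Aging factor a_T implied by e^(g*T) = (1 - a_T) / (1 - a_0).

    Assumes exponential growth at annual rate g and constant utility U(0) = U(T).
    """
    if not 0 < a0 < 1:
        raise ValueError("a0 must lie strictly between 0 and 1")
    a_t = 1.0 - (1.0 - a0) * math.exp(g * years)
    if not 0 < a_t < 1:
        raise ValueError("no valid aging factor: growth too large for this a0 and T")
    return a_t


if __name__ == "__main__":
    # Hypothetical values: a_0 = 0.9, 5% continuous growth, over 1 and 5 years.
    for T in (1, 5):
        print(T, round(aging_factor_after_growth(0.9, 0.05, T), 4))
```

For g > 0 the result is smaller than a_0, so U(t)/U(0) = a^t falls off faster: under constant utility, growth appears as faster obsolescence. The function rejects combinations where (1 − a_0)·e^(gT) ≥ 1, because no aging factor between 0 and 1 then satisfies the relation.
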

However, Egghe and Ravichandra Rao (1992) showed that the obsolescence (aging) factor $a$ is not a constant but a function of time, $a(t)$. They also showed that $a(t)$ has a minimum, attained at a time later than the time at which the number of citations reaches its maximum.
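
The paragraph states the result without a formula. As an illustration only, assume the obsolescence function is the ratio of citations in consecutive years, a(t) = c(t+1)/c(t), and take a lognormal-shaped citation curve (the kind of distribution mentioned above for citation ages) with arbitrary parameters mu = 1.5 and sigma = 0.8. The sketch shows a(t) varying with time and reaching its minimum after the citation peak; the function names and parameters are hypothetical.

```python
import math


def lognormal_citations(t, mu, sigma):
    """Lognormal-shaped citation curve c(t) for t > 0 (hypothetical parameters)."""
    return math.exp(-(math.log(t) - mu) ** 2 / (2 * sigma ** 2)) / (
        t * sigma * math.sqrt(2 * math.pi)
    )


def obsolescence_function(counts):
    """a(t) = c(t+1) / c(t) for consecutive yearly counts (assumed definition)."""
    return [counts[i + 1] / counts[i] for i in range(len(counts) - 1)]


years = list(range(1, 21))
c = [lognormal_citations(t, mu=1.5, sigma=0.8) for t in years]
a = obsolescence_function(c)          # a[i] is a(t) for t = years[i]
t_peak = years[c.index(max(c))]       # year with the most citations
t_min_a = years[a.index(min(a))]      # year where a(t) is smallest
print(f"citations peak at t = {t_peak}; a(t) is smallest at t = {t_min_a}")
```
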

Egghe (1993) also developed a model to study the influence of growth on obsolescence, and found different results for synchronous and diachronous studies: an increase in growth implies an increase in obsolescence in the synchronous case, whereas in the diachronous case it is quite the opposite. To derive the relation, he assumed exponential models for both growth and obsolescence. In another paper, for the diachronous aging distribution and based on a decreasing exponential model, Egghe derived the first-citation distribution, assuming that the distribution of the total number of citations received conforms to a classical Lotka function. The first-citation distribution is given by
$$f(t_1) = \gamma\,\bigl(1 - a^{t_1}\bigr)^{\alpha - 1}$$
where $\gamma$ is the fraction of papers that eventually get cited, $t_1$ is the time of the first citation, $a$ is the aging rate and $\alpha$ is Lotka’s exponent. Since $a^{t_1} \to 0$ as $t_1$ grows, $f(t_1)$ rises towards $\gamma$ (for $\alpha > 1$); it is the cumulative fraction of papers that have received their first citation by time $t_1$. Egghe and Ravichandra Rao (2002) observed that the cumulative distribution of the age of the most recent reference is the dual of the first-citation distribution, although the two models differ. In another study, Egghe and Rao established the general relation between the first-citation distribution and the general citation age distribution, showing that when Lotka’s exponent $\alpha = 2$ the two distributions are the same. In the same study, they argued that the distribution of the $n$th citation is similar to the first-citation distribution. Egghe, Rao and Rousseau studied the influence of production on the utilization function. Assuming an increasing exponential function for production and a decreasing one for aging, they showed that in the synchronous case, the larger the increase in production, the larger the obsolescence; in the diachronous case it is quite the opposite. This proof differs from the earlier one derived by Egghe (1993).
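
A direct transcription of the first-citation formula, with hypothetical parameter values, may help in reading it. Note that with α = 2 the exponent α − 1 equals 1, so the curve reduces to γ(1 − a^t1), the cumulative form of a decreasing exponential aging model. The function name is hypothetical.

```python
def first_citation_cdf(t1, gamma, a, alpha):
    """Fraction of papers cited at least once by time t1: gamma * (1 - a**t1)**(alpha - 1).

    gamma: fraction of papers eventually cited; a: aging rate (0 < a < 1);
    alpha: Lotka exponent (typically close to 2).
    """
    if t1 < 0:
        raise ValueError("t1 must be non-negative")
    if not 0 < a < 1:
        raise ValueError("aging rate must lie strictly between 0 and 1")
    return gamma * (1.0 - a ** t1) ** (alpha - 1.0)


if __name__ == "__main__":
    # Hypothetical parameters: 80% of papers eventually cited, a = 0.7.
    for alpha in (2.0, 2.5):
        curve = [round(first_citation_cdf(t, 0.8, 0.7, alpha), 3) for t in range(0, 11, 2)]
        print(alpha, curve)
```
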
Most bibliometric studies are empirical in nature. To reproduce bibliometric research, therefore, one has to repeat the survey and analyze the data right from the beginning, and even then we may not get the same result! In the natural sciences, it is possible and quite common for research to be repeated in laboratories, but in the social sciences this is difficult, if not impossible. Further, an important cause of overall unreliability, and therefore of invalidity, in basic research in the social sciences is small sample size. On the other hand, if a study is based on a large sample, it is difficult to reproduce its results.
Under these circumstances, if a study follows established guidelines or methodologies, one may accept its validity as well as the generalization of its results. Also, if scientific methods are followed, we may satisfy ourselves with certain statistical parameters: measures of central tendency, measures of dispersion, confidence limits, etc. The general guidelines are:
  1. Identify the general problem(s)
  2. Conduct a literature search
  3. State the research question and/or formulate hypotheses
  4. Decide the design methodology
  5. Collect the data either for the population or for a sample
  6. Analyze the data
  7. Report the results
  8. Refine the hypotheses

(Note: Steps 2 to 8 may have to be repeated.)

In Step 1, the research objectives are explicitly identified and described. Further, information about the research objectives and investigative tasks is analyzed, and relevant terms and variables are defined. In Step 3, the research question is stated and/or hypotheses are formulated, with clear-cut definitions, assumptions, suppositions, etc. The phenomenon under study is then carefully observed and, if necessary, causal factors associated with it are identified. If sufficient data are collected from machine-readable databases, the results are much more reliable than otherwise; thus, whenever possible, one may use databases to collect the data. However, investigators can be more certain of their findings when they conduct similar studies (i.e., follow-up case studies in the same area).


Questions

  1. How do you compute the doubling time period, growth rate and relative growth rate?
  2. Explain and derive the exponential model. What assumptions are made in deriving the exponential model?
  3. For a given set of data, how would you choose a model to fit the observed data?
  4. Explain the methodology to fit a growth model to a given set of data.

References


  1. Archibald, G. and Line, M.B. (1990.) “The size and growth of serial literature 1950-1987, in terms of the number of articles per serial.” (Scientometrics. 20; 173-196.)
  2. Brookes, B.C. (1970.) “Obsolescence of special library periodicals: Sampling Errors and Utility contours.” (Journal of the American Society for Information Science. 21; 320-9.)
  3. Burton, R.E. and Kebler, R.W. (1960.) “The half-life of some scientific and technical literature.” (American Documentation. 11; 18-22.)
  4. Crane, D. (1972.) “Invisible colleges: diffusion of knowledge in scientific communities”. The University of Chicago Press. Chicago.
  5. Croxton, F.E. and Cowden, D.J. (1966.) “Applied general statistics”. Prentice-Hall of India (Private) Ltd. New Delhi.
  6.  “Definition of Gross domestic product.” <http://www.wordiq.com/definition/Gross_domestic_product>.
  7. Meadows, Donella H.; Meadows, Dennis L.; Randers, Jorgen and Behrens III, William W. (1972.) “The Limits to Growth.” Club of Rome.
  8. Egghe, L. (1993.) “On the Influence of Growth on Obsolescence”. (Scientometrics, 27,2; 195-214.)
  9. Egghe, L. (2005.) “An explanation of disproportionate growth using linear 3-dimensional informetrics and its relation with the fractal dimension.” (Scientometrics. 63,2; 277-296.)
10.  Egghe, L. (1990.) “The Duality of Informetric Systems with Applications to the Empirical Laws.” (Journal of Information Science. 16; 117-27.) (Also see Egghe, L. and Rousseau, R. (1990.) Introduction to Informetrics: Quantitative Methods in Library, Documentation and Information Science. Elsevier. Amsterdam.)
11.  Egghe, L. and Rao, I.K.R. (1992.) “Classification of growth models based on growth rates and its applications.” (Scientometrics. 25; 5-46.)
12.  Egghe, L. and Ravichandra Rao, I.K. (1992.) “Citation age data and the obsolescence function: fits and explanations.” (Information Processing and Management. 28,2; 201-17.)
13.  Egghe, L.; Ravichandra Rao, I.K. and Rousseau, R. (1995.) “On the influence of production on utilization functions: Obsolescence or increased use?” (Scientometrics. 34,2; 285-315.)
14.  “Exponential Model.” <http://www.ento.vt.edu/~sharov/PopEcol/lec5/exp.html>.
15.  Gompertz, B. (1825) “On the Nature of the Function Expressive of the Law of Human Mortality, and on a New Mode of Determining the Value of Life Contingencies.”  (Philosophical Transactions of the Royal Society of London 115 513-585.)
16.  Gross, P L K and Gross, E M. (1927.) “College Libraries and chemical education”. (Science, 66; 1229-34.)
17.  Gupta, B.M. (1998.) “Growth and obsolescence of literature in theoretical population genetics.” (Scientometrics. 42,3; 335-47.)
18.  Gupta, B.M.; Sharma, Praveen and Suresh Kumar. (1999.) “Growth of world and Indian literature.”  (Scientometrics. 44; 5-46.)
19.  Gupta, B.M. and Karisiddappa, C.R. (2000.) “Modeling the growth of literature in the area of theoretical population genetics.” (Scientometrics. 49,2; 321-55.)
20.  Gupta, B.M., et al. (2002.) “Modeling the growth of world social science literature.” (Scientometrics. 53; 161-164.)
21.  Jing, P. and Kang, Z. (2000.)  “On the mathematical models of the growth of scientific literature.” (Journal of the China Society for Scientific and Technical Information. 19,1; 90-6.)
22.  Maheswarappa, B. S. and Ningoji, M.M. (1993.) “Growth of literature in the field of applied sciences in India (1965-89).” (International Information Communication and Education. 12,2; 191-200.)
23.  Line (Maurice B) and Sandison (A) (1974.) Obsolescence and change in the use of literature with time (Journal of Documentation. 30,3; 283-350.)
24.  “Logistic Functions.” <http://www.wmueller.com/precalculus/families/1_81.html>.
25.  Mabe, Michael. (2003.)“The growth and number of journals.”(Serials.16,2;191-7.)
26.  Malta, Kaushik, et al. (2005.) “Scaling phenomena in the growth dynamics of scientific output.” (Journal of the American Society for Information Science and Technology. 56,9; 893-902.) 
27.  Meadows, A.J. (1993) In Woodward and Pilling. The International Serials Industry. Aldershot, Gower, pp 24-7.
28.  Meadows, A.J. (1998.) “Communicating Research.” Academic Press. London and San Diego. pp. 15-6.
29.  May, K.O. (1966.)  “Quantitative growth of the mathematical literature.”  (Science. 154; 1672-1673.)
30.  Menard, H.W. (1974.) Science: growth and change. Harvard University Press, Cambridge, Mass.
31.  Naranan, S. (1970.) “Bradford’s law of bibliography of science: an interpretation.” (Nature. 227,5258; 631-632.)
32.  Neelameghan A. (1963.) Documentation of the History of Medicine in India. (Ann Lib Sc. Doc. 10,3/4; 116-42.)
33.  Oliver, M.R. (1971.) “The effect of growth on the obsolescence of semiconductor physics literature.” (Journal of Documentation. 27; 11-17.)
34.  Parvathamma, N.; Gunjal, S.R. and Nijagunappa, R. (1993.) “Growth pattern of literature and scientific productivity of authors in Indian earth science (1978-88): a bibliometric study.”(Library Science with a Slant to Documentation. 30.2; 54-64.)
35.  Persson, O., Glänzel, W. and Danell, R. (2003.) “Inflationary bibliometric values: the role of scientific collaboration and the need for relative indicators in evaluative studies.” Proceedings of the Ninth International Conference on Scientometrics and Informetrics, Beijing (China). Jiang Guohua, R. Rousseau and Wu Yishan, eds. Dalian University of Technology Press. Dalian (China). pp. 411-420.
36.  Price, Derek de Solla. (1963.) “Little Science, Big Science.” Columbia University Press. New York.
37.  Ramakrishna, N.V. and Pangannaya, N.B. (1999.) “Growth of animal cell culture technology literature: a correlation between citations and publications based on growth curves.” (Library Science with a Slant to Documentation and Information Studies 36,1; 21-6.)
38.  Rao, I.K.R. and Meera, B.M. (1991.) “Growth and obsolescence of literature: an empirical study.” In I.K.R. Rao, ed. Informetrics-91. Sarada Ranganathan Endowment for Library Science. Bangalore. pp. 377-394.
39.  Malthus, Thomas Robert. (1798.) “An Essay on the Principle of Population.”
40.  Rose, S. (1967.) “The S-curve considered.” (Technology and Society. 4; 33-40.)
41.  Rose, H. and Rose, S. (1969.) Science and Society. Penguin. London. (Criticism of Price’s and others’ attempts to formulate laws of growth, from the point of view of the effects of such work on science policy.)
42.  Sahoo, Bibhuti Bhusan. (2006.) “Scientometric Study of Literature in Software Studies in India with a comparison to the World Literature.” Thesis submitted to the Dept. of Library and Information Science, Pune University, for the Degree of Doctor of Philosophy in Library and Information Science. Guide: I. K. Ravichandra Rao, Documentation Research and Training Centre, Indian Statistical Institute, Bangalore.
43.  Sharma, Praveen; Gupta, B.M. and Suresh Kumar. (1993.) “Application of selected growth models to growth of science and technology literature in research specialties.” (Scientometrics.)
44.  Tague, J.; Beheshti, J. and Rees-Potter, L. (Summer 1981.) “The law of exponential growth: evidences, implications and forecasts.” (Library Trends. 125-149.)
45.  Tsay, Ming-Yueh. (2004.) “Literature growth, journal characteristics, and author productivity in subject indexing, 1977 to 2000.” (Journal of the American Society for Information Science and Technology. 55,1; 64-73.)
46.  Tsay, Ming-Yueh and Yang, Yen-Hsu. (2005.) “Bibliometric analysis of the literature of randomized controlled trials.” (Journal of the Medical Library Association (JMLA). 93,4.)
47.  Tsay, Ming-Yueh and Yang, Yen-Hsu. (2003.) “A study on the literature growth and author productivity of randomized controlled trials medical literature.” (Journal of Information, Communication and Library Science. 9,3; 31-46.)
48.  Wolfram, D.; Chu, C.M. and Liu, Xin. (1990.) “Growth of knowledge: Bibliometric analysis using online database data.” In L. Egghe and R. Rousseau, eds. Informetrics 89/90. pp. 355-372.
49.  Yamazaki, Shigeaki. (1987.) “A critical review of some ecological studies on scientific journals [in Japanese].” (Journal of Information Processing and Management. 29,10; 863-870.)
50.  Yoshikane, F., and others. (2006.) “Comparative analysis of co-authorship networks considering authors' roles in collaboration: differences between the theoretical and application areas.” (Scientometrics. 68,3; 643-655.)
