Tuesday, November 26, 2013

06. Bradford Distributions: An Overview P- 07. Informetrics & Scientometrics

Suggestions from all of you are respectfully invited in creating this blog; please send in your suggestions and entries. Its entire field of work is the world knowledge community, and it will make an important contribution to the career building of all competitors. You can send your suggestions to this e-mail address: chandrashekhar.malav@yahoo.com


By: I K Ravichandra Rao, Paper Coordinator


Objectives


After going through this case study you will come to know about the following:
a) Derivation of equations for the Bradford distribution by various bibliometricians.
b) Viewpoints of some bibliometricians on the law.
c) Ambiguity between the verbal and graphical representations of the Bradford distribution.
d) Bradford-Zipf distribution.
e) Characteristics of bibliometric distributions, etc.

Summary

Bradford provided two formulations of his law, one verbal and the other graphical. In 1962, Cole suggested a formulation in the form of an algebraic expression. In 1967, Leimkuhler derived an equation for the law using statistical techniques. Determining a constant in that equation proved tedious, which led Brookes to derive a simpler equation. He derived essentially two different equations, one for the verbal formulation and the other for the graphical formulation. Bradford’s verbal formulation had some limitations, as it allowed only a partial graph to be plotted. Only the graph drawn from the entire data set shows the complete phenomenon of scatter: it begins as a gentle curve and gradually develops into a straight line, and in some cases it droops at the top end. Naranan maintained that Bradford’s law is explainable in terms of an underlying power law distribution; that the law emerges as a natural consequence of exponential growth of scientific literature and journals at comparable rates; and that a model like this predicts a strong correlation between the age of a journal and the number of articles it carries. Brookes argued that Naranan’s analysis was not valid for Bradford’s law, although Naranan’s paper provided a plausible model of Lotka’s law with suitable verbal amendments. Hubert was also of the view that Naranan’s interpretation of the original form of Bradford’s law does not follow from a stochastic argument based on his assumptions. Analyzing all the classical laws of bibliometrics, Bookstein concluded that these distributions are essentially different versions of a single theoretic distribution. Ravichandra Rao analyzed the Bradford multiplier with a small sample of 12 datasets using the t-test, and attempted to identify a suitable model to explain the law of scattering. It was observed that the log-normal fits much better than many other models, including the log-linear model.
Vickery showed in 1948 that there is a disparity between the verbal and graphical formulations of Bradford’s law, and provided expressions for both. It has also been shown that the two formulations generate two different graphs, one incomplete and the other complete. In the graph of the verbal formulation certain features of the distribution are missed. Kendall showed that the data of a Bradford distribution can be presented in a way that shows a Zipf distribution. The Bradford and Zipf distributions are in fact very close, and the linearity of the Bradford bibliograph indicates a true Zipf situation. The characteristics of the bibliometric distributions have also been highlighted.

Cole's Formulation

Bradford stated that “If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the number of periodicals in the nucleus and succeeding zones will be as $1 : n : n^2$ ….” Later, Cole [6] also experimented with the law, named the slope of the curve the reference scattering coefficient, and concluded that the coefficient might be characteristic of the subject field. For petroleum literature, Cole obtained the relationship

$$F(x) = 1 + b \log_{10} x \quad (x > c)$$

where $F(x)$ stands for the cumulative number of papers contained in the $x$ most productive journals, and $c$ represents the number of journals figuring in the nuclear zone. For petroleum literature Cole found the value of $b$ to be 0.43.

Leimkuhler’s Formulation

Ferdinand F. Leimkuhler [11], Professor of Industrial Engineering at Purdue University in the United States, analysed all the published data on the Bradford distribution and derived the following equation by applying statistical techniques.

$$F(x) = \frac{\ln(1 + \beta x)}{\ln(1 + \beta)}, \quad (0 \le x \le 1)$$

In the equation, $F(x)$ stands for the cumulative fraction of the references, $x$ for the corresponding fraction of the most productive journals, and $\beta$ is a constant related to the document collection. Brookes [3] opined that the equation is but a compromise, since Leimkuhler accepted the empirical data as both complete and exact. Brookes further observed that “Unfortunately, though Leimkuhler’s formulation can be used theoretically without difficulty, it has some disadvantages for the practical documentalist. The numerical evaluation of the key parameter β requires tedious statistical computation and the solving of an implicit equation by approximation methods....In fact it was the exasperation evoked by an attempted practical application of Leimkuhler’s formulae that led the author of this paper to seek a simpler formulation of the Bradford distribution” [3: p. 249].
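The “implicit equation” Brookes mentions can be illustrated with a minimal Python sketch. It evaluates Leimkuhler’s $F(x)$ and recovers the constant (named `beta` in the code) from a single observed point by bisection. This is a simplification assumed here for illustration: Leimkuhler’s own procedure fits the whole data set statistically, and the function names and the sample value `true_beta = 50.0` are hypothetical.

```python
import math


def leimkuhler(x, beta):
    """Cumulative fraction of references in the top fraction x of journals."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if beta <= 0.0:
        raise ValueError("beta must be positive")
    return math.log1p(beta * x) / math.log1p(beta)


def solve_beta(x_obs, f_obs, lo=1e-9, hi=1e9, iterations=200):
    """Estimate beta from one observed point (x_obs, f_obs) by bisection.

    For a fixed x in (0, 1), F(x) rises from x towards 1 as beta grows,
    so the root is bracketed whenever F(x; lo) < f_obs < F(x; hi).
    """
    if not 0.0 < x_obs < 1.0:
        raise ValueError("x_obs must lie strictly between 0 and 1")
    if not leimkuhler(x_obs, lo) < f_obs < leimkuhler(x_obs, hi):
        raise ValueError("f_obs is outside the range reachable by the bracket")
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # geometric midpoint: beta spans many orders
        if leimkuhler(x_obs, mid) < f_obs:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical illustration: generate a point from a known beta, then recover it.
true_beta = 50.0
f_at_10_percent = leimkuhler(0.1, true_beta)
estimated_beta = solve_beta(0.1, f_at_10_percent)
print(f"F(0.1) = {f_at_10_percent:.4f}, recovered beta = {estimated_beta:.4f}")
```

There is no closed-form expression for the constant in terms of an observed point, which is why an iterative search of this kind is needed.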

Brookes’ Formulation

Brookes’ [3] formulation of the Bradford distribution is as follows. Suppose $R(n)$ is the cumulative total of relevant papers found in the first $n$ journals when all the journals are ranked in order of decreasing productivity. Bradford’s law then requires that

$$R(n) = R(n^2) - R(n) = R(n^3) - R(n^2) = \dots$$

for all integral values of $n$ greater than unity.

From the first equality we have

$$R(n) = R(n^2) - R(n) \tag{Eq. 1}$$
$$\Rightarrow\; R(n) + R(n) = R(n^2)$$
$$\Rightarrow\; 2R(n) = R(n^2).$$

From the second equality we have

$$R(n^2) - R(n) = R(n^3) - R(n^2) \tag{Eq. 2}$$
$$\Rightarrow\; R(n^2) + R(n^2) - R(n) = R(n^3)$$
$$\Rightarrow\; 2R(n^2) - R(n) = R(n^3)$$
$$\Rightarrow\; 2 \cdot 2R(n) - R(n) = R(n^3) \quad [R(n^2) = 2R(n) \text{ from Eq. 1}]$$
$$\Rightarrow\; 4R(n) - R(n) = R(n^3)$$
$$\Rightarrow\; 3R(n) = R(n^3).$$

We therefore have $R(n^2) = 2R(n)$ and $R(n^3) = 3R(n)$. Hence, we can generalize:

$$R(n^x) = xR(n), \text{ where } x \text{ is a positive integer}$$
$$\Rightarrow\; R(n) = \frac{1}{x} R(n^x).$$

Putting $1/x = a$ and $R(n^x) = n^b$, we can write the equation as

$$R(n) = a n^b \quad [1 \le n \le c] \tag{Eq. 3}$$

where $c$ is the value of $n$ at the point where the straight portion of the curve begins [Fig. 1].

The only function that fully satisfies the condition $R(n^x) = xR(n)$ is

$$R(n) = k \log n, \text{ where } k \text{ is a constant.} \tag{Eq. 4}$$
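As a quick check (a sketch added for illustration, not part of Brookes’ own presentation), substituting Eq. 4 into the conditions above confirms both the general property and the requirement that successive zones hold equal numbers of papers:

```latex
\begin{align*}
R(n^x) &= k \log n^x = x\,k \log n = x\,R(n), \\
R(n^2) - R(n) &= 2k \log n - k \log n = k \log n = R(n), \\
R(n^3) - R(n^2) &= 3k \log n - 2k \log n = k \log n = R(n).
\end{align*}
```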

Brookes has provided another model:

$$R(n) = N \log \frac{n}{s} \quad [c \le n \le N] \tag{Eq. 5}$$

where $N$ = total number of periodicals expected to publish papers on the subject, and
$s$ = value of $n$ at the intersection of the straight portion of the curve with the X axis [Fig. 1].

Now, let us take Eq. 3 and find the values of $a$ and $b$.
In Table 1, we find that the first 20 periodicals account for 590 articles, and the first 30 periodicals account for 689 articles. Putting these values into Eq. 3, we get

$$590 = a \cdot 20^b \tag{Eq. 6}$$
$$689 = a \cdot 30^b \tag{Eq. 7}$$

Dividing Eq. 7 by Eq. 6, we get

$$\frac{689}{590} = \frac{30^b}{20^b}$$

Taking logs, we get

$$\log 689 - \log 590 = b \log 30 - b \log 20 = b(\log 30 - \log 20)$$
$$\Rightarrow\; 2.8382 - 2.7709 = b(1.4771 - 1.3010)$$
$$\Rightarrow\; 0.0673 = b(0.1761)$$
$$\Rightarrow\; b = 0.382$$

Now, putting the value of $b$ in Eq. 6, we get

$$590 = a \cdot 20^{0.382}$$

Taking logs, we get

$$\log 590 = \log a + 0.382 \log 20$$
$$\Rightarrow\; 2.7709 = \log a + 0.382 \times 1.3010$$
$$\Rightarrow\; 2.7709 = \log a + 0.4970$$
$$\Rightarrow\; \log a = 2.7709 - 0.4970 = 2.2739$$

Taking the antilog of 2.2739 we get 187.9, i.e. 188.

The values we get are $a = 188$ and $b = 0.382$.

With the values of $a$ and $b$ we can now determine the cumulative total of references for the periodical of any rank.

For testing, let us take the 45th rank.

We have $R(n) = a n^b$. Putting in the values, we get

$$R(45) = 188 \times 45^{0.382} = 188 \times 4.281 \approx 804.8,$$

i.e. about 805, which is quite close to the observed value of 802.
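The worked example above can be reproduced with a short Python sketch. It solves $R(n) = a n^b$ through the two observed points (20, 590) and (30, 689) and then predicts the cumulative total at rank 45. The function names are hypothetical, and because the code keeps full floating-point precision rather than four-figure logarithms, its values of $a$ and $b$ differ slightly in the last digit from the hand calculation.

```python
import math


def fit_power_curve(n1, r1, n2, r2):
    """Solve r = a * n**b through two observed points (n1, r1) and (n2, r2)."""
    # Dividing the two equations eliminates a, leaving b as a ratio of logs.
    b = math.log(r2 / r1) / math.log(n2 / n1)
    a = r1 / n1 ** b
    return a, b


def cumulative_articles(n, a, b):
    """Cumulative number of articles in the first n periodicals (Eq. 3)."""
    return a * n ** b


# Observed values quoted in the text: 20 periodicals -> 590, 30 -> 689.
a, b = fit_power_curve(20, 590, 30, 689)
predicted_45 = cumulative_articles(45, a, b)
print(f"a = {a:.1f}, b = {b:.3f}, predicted R(45) = {predicted_45:.1f}")
```

Because only two points determine the curve, the fit passes exactly through both of them; the prediction at rank 45 is the only genuine test against the observed value of 802.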

Bradford considered the bibliograph to be a straight line, which has resulted in two different formulations, one verbal and the other graphical. The algebraic expressions for the two formulations given by Brookes are:

$$R(n) = j \log\left(\frac{n}{t} + 1\right) \text{ for the verbal formulation, and}$$
$$R(n) = k \log \frac{n}{s} \text{ for the graphical formulation.}$$
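To see how the two expressions relate, consider the special case $j = k$ and $t = s$. This is an assumption made here purely for illustration; the text does not state that the constants coincide. The difference between the two expressions then shrinks as $n$ grows:

```latex
\begin{align*}
j \log\!\left(\frac{n}{t} + 1\right) - j \log\frac{n}{t}
  &= j \log\!\left(\frac{n/t + 1}{n/t}\right)
   = j \log\!\left(1 + \frac{t}{n}\right) \longrightarrow 0
   \quad \text{as } n \to \infty .
\end{align*}
```

Under that assumption the two curves differ mainly among the first few journals and nearly coincide for large $n$.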

Naranan’s Viewpoint

In 1970 Naranan [12] opined that (i) Bradford’s law of bibliography of scientific literature is explainable in terms of an underlying power law distribution of the number of articles in scientific journals; (ii) the law emerges as a natural consequence of exponential growth of scientific literature and journals at comparable rates; and (iii) a model like this predicts a strong correlation between the age of a journal and the number of articles it carries. The author was hopeful that the proposed mechanism might find wider application in many other fields of science.

Brookes [5] pointed out that Naranan’s analysis was not valid for Bradford’s law. However, Naranan’s paper provided a plausible model of Lotka’s law with suitable verbal amendments. Brookes’ comments on the paper are reproduced verbatim: “The inverse square law of scientific authorship has hitherto been regarded as an inexplicable and useless scientific oddity. Naranan’s model of it is therefore welcome. And, together with other measures of scientific productivity, Lotka’s law has recently been applied by Dobrov and Korennoi in determining the optimum size of research institutes in USSR”.

Hubert [9] was of the view that Naranan’s interpretation of the original form of Bradford’s law does not follow from a stochastic argument based on his assumptions.

Bookstein’s Viewpoint

In his paper published in 1976, Bookstein [1] analysed the distributions of Lotka, Zipf, Bradford and Leimkuhler and adopted a point of view that allows us to understand that these distributions are in fact the different versions of a single theoretic distribution. He generalized these distributions with the following words. “All of these distributions are almost equivalent . . . In each case we have a set of entities (for example, chemists, words) producing events (publications, occurrences) over some dimension of extension (time, length of text) and in each case the distribution describes the number of occurrences of events over a fixed interval of that dimension. Under these conditions it is possible to describe the same distribution in at least four distinct ways; these modes of description are represented above by the distributions of Lotka, Zipf, Bradford, and Leimkuhler”.

Physicists all over the world have tried to unify four natural forces, i.e. electromagnetic force, gravitational force, weak nuclear force, and strong nuclear force for the last hundred years or so. Bookstein has done the same thing for bibliometric distributions. He has shown that basically all the four bibliometric distributions are different versions of the same distribution.  

Bradford Multiplier

The number of periodicals in the three zones of a Bradford distribution generally follows the ratio $1 : n : n^2$, where $n$ is the Bradford multiplier. Ravichandra Rao [15] analyzed the Bradford multiplier with a small sample of 12 datasets using the t-test. An attempt was also made to identify a suitable model to explain the law of scattering. Among the various models tried, the log-normal fits much better than many others, including the log-linear model.

Ambiguity between Verbal and Graphical Statements

Bradford's law can be looked at in two different ways, graphically and verbally. This was first observed by Vickery [16]. Bradford’s verbal formulation of the law is recorded as “If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the number of periodicals in the nucleus and succeeding zones will be as $1 : n : n^2$” [2: p. 154].

In 1948, Brian C. Vickery [16] contributed an important paper on Bradford’s law. He analysed about 1600 journal references, compared his results with Bradford’s, and found an inconsistency. He remarked: “We can … regard the theoretical distribution of papers on a given subject in scientific periodicals as derived by Bradford, as fully corroborated by the distributions observed in the sample investigations. The rectilinear relation . . . incorrectly assumed by Bradford to be identical with his theoretically derived relation, fits only the upper portion of the observed curve (Figs. 2 and 3). The theoretical relation itself, however, enables us to predict the whole curve”.

Vickery showed that if $n_m$ journals contribute a cumulative total of $m$ papers (with $n_{2m}$ journals contributing $2m$ papers, and so on), and $n_m$ is greater than the nucleus, then the verbal formulation is equivalent to the expression

$$n_m : n_{2m} - n_m : n_{3m} - n_{2m} : \dots :: 1 : a_m : a_m^2 : \dots$$

The graphical formulation is equivalent to the expression

$$n_m : n_{2m} : n_{3m} : \dots :: 1 : b_m : b_m^2 : \dots$$

Apart from this, a graph plotted with the data of the verbal formulation takes a different shape from a graph plotted with the complete data set. In the verbal expression, the data in Table 1 take the following shape (Table 2) and generate the curve given in Fig. 2.

Table 2 – Distribution of Articles according to Zones

| Zone | No. of Periodicals | Cumulative number of Articles |
|---|---|---|
| 1st Zone | 10 | 445 |
| 2nd Zone | 49 | 886 |
| 3rd Zone | 267 | 1332 |
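The zone data of Table 2 can be used to compute the Bradford multiplier directly. The following Python sketch assumes that the periodical counts in Table 2 are per zone, while the article counts are cumulative, as the column header states; the variable names are hypothetical.

```python
# Table 2 data. Assumption: the periodical counts are per zone, while the
# article counts are cumulative, as the column header states.
periodicals_per_zone = [10, 49, 267]
cumulative_articles = [445, 886, 1332]

# Articles contributed by each zone = differences of the cumulative totals.
articles_per_zone = [
    total - previous
    for previous, total in zip([0] + cumulative_articles, cumulative_articles)
]

# Bradford multiplier between successive zones.
multipliers = [
    later / earlier
    for earlier, later in zip(periodicals_per_zone, periodicals_per_zone[1:])
]

print("Articles per zone:", articles_per_zone)
print("Multipliers:", [round(m, 2) for m in multipliers])
```

The three zones hold nearly equal numbers of articles (445, 441 and 446), while the number of periodicals grows by a factor of roughly 5 from one zone to the next, in line with the ratio $1 : n : n^2$.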


[Fig. 2: Bradford’s bibliograph with verbal formulation. Y axis: cumulative number of periodicals; X axis: cumulative number of articles]

When the graph is drawn with the entire data set, we find finer details which are lost in the graph of the verbal formulation. Now let us see the graph with the entire data of Table 1.


[Fig. 3: Bradford’s bibliograph with complete data. Y axis: cumulative number of periodicals; X axis: cumulative number of articles]

Comparing the two bibliographs we find the following:

i) In the verbal formulation, the entire data is not available. What we get is practically a summary of the entire data set.

ii) The bibliograph in Fig. 2 is incomplete, inasmuch as it does not indicate the starting point of the curve.

iii) The last portion of the graph in both Figs. 2 and 3 is a straight line.

iv) In the graphical presentation of some Bradford distributions, a droop is observed at the end of the graph, which is not seen with the data of the verbal formulation.

With these, the distinction between the verbal formulation and the graphical presentation becomes quite clear, and the shortcomings of the verbal formulation become apparent. Brookes has provided equations for both the verbal and the graphical formulation. The equations are given under Brookes’ formulation.


Bradford-Zipf Distribution

Kendall [10], a statistician by profession, also studied the Bradford distribution, using 1,763 references on operational research drawn from 370 journals. For the sake of comparison, ‘1465 references to statistical methodology (covering the period 1925-39)’ were used. The graph plotted following Bradford’s method produced a curve which was remarkable for its linearity. He also noticed that the law is similar to, but not identical with, Zipf’s law. Let us consider the data given in Table 4, which provide a typical Bradford distribution.
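Table 4 – A data set following Bradford distribution

| Rank | No. of Periodical/s | No. of article/s | Cumulative total |
|---|---|---|---|
| 1 | 1 | 20 | 20 |
| 2 | 1 | 14 | 34 |
| 3 | 1 | 12 | 46 |
| 4 | 1 | 11 | 57 |
| 5 | 1 | 10 | 67 |
| 6 | 1 | 9 | 76 |
| 9 | 3 | 8 | 100 |
| 10 | 1 | 7 | 107 |
| 12 | 2 | 6 | 119 |
| 14 | 2 | 5 | 129 |
| 15 | 1 | 4 | 133 |
| 25 | 10 | 3 | 163 |
| 40 | 15 | 2 | 193 |
| 84 | 44 | 1 | 237 |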

Inverting the columns 1 and 3 of Table 4 and multiplying the numbers of each row we get the following result (Table 5). The number in the second column may be considered as frequency.

Table 5 – Partly inverted form of Table 4

| Rank | No. of article/s, i.e. Frequency | Rank × Frequency |
|---|---|---|
| 84 | 1 | 84 |
| 40 | 2 | 80 |
| 25 | 3 | 75 |
| 15 | 4 | 60 |
| 14 | 5 | 70 |
| 12 | 6 | 72 |
| 10 | 7 | 70 |
| 9 | 8 | 72 |
| 6 | 9 | 54 |
| 5 | 10 | 50 |
| 4 | 11 | 44 |
| 3 | 12 | 36 |
| 2 | 14 | 28 |
| 1 | 20 | 20 |

The figures in the third column indicate that the data by and large follow Zipf’s law. The two distributions are in fact very close, hence they are often referred to as the Bradford-Zipf distribution. The linearity of the Bradford bibliograph indicates a true Zipf situation.
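The construction of Table 5 from Table 4 can be sketched in a few lines of Python. The code below takes the rank, periodical count and article count of each row of Table 4, rebuilds the cumulative total as a consistency check, and then forms the rank × frequency products. The variable names are hypothetical.

```python
# Rows of Table 4: (rank, number of periodicals, articles per periodical).
table4 = [
    (1, 1, 20), (2, 1, 14), (3, 1, 12), (4, 1, 11), (5, 1, 10),
    (6, 1, 9), (9, 3, 8), (10, 1, 7), (12, 2, 6), (14, 2, 5),
    (15, 1, 4), (25, 10, 3), (40, 15, 2), (84, 44, 1),
]

# Rebuild the cumulative total column as a consistency check.
cumulative_totals = []
running = 0
for _rank, periodicals, articles in table4:
    running += periodicals * articles
    cumulative_totals.append(running)

# Table 5: reverse the row order and multiply rank by frequency.
table5 = [
    (rank, articles, rank * articles)
    for rank, _periodicals, articles in reversed(table4)
]

for rank, frequency, product in table5:
    print(f"{rank:>4} {frequency:>4} {product:>5}")
```

Under a strict Zipf law the product of rank and frequency would be constant. Here it stays between 60 and 84 for the first eight rows and falls away only for the few most productive periodicals.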









Characteristics of Bibliometric Distribution


  • All bibliometric distributions can be expressed through algebraic expressions.
  • On graphical presentation, they form different types of curves.
  • All these distributions have given rise to well-established laws which have found applications in journal selection, ranking of authors, ranking of words for keyword generation, and so on.
  • The classical laws of bibliometrics generally follow a power law distribution.
  • All these laws are basically different versions of a single bibliometric distribution.

Conclusion – The discussion above covers the mathematical formulations of, and views on, the Bradford distribution. Evidently, bibliometricians paid more attention to the mathematical formulation of the law than to its use. Eugene Garfield is an exception. He used this law profitably for selecting journals for various editions of Current Contents, Science Citation Index and other citation indexes. In the early 1960s there was no reliable estimate of the number of scientific periodicals being published in the world. The estimates varied from 50,000 to 100,000. From these he was to choose initially 600+ periodicals. In the 1970s, the number went above 2,000. Here he took the help of the Bradford distribution and went for the core periodicals of a speciality. In the process he discovered that only about 10% of the periodicals account for about 90% of the citations. This finding gave rise to the Garfield Law of Concentration, which is defined as follows: the list of journals most cited in any individual speciality is essentially the same for all specialities. A basic concentration of journals is the common core or nucleus of all fields. For building up a core collection of periodicals for a speciality, the Garfield Law of Concentration is of great help.


Multiple Choice Questions

Question 1: Multiple Choice

Bradford’s law is applicable for ________ periodicals
  • Cultural
  • Scientific (correct answer)
  • Educational

Question 2: Multiple Choice

For petroleum literature Cole found the value of b as _________
  • 4.3
  • 0.43 (correct answer)
  • .043

Question 3: Multiple Choice

Who showed that Bradford distribution is quite close to Zipf distribution?
  • M G Kendall (correct answer)
  • Abraham Bookstein

Question 4: Multiple Choice

Who showed that Bradford’s verbal formulation is different from his graphical formulation?
  • George Kingsley Zipf
  • Samuel Clement Bradford
  • B. C. Vickery (correct answer)

Question 5: Multiple Choice

Why is Leimkuhler’s formulation of Bradford’s law disadvantageous for the practical documentalist?
  • It is difficult to understand.
  • It requires tedious statistical computation (correct answer)

True or False

Question 1: True or False

Bibliometric distributions can generally be expressed through algebraic expressions.
  • True (correct answer)
  • False

Question 2: True or False

Many bibliometricians derived the equation for Bradford’s law.
  • True (correct answer)
  • False

Question 3: Multiple Choice

The constant β occurs in ________ formulation
  • Leimkuhler’s (correct answer)
  • Brookes’
  • Cole’s

Question 4: True or False

The verbal formulation of Bradford’s law differs from its graphical formulation.
  • True (correct answer)
  • False

