Suggestions from all of you are cordially invited for the making of this blog. Please send in your suggestions and entries. Its entire field of work is the world knowledge community, and it aims to contribute significantly to the career building of all competitive-exam candidates. You can send your suggestions to this e-mail address: chandrashekhar.malav@yahoo.com
06. Bradford Distributions: An Overview
P- 07. Informetrics & Scientometrics *
By :I K Ravichandra Rao,Paper Coordinator
Objectives
After going through this case study you will come to know about the following:
a) Derivation of equations for Bradford distribution by various bibliometricians.
b) Viewpoints of some bibliometricians on the law.
c) Ambiguity between verbal and graphical representations of Bradford distribution.
d) Bradford-Zipf distribution.
e) Characteristics of bibliometric distributions, etc.
Summary
Bradford provided two formulations of his law, one verbal and the other graphical. In 1962, Cole suggested a formulation in the form of an algebraic expression. In 1967, Leimkuhler derived an equation for the law using statistical techniques, but determining the constant in that equation proved tedious. This led Brookes to derive simpler equations, one for the verbal formulation and the other for the graphical formulation. Bradford’s verbal formulation has limitations, as it allows only a partial graph to be plotted, whereas a graph of the entire data shows the complete phenomenon of scatter: it begins as a gentle curve and develops gradually into a straight line, and in some cases it shows a droop at the top end. Naranan maintained that Bradford’s law is explainable in terms of an underlying power law distribution; that the law emerges as a natural consequence of exponential growth of scientific literature and journals at comparable rates; and that a model like this predicts a strong correlation between the age of a journal and the number of articles it carries. Brookes argued that Naranan’s analysis was not valid for Bradford’s law, although Naranan’s paper provided a plausible model of Lotka’s law with suitable verbal amendments. Hubert was also of the view that Naranan’s interpretation of the original form of Bradford’s law does not follow from a stochastic argument based on his assumptions. Analyzing the classical laws of bibliometrics, Bookstein concluded that these distributions are essentially different versions of a single theoretical distribution. Ravichandra Rao analyzed the Bradford multiplier with a small sample of 12 datasets using the t-test and attempted to identify a suitable model to explain the law of scattering; the log-normal model was observed to fit much better than many other models, including the log-linear model.
In 1948, Vickery showed that there is a disparity between Bradford’s verbal and graphical formulations, and he provided expressions for both. The two formulations also generate two different graphs, one incomplete and the other complete; the graph of the verbal formulation misses certain features of the distribution. Kendall showed that the data of a Bradford distribution can be presented in a way that reveals a Zipf distribution. Bradford and Zipf distributions are in fact very close, and the linearity of the Bradford bibliograph indicates a true Zipf situation. The characteristics of the bibliometric distributions have also been highlighted.
Cole's Formulation
Bradford stated that “If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the number of periodicals in the nucleus and succeeding zones will be as $1 : n : n^2$ ….” Later, Cole [6] also experimented with the law, named the slope of the curve the reference scattering coefficient, and concluded that the coefficient might be characteristic of the subject field. For petroleum literature, Cole obtained the relationship
$F(x) = 1 + b \log_{10} x$, $(x > c)$
where F(x) stands for the cumulative number of papers contained in the x number of most productive journals, and c represents the number of journals figuring in the nuclear zone. For petroleum literature Cole found the value of b as 0.43.
Leimkuhler’s Formulation
Ferdinand F. Leimkuhler [11], Professor of Industrial Engineering at Purdue University in the United States, analysed all the published data on the Bradford distribution and derived the following equation by applying statistical techniques.
$F(x) = \dfrac{\ln(1 + bx)}{\ln(1 + b)}$, $(0 \le x \le 1)$
In the equation, F(x) stands for the cumulative fraction of the references, x for the corresponding fraction of the most productive journals, and b is a constant related to the document collection. Brookes [3] opined that the equation is but a compromise since Leimkuhler accepted the empirical data as both complete and exact. Brookes further observed that “Unfortunately, though Leimkuhler’s formulation can be used theoretically without difficulty, it has some disadvantages for the practical documentalist. The numerical evaluation of the key parameter b requires tedious statistical computation and the solving of an implicit equation by approximation methods....In fact it was the exasperation evoked by an attempted practical application of Leimkuhler’s formulae that led the author of this paper to seek a simpler formulation of the Bradford distribution” [3: p249]
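To make the "implicit equation" concrete, the sketch below evaluates Leimkuhler's F(x) and recovers b numerically from a single observed point. It is only an illustration: the function names `leimkuhler_fraction` and `solve_b`, the bisection approach, and the sample values are assumptions made here, not Leimkuhler's or Brookes' own procedure.

```python
import math


def leimkuhler_fraction(x: float, b: float) -> float:
    """Cumulative fraction of references, F(x) = ln(1 + b*x) / ln(1 + b)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if b <= 0.0:
        raise ValueError("b must be positive")
    return math.log1p(b * x) / math.log1p(b)


def solve_b(x: float, f: float, lo: float = 1e-9, hi: float = 1e9,
            iters: int = 200) -> float:
    """Find b with F(x) = f by bisection on a logarithmic scale.

    Illustrative only. For a fixed x in (0, 1), F grows from x towards 1
    as b increases, so a solution requires x < f < 1.
    """
    if not (0.0 < x < f < 1.0):
        raise ValueError("need 0 < x < f < 1")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if leimkuhler_fraction(x, mid) < f:
            lo = mid  # F too small, so b must be larger
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical check: generate a point with a known b, then recover it.
observed = leimkuhler_fraction(0.1, 20.0)
recovered_b = solve_b(0.1, observed)
```

A single point is enough here only because the example is synthetic; with real data, b would normally be estimated from many points.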
Brookes’ Formulation
Brookes’ [3] formulation of the Bradford distribution is as follows. Suppose R(n) is the cumulative total of relevant papers found in the first n journals when all the journals are ranked in order of decreasing productivity. Bradford’s law then requires that
$R(n) = R(n^2) - R(n) = R(n^3) - R(n^2) = \dots$
for all integral values of n greater than unity.
We have $R(n) = R(n^2) - R(n)$ (Eq. 1)
$\Rightarrow R(n) + R(n) = R(n^2)$
$\Rightarrow 2R(n) = R(n^2)$.
Again, we have $R(n^2) - R(n) = R(n^3) - R(n^2)$ (Eq. 2)
$\Rightarrow R(n^2) + R(n^2) - R(n) = R(n^3)$
$\Rightarrow 2R(n^2) - R(n) = R(n^3)$
$\Rightarrow 2 \cdot 2R(n) - R(n) = R(n^3)$ [since $R(n^2) = 2R(n)$ from Eq. 1]
$\Rightarrow 4R(n) - R(n) = R(n^3)$
$\Rightarrow 3R(n) = R(n^3)$
We have $R(n^2) = 2R(n)$
$R(n^3) = 3R(n)$
Hence, we can generalize to $R(n^x) = xR(n)$, where x is a positive integer
$\Rightarrow R(n) = \dfrac{1}{x} R(n^x)$
Putting $1/x = a$ and $R(n^x) = n^b$, we can write the equation as
$R(n) = a n^b$, $[1 \le n \le c]$ (Eq. 3)
where c is the value of n at the point where the straight portion of the curve begins [Fig. 1].
The only function that fully satisfies the condition $R(n^x) = xR(n)$ is
$R(n) = k \log n$, where k is a constant (Eq. 4)
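As a quick check, offered as a sketch rather than a reproduction of Brookes' own argument, one can verify that the logarithmic form satisfies the generalized condition R(n^x) = xR(n) and therefore the equal-increment requirement of the verbal law. The base of the logarithm is left unspecified, since changing it only rescales the constant k.

```latex
\begin{align*}
R(n^x) &= k \log(n^x) = x\,k \log n = x\,R(n), \\
R(n^{x+1}) - R(n^x) &= (x+1)\,k \log n - x\,k \log n = k \log n = R(n).
\end{align*}
```

The second line shows that each successive power of n adds the same number of papers, R(n), which is the content of Eq. 1 and Eq. 2.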
Brookes has provided another model:
$R(n) = N \log \dfrac{n}{s}$, $[c \le n \le N]$ (Eq. 5)
where N = total number of periodicals expected to publish papers on the subject, and
s = value of n at the intersection of the straight portion of the curve with the X axis [Fig. 1].
Now, let us take Eq. 3 and find the values of a and b.
In Table 1, we find that the first 20 periodicals account for 590 articles, and the first 30 periodicals account for 689 articles. Putting these values into Eq. 3, we get
$590 = a \cdot 20^b$ (Eq. 6)
$689 = a \cdot 30^b$ (Eq. 7)
Dividing Eq. 7 by Eq. 6, we get
$689/590 = 30^b/20^b$
Taking logarithms to base 10, we get $\log 689 - \log 590 = b \log 30 - b \log 20 = b(\log 30 - \log 20)$
$\Rightarrow 2.8382 - 2.7709 = b(1.4771 - 1.3010)$
$\Rightarrow 0.0673 = b(0.1761)$
$\Rightarrow b = 0.382$
Now, putting the value of b in Eq. 6, we get
$590 = a \cdot 20^{0.382}$
Taking logarithms, we get $\log 590 = \log a + 0.382 \log 20$
$\Rightarrow 2.7709 = \log a + 0.382 \times 1.3010$
$\Rightarrow 2.7709 = \log a + 0.4970$
$\Rightarrow \log a = 2.7709 - 0.4970 = 2.2739$
Taking the antilog of 2.2739, we get 187.9, i.e. 188.
The values we get are a = 188 and b = 0.382.
With the values of a and b we can now determine the cumulative total of references for the periodical of any rank within the range of Eq. 3.
For testing, let us take the 45th rank.
We have $R(n) = a n^b$
Putting in the values, we get $R(45) = 188 \times 45^{0.382}$
$= 188 \times 4.281$
$= 804.8$
$\approx 805$, which is quite close to the observed value of 802.
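The same two-point calculation can be scripted. The sketch below assumes only the two data points quoted from Table 1 (20 periodicals with 590 articles, 30 periodicals with 689 articles); the function name `fit_power_law` is hypothetical, and because the code does not round intermediate values its results differ slightly from the hand calculation.

```python
import math


def fit_power_law(n1: float, r1: float, n2: float, r2: float) -> tuple[float, float]:
    """Solve R(n) = a * n**b exactly through two (n, R) points."""
    # Dividing the two equations removes a and leaves b as a ratio of logs.
    b = (math.log10(r2) - math.log10(r1)) / (math.log10(n2) - math.log10(n1))
    # Back-substitute into the first equation to obtain a.
    a = r1 / n1 ** b
    return a, b


a, b = fit_power_law(20, 590, 30, 689)
predicted_45 = a * 45 ** b  # cumulative articles predicted for the first 45 periodicals

print(f"a = {a:.1f}, b = {b:.3f}, R(45) = {predicted_45:.1f}")
```

Fitting through exactly two points is the simplest possible estimate; with the full Table 1 data one would more likely fit a line to log R(n) against log n over the whole curved portion.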
Bradford considered the bibliograph to be a straight line, which has resulted in two different formulations, one verbal and the other graphical. The algebraic expressions for the two formulations given by Brookes are:
$R(n) = j \log (n/t + 1)$ for the verbal formulation, and
$R(n) = k \log (n/s)$ for the graphical formulation.
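A small numerical sketch can show how the two expressions relate. Everything specific here is an assumption made for illustration: the parameter values (j = k = 100, t = s = 2), the use of the natural logarithm, and the function names. With equal parameters the gap between the two forms is j·log(1 + t/n), which shrinks as n grows, so the expressions differ mainly at low ranks.

```python
import math


def verbal_R(n: float, j: float, t: float) -> float:
    """Verbal formulation: R(n) = j * log(n/t + 1)."""
    return j * math.log(n / t + 1)


def graphical_R(n: float, k: float, s: float) -> float:
    """Graphical formulation: R(n) = k * log(n/s)."""
    return k * math.log(n / s)


# Hypothetical parameters, chosen only to make the comparison visible.
J = K = 100.0
T = S = 2.0

for n in (2, 20, 200, 2000):
    gap = verbal_R(n, J, T) - graphical_R(n, K, S)
    print(f"n = {n:5d}  verbal = {verbal_R(n, J, T):8.2f}  "
          f"graphical = {graphical_R(n, K, S):8.2f}  gap = {gap:6.2f}")
```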
Naranan’s Viewpoint
In 1970 Naranan [12] opined that (i) Bradford’s law of bibliography of scientific literature is explainable in terms of an underlying power law distribution of the number of articles in scientific journals; (ii) the law emerges as a natural consequence of exponential growth of scientific literature and journals at comparable rates; and (iii) a model like this predicts a strong correlation between the age of a journal and the number of articles it carries. The author was hopeful that the proposed mechanism might find wider application in many other fields of science.
Brookes [5] pointed out that Naranan’s analysis was not valid for Bradford’s law. However, Naranan’s paper provided a plausible model of Lotka’s law with suitable verbal amendments. Brookes’ comments on the paper are reproduced verbatim: “The inverse square law of scientific authorship has hitherto been regarded as an inexplicable and useless scientific oddity. Naranan’s model of it is therefore welcome. And, together with other measures of scientific productivity, Lotka’s law has recently been applied by Dobrov and Korennoi in determining the optimum size of research institutes in USSR”.
Hubert [9] was of the view that Naranan’s interpretation of the original form of Bradford’s law does not follow from a stochastic argument based on his assumptions.
Bookstein’s Viewpoint
In his paper published in 1976, Bookstein [1] analysed the distributions of Lotka, Zipf, Bradford and Leimkuhler and adopted a point of view that allows us to understand that these distributions are in fact the different versions of a single theoretic distribution. He generalized these distributions with the following words. “All of these distributions are almost equivalent . . . In each case we have a set of entities (for example, chemists, words) producing events (publications, occurrences) over some dimension of extension (time, length of text) and in each case the distribution describes the number of occurrences of events over a fixed interval of that dimension. Under these conditions it is possible to describe the same distribution in at least four distinct ways; these modes of description are represented above by the distributions of Lotka, Zipf, Bradford, and Leimkuhler”.
Physicists all over the world have tried to unify four natural forces, i.e. electromagnetic force, gravitational force, weak nuclear force, and strong nuclear force for the last hundred years or so. Bookstein has done the same thing for bibliometric distributions. He has shown that basically all the four bibliometric distributions are different versions of the same distribution.
Bradford Multiplier
The number of periodicals in the three zones of a Bradford distribution generally follows the ratio $1 : n : n^2$, where n is the Bradford multiplier. Ravichandra Rao [15] analyzed the Bradford multiplier with a small sample of 12 datasets using the t-test. An attempt was also made to identify a suitable model to explain the law of scattering. Among the various models tried, the log-normal fits much better than many others, including the log-linear model.
Ambiguity between Verbal and Graphical Statements
Bradford's law can be looked at in two different ways, graphically and verbally. This was first observed by Vickery [16]. Bradford’s verbal formulation of the law is recorded as “If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the number of periodicals in the nucleus and succeeding zones will be as $1 : n : n^2$” [2: p. 154].
In 1948, Brian C. Vickery [16] contributed an important paper on Bradford’s law. He analysed about 1600 journal references, compared his results with Bradford’s and found an inconsistency. He remarked: “We can … regard the theoretical distribution of papers on a given subject in scientific periodicals as derived by Bradford, as fully corroborated by the distributions observed in the sample investigations. The rectilinear relation . . . incorrectly assumed by Bradford to be identical with his theoretically derived relation, fits only the upper portion of the observed curve (Figs. 2 and 3). The theoretical relation itself, however, enables us to predict the whole curve”.
Vickery showed that if $n_m$ journals contribute a cumulative total of m papers, and $n_m$ is greater than the nucleus, then the verbal formulation is equivalent to the expression $n_m : n_{2m} - n_m : n_{3m} - n_{2m} : \dots :: 1 : a_m : a_m^2 : \dots$ The graphical formulation is equivalent to the expression $n_m : n_{2m} : n_{3m} : \dots :: 1 : b_m : b_m^2 : \dots$
Apart from this, a graph plotted with the data of the verbal formulation takes a different shape from a graph plotted with the complete data set. In the verbal expression, the data in Table 1 take the following form (Table 2) and generate the curve given in Fig. 2.
Table 2 – Distribution of Articles according to Zones
Zone | No. of Periodicals | Cumulative number of Articles
1st Zone | 10 | 445
2nd Zone | 49 | 886
3rd Zone | 267 | 1332
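Since the articles column of Table 2 is cumulative, the number of articles falling in each zone has to be recovered by differencing. The short sketch below does this with the three cumulative totals from the table; the function name `per_zone_counts` is hypothetical.

```python
def per_zone_counts(cumulative: list[int]) -> list[int]:
    """Convert cumulative totals into the count contributed by each zone."""
    counts = []
    previous = 0
    for total in cumulative:
        counts.append(total - previous)
        previous = total
    return counts


cumulative_articles = [445, 886, 1332]  # Table 2, zones 1 to 3
articles_per_zone = per_zone_counts(cumulative_articles)
print(articles_per_zone)  # [445, 441, 446]
```

The three zones hold 445, 441 and 446 articles, which matches the verbal law's requirement that each zone contain about the same number of articles as the nucleus.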
Fig. 2 Bradford’s bibliograph with verbal formulation
Y axis – Cumulative number of periodicals; X axis – Cumulative number of articles
When the graph is drawn with the entire data set, we find finer details which are lost in the graph of the verbal formulation. Now let us see the graph with the entire data of Table 1.
Fig. 3 Bradford’s bibliograph with complete data
Y axis – Cumulative number of periodicals; X axis – Cumulative number of articles
Comparing the two bibliographs we find the following:
i) In the verbal formulation, the entire data is not available. What we get is practically a summary of the entire data set.
ii) The bibliograph in Fig 2 is incomplete, inasmuch as it does not indicate the starting point of the curve.
iii) The last portion of the graph in both Figs. 2 and 3 is a straight line.
iv) In the graphical presentation of some Bradford distributions, a droop is observed at the end of the graph, which is not seen with the data of verbal formulation.
With these observations, the distinction between the verbal formulation and the graphical presentation becomes quite clear, and the shortcomings of the verbal formulation become apparent. Brookes has provided equations for both the verbal and the graphical formulation. The equations are given under Brookes’ formulation.
Bradford-Zipf Distribution
Kendall [10], a statistician by profession, also studied the Bradford distribution using 1,763 references on operational research drawn from 370 journals. For the sake of comparison, ‘1465 references to statistical methodology (covering the period 1925-39)’ were used. The graph plotted following Bradford’s method produced a curve which was remarkable for its linearity. He also noticed that the law is similar to, but not identical with, Zipf’s law. Let us consider the data given in Table 4, which provide a typical Bradford distribution.
Table 4 – A data set following Bradford distribution
Rank | No. of Periodical/s | No. of article/s | Cumulative total
1 | 1 | 20 | 20
2 | 1 | 14 | 34
3 | 1 | 12 | 46
4 | 1 | 11 | 57
5 | 1 | 10 | 67
6 | 1 | 9 | 76
9 | 3 | 8 | 100
10 | 1 | 7 | 107
12 | 2 | 6 | 119
14 | 2 | 5 | 129
15 | 1 | 4 | 133
25 | 10 | 3 | 163
40 | 15 | 2 | 193
84 | 44 | 1 | 237
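The columns of Table 4 are linked: the rank is the running count of periodicals, and the cumulative total is the running sum of periodicals times articles per periodical. The sketch below recomputes both from the second and third columns as a consistency check; the names `TABLE_4` and `running_totals` are introduced here only for illustration.

```python
# Each row: (rank, no. of periodicals, articles per periodical, listed cumulative total)
TABLE_4 = [
    (1, 1, 20, 20), (2, 1, 14, 34), (3, 1, 12, 46), (4, 1, 11, 57),
    (5, 1, 10, 67), (6, 1, 9, 76), (9, 3, 8, 100), (10, 1, 7, 107),
    (12, 2, 6, 119), (14, 2, 5, 129), (15, 1, 4, 133), (25, 10, 3, 163),
    (40, 15, 2, 193), (84, 44, 1, 237),
]


def running_totals(rows):
    """Return (cumulative periodicals, cumulative articles) for every row."""
    periodicals = 0
    articles = 0
    totals = []
    for _rank, count, per_periodical, _listed in rows:
        periodicals += count
        articles += count * per_periodical
        totals.append((periodicals, articles))
    return totals


computed = running_totals(TABLE_4)
listed = [(rank, total) for rank, _count, _per, total in TABLE_4]
print(computed == listed)  # True when the table is internally consistent
```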
Taking columns 1 and 3 of Table 4 in inverted order and multiplying the two numbers in each row, we get the following result (Table 5). The number in the second column may be considered as the frequency.
Table 5 – Partly inverted form of Table 4
Rank | No. of article/s, i.e. Frequency | Rank x Frequency
84 | 1 | 84
40 | 2 | 80
25 | 3 | 75
15 | 4 | 60
14 | 5 | 70
12 | 6 | 72
10 | 7 | 70
9 | 8 | 72
6 | 9 | 54
5 | 10 | 50
4 | 11 | 44
3 | 12 | 36
2 | 14 | 28
1 | 20 | 20
The figures in the third column indicate that the data by and large follow Zipf’s law. The two distributions are in fact very close, hence they are often referred to as the Bradford-Zipf distribution. The linearity of the Bradford bibliograph indicates a true Zipf situation.
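The products in Table 5 can be reproduced with a few lines of code. The sketch below assumes the rank and frequency pairs exactly as listed in the table; the names `RANK_FREQUENCY` and `rank_times_frequency` are hypothetical.

```python
# (rank, frequency) pairs: rank from column 1 of Table 4, frequency from column 3,
# listed from the highest rank down to the lowest, as in Table 5.
RANK_FREQUENCY = [
    (84, 1), (40, 2), (25, 3), (15, 4), (14, 5), (12, 6), (10, 7),
    (9, 8), (6, 9), (5, 10), (4, 11), (3, 12), (2, 14), (1, 20),
]


def rank_times_frequency(pairs):
    """Multiply rank by frequency for every row."""
    return [rank * freq for rank, freq in pairs]


products = rank_times_frequency(RANK_FREQUENCY)
print(products)
# [84, 80, 75, 60, 70, 72, 70, 72, 54, 50, 44, 36, 28, 20]
```

Under Zipf's law the product of rank and frequency is expected to stay roughly constant, so the printed list can be inspected to see how far this data set departs from that ideal.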
Characteristics of Bibliometric Distribution
- All bibliometric distributions can be expressed through algebraic expressions.
- On graphical presentation, they form different types of curves.
- All these distributions have given rise to well-established laws which have found applications in journal selection, ranking of authors, ranking of words for keyword generation, and so on.
- The classical laws of bibliometrics generally follow power law distribution.
- All these laws are basically different versions of a single bibliometric distribution.
Conclusion – The discussion above covers the mathematical formulations of, and views on, the Bradford distribution. Evidently, bibliometricians have paid more attention to the mathematical formulation of the law than to its use. Eugene Garfield is an exception. He used the law profitably for selecting journals for various editions of Current Contents, Science Citation Index and other citation indexes. In the early 1960s there was no reliable estimate of the number of scientific periodicals being published in the world; the estimates varied from 50,000 to 100,000. From these he had to choose initially 600+ periodicals. In the 1970s, the number went above 2,000. Here he took the help of the Bradford distribution and went for the core periodicals of each speciality. In the process he discovered that only about 10% of the periodicals account for about 90% of the citations. This finding gave rise to Garfield’s Law of Concentration, which is defined as follows: the list of journals most cited in any individual speciality is essentially the same for all specialities; a basic concentration of journals is the common core or nucleus of all fields. For building up a core collection of periodicals for a speciality, Garfield’s Law of Concentration is of great help.
Multiple Choice Questions
Question 1: Multiple Choice
Bradford’s law is applicable for ________ periodicals
- Cultural
- Scientific
- Educational
Question 2: Multiple Choice
For petroleum literature Cole found the value of b as _________
- 4.3
- 0.43
- .043
Question 3: Multiple Choice
Who showed that the Bradford distribution is quite close to the Zipf distribution?
- M G Kendall
- Abraham Bookstein
Question 4: Multiple Choice
Who showed that Bradford’s verbal formulation is different from his graphical formulation?
- George Kingsley Zipf
- Samuel Clement Bradford
- B. C. Vickery
Question 5: Multiple Choice
Why is Leimkuhler’s formulation of Bradford’s law disadvantageous for the practical documentalist?
- It is difficult to understand.
- It requires tedious statistical computation
True or False
Question 1: True or False
Bibliometric distributions can generally be expressed through algebraic expressions.
True
False
Question 2: True or False
Many bibliometricians derived the equation for Bradford’s law.
True
False
Question 3: Multiple Choice
The constant β occurs in ________ formulation
- Leimkuhler’s
- Brookes’
- Cole’s
Question 4: True or False
The verbal formulation of Bradford’s law differs from its graphical formulation.
True
False