Suggestions from all of you are respectfully invited for building this blog. Please send your suggestions and entries. Its entire scope is the worldwide knowledge community, and it will contribute significantly to the career building of all competitive-exam candidates. You can send your suggestions to this email address: chandrashekhar.malav@yahoo.com
05. Classical Law of Bibliometrics
P- 07. Informetrics & Scientometrics *
By: I K Ravichandra Rao, Paper Coordinator
Objectives
After going through this study material you will
a) know the classical laws of bibliometrics;
b) learn the formulas pertaining to the laws;
c) be able to apply the laws in library environment;
d) educate yourself as to how these laws are verified.
Summary
In library science, there are not many scientific laws that can be expressed in formulas or equations and can be verified. This module deals with Bradford’s law of scatter, Lotka’s law, and the three laws of Zipf. Bradford’s law deals with the scattering of journal literature in periodicals. After a brief introduction to Samuel Clement Bradford, the definition of the law is presented in his own words. The law is explained in detail with examples. The mathematical derivation of the law and the related equation, the steps involved in its verification, and the productivity-wise distribution of periodicals and its explanation are all provided. Bradford’s bibliograph is also touched upon. The law is used to fathom the spread of the literature in a given subject; identify core periodicals on the subject; acquire the important periodicals on a subject; and delete certain periodicals from the acquisition list of the library.
All three laws of George Kingsley Zipf deal with words. He studied statistically the occurrence of words in literature from various angles. Prior to Zipf, the French stenographer Jean-Baptiste Estoup also made such studies. Zipf carried the study further. Interestingly, he discovered that the occurrence of words as well as their lengths follow scientific laws. He showed that in a big piece of writing, if one word occurs 2000 times, then it is likely that 2 different words will occur roughly 500 times each, 3 different words roughly 220 times each, and so on. The second law of Zipf shows that the product of the rank and the frequency of the words in a big piece of writing gives rise to a constant. A student studying the occurrence of words in selected poems of Gitanjali by Rabindranath Tagore found that the first-ranked word ‘Āmi’ (I) occurs around 2000 times, and the second-ranked word ‘Tumi’ (You) around 1000 times, giving a constant of about 2000. The third law deals with the length of words. Zipf found that single-letter words occur less often, two-letter words more often, and three-letter words more often still. On the other hand, the occurrence of words of 20 letters or more is extremely small. The first law is useful in identifying the style of writing of an author. The second law identifies the highly used words in a piece of writing, which can be used in automatic indexing, keyword selection, and so on. Studies carried out with the third law give an idea of the maximum length of a word in a language, which proves to be highly useful in specifying the length of a field in a database.
Lotka’s law deals with author productivity and shows that a few authors contribute the largest number of articles while most authors contribute only a single article in a given period. The law helps in identifying the most prolific authors, moderately productive authors and least productive authors.
Introduction
Classical laws in bibliometrics include Bradford’s law of scatter [1,2,3], the three laws of Zipf [28,29,30] and Lotka’s law [16]. These are purely scientific laws. They have well-established formulas, and the concepts embedded in them are less likely to change with time. Moreover, their validity can be verified scientifically.
Bradford's Law of Scattering
The law was propounded by Samuel Clement Bradford (1878-1948), a British librarian, mathematician and documentalist at the Science Museum in London, after a laborious study of scientific literature in the mid-1930s. After examining the distribution of scientific literature in periodicals and their coverage in abstracting and indexing periodicals, he realized that the distribution of literature follows a particular pattern [1,2]. He opined that ‘the nucleus of periodicals devoted to the given subject, ... must contain, individually, more articles on that subject than periodicals dealing with related subjects’ [3]. ‘In consequence, it is possible to arrange periodicals in zones of decreasing productivity, in regard to papers on a given subject, and the numbers of periodicals in each zone will increase as their productivity decreases’ [3]. He described the scattering pattern of journals in the areas of applied geophysics and lubrication. He plotted the partial sums of references against the natural logarithm of the partial sums of the numbers of journals, and he noticed that the resulting graph is a straight line. On the basis of this observation, he suggested the following linear relation to describe the scattering phenomenon [2]:
F(x) = a + b log x.
Here F(x) is the cumulative number of references contained in the first x most productive journals; a and b are constants. The following figure is a hypothetical, but typical, log-linear curve (as described by Bradford) showing aggregates of articles on a given subject corresponding to the number of journals. This type of curve is usually called a Bradford curve.
[Figure: Bradford curve. X-axis: partial sum of journals (log scale). Y-axis: partial sum of articles contained in the top X journals (linear scale).]
$P_1$ in the figure is the point at which the straight-line part of the curve begins. Draw $Y_1P_1$, $Y_2P_2$, and $Y_3P_3$ parallel to the X-axis such that $OY_1 = Y_1Y_2 = Y_2Y_3$. Draw $P_1X_1$, $P_2X_2$, and $P_3X_3$ parallel to the Y-axis. Since $P_1P_3$ is a straight line and $Y_1Y_2 = Y_2Y_3$, the distances $X_1X_2$ and $X_2X_3$ are equal, say r units. Let the distance between $O$ and $X_1$ be s units. Thus, if $\alpha$, $\beta$ and $\gamma$ are the positive real numbers corresponding respectively to the logarithmic abscissae $OX_1$, $OX_2$ and $OX_3$, we have $\log \alpha = s$, $\log \beta = s + r$, and $\log \gamma = s + 2r$. That is,
$$\alpha = 10^{s}, \quad \beta = 10^{s+r} = 10^{s} \cdot 10^{r}, \quad \gamma = 10^{s+2r} = 10^{s} \cdot 10^{2r}.$$
Substituting $n = 10^{r}$, we see that $\alpha$, $\beta$, and $\gamma$ are related to each other as $1 : n : n^2$. On the basis of this relationship, and since $OX_1$ represents a number of periodicals in a subject area, Bradford states that “If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as $1 : n : n^2$ ….”
This is called the Law of Scattering or Bradford's law. He obtained the partial sums of the journals only after ranking the journals according to the number of articles they publish. In 1948 Bradford summarized his earlier observations in a book that contained a theoretical derivation of his law of scattering.
Interestingly, Bradford in 1948 [3] derived his law by a different approach. He assumed that the collection of journals is ranked (or arranged) in order of decreasing productivity. The productivity of a journal is implicitly defined as the number of articles it contains on a given subject. Divide these journals into k zones. Let $m_i$ be the number of journals in the i-th zone and $r_i$ the average number of articles per journal in the i-th zone, so that $m_1r_1, m_2r_2, \ldots, m_kr_k$ are the total productivities of the 1st, 2nd, …, k-th zones respectively. Zones are formed such that
$$m_1 r_1 = m_2 r_2 = m_3 r_3 = \cdots = m_k r_k \qquad \text{(A)}$$
Since
$$r_1 > r_2 > r_3 > \cdots > r_k \quad \text{and} \quad m_1 r_1 = m_2 r_2 = m_3 r_3 = \cdots = m_k r_k,$$
we have
$$m_1 < m_2 < m_3 < \cdots < m_k.$$
From (A), we have
$$m_i = \frac{r_{i-1}}{r_i}\, m_{i-1}, \quad i = 2, 3, 4, \ldots, k.$$
That is,
$$\frac{m_i}{m_{i-1}} = \frac{r_{i-1}}{r_i}.$$
Defining
$$n_{i-1} = \frac{r_{i-1}}{r_i}, \quad i = 2, 3, 4, \ldots, k,$$
we have
$$m_i = n_{i-1}\, m_{i-1} = n_1 n_2 \cdots n_{i-1}\, m_1.$$
Bradford, in his analysis, considered only three zones (k = 3). He thus had
$$m_2 = n_1 m_1 \quad \text{and} \quad m_3 = n_1 n_2 m_1.$$
He then stated, “… we know no reason why $n_1$ and $n_2$ should differ and the simple supposition we could make is that they are equal.” Say, $n_1 = n_2 = n$ [3]. Thus, $m_3 = n^2 m_1$. Therefore, he argued that the ratio of the zone sizes will be as $1 : n : n^2$, which is again the relationship developed in his earlier work.
Since then many have worked in this area and come out with different approaches as well as different explanations. The author of this Unit discusses some of these works in Unit 6. Vickery [27], Bookstein [4], Brookes [5,6,7], Hubert [11,12,13], Leimkuhler [15], Egghe [10], Naranan [17] and many others have given different explanations of this law. Some of these works were also discussed by Ravichandra Rao [21]; he also observed in one of his papers that Bradford's assumption of $n_1 = n_2 = n$ is unlikely to be correct.
Explanation
Bradford conducted his study with scientific periodicals only. In certain cases the law holds good for non-scientific periodicals also. The law is applicable if the articles pertain to a specific subject. If articles on several subjects are taken together for testing Bradford's law, the test is likely to fail. Though the definition speaks of several zones, the formula given in the definition takes care of three zones only. In certain cases the law may be applicable beyond three zones. Bradford called the first zone the nucleus and the other zones succeeding zones. We may call the first zone Core, the second zone Allied, and the third zone Alien, to identify each zone specifically. Let us take a concrete example to understand the law properly.
Suppose there are 600 articles on a subject. If these articles are equally divided into three zones, then each zone will have 200 articles. Obviously, the nucleus will also have 200 articles. If the articles of the nucleus pertain to 5 journals, the 200 articles of the next group may pertain to 10 journals, 15 journals or more. If the articles of the next zone (allied zone) pertain to 15 journals, then it is likely that the articles of the third zone (alien zone) will pertain to 45 journals. We then find that the articles of the three zones belong to 5, 15, and 45 journals. If these numbers are divided by 5, we get 1, 3 and 9, that is 1, 3, and 3², which represents Bradford’s law, i.e. 1 : n : n² with n = 3. In a Bradford study, we generally get only an approximate value of n. In reality, the numbers of articles in the three zones are only roughly equal.
In a situation where the data ideally suit Bradford's formulation, a set of, say, 310 journals would be divided as 10, 10 × 5, and 10 × 5 × 5. The first 10 journals will belong to the core zone, the next 50 to the allied zone, and the last 250 to the alien zone. The number of papers in each zone will be the same. In reality, we do not get data that perfectly satisfy the law. In most cases, the data satisfy the law only roughly.
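The zone division described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a procedure given in the module: the function name `bradford_zones` and the per-journal article counts (50, 10 and 2 articles per journal) are hypothetical values chosen so that the 310 journals of the ideal example split exactly into zones of 10, 50 and 250 journals.

```python
def bradford_zones(article_counts, zones=3):
    """Split journals, ranked by decreasing productivity, into zones
    that each hold roughly the same number of articles.
    Returns the number of journals in each zone."""
    ranked = sorted(article_counts, reverse=True)
    target = sum(ranked) / zones      # articles each zone should hold
    sizes = []                        # journals per completed zone
    running = 0                       # cumulative articles so far
    count = 0                         # journals in the current zone
    for articles in ranked:
        running += articles
        count += 1
        # Close the zone once the cumulative total reaches its share.
        if len(sizes) < zones - 1 and running >= target * (len(sizes) + 1):
            sizes.append(count)
            count = 0
    sizes.append(count)               # whatever remains is the last zone
    return sizes


# Hypothetical ideal data: 10 journals with 50 articles each,
# 50 journals with 10 each, and 250 journals with 2 each.
ideal_counts = [50] * 10 + [10] * 50 + [2] * 250
zone_sizes = bradford_zones(ideal_counts)
# Ratio of each zone size to the previous one (Bradford's n).
multipliers = [zone_sizes[i] / zone_sizes[i - 1] for i in range(1, len(zone_sizes))]
print(zone_sizes, multipliers)
```

With real data the zones will hold only roughly equal numbers of articles, so the successive ratios will differ from one another and n can only be read off approximately.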
Bradford’s Bibliograph
The bibliograph is obtained by plotting R(n), the cumulative number of articles in the n most productive journals, on a linear scale along the Y-axis against the journal rank n on a logarithmic scale along the X-axis. It is best drawn on semi-log graph paper. The graph is J-shaped and turns into a straight line from a particular point.
Uses
i) It depicts the spread of the journal literature of the subject. The subjects of the journals carrying the literature show how wide the spread is and how varied the journals are.
ii) It helps identify the core periodicals in a subject. It also throws light on the allied and alien periodicals pertaining to the subject.
iii) It produces a ranked list placing the journals according to their productivity in descending order. This helps a great deal in the selection of periodicals.
iv) When certain periodicals need to be deleted because of a budget crunch, it is usually the lower-ranked periodicals that are deleted.
Zipf’s Laws
Zipf’s laws [28,29,30] are related to the usage of words by individuals. Interestingly, the phenomenon follows a scientific law. Most probably, the French stenographer Jean-Baptiste Estoup (1858-1950) was the first individual to observe the hyperbolic nature of the frequency of word usage. Estoup recorded his study in his book Gammes Stenographiques, published in Paris in 1916. George Kingsley Zipf (1902–1950), the American linguist, saw the 4th edition of Estoup’s book, worked further on it, and arrived at his laws in 1935.
First law -- Definition
“If the number of different words occurring once in a given sample is taken as x, the number of different words occurring twice, three times, four times, n times in the same sample, is respectively 1/2², 1/3², 1/4², ... 1/n² of x, up to, though not including, the few most frequently used words; that is, we find an unmistakable progression according to the inverse square, valid for well over 95% of all the different words used in the sample” [13, Preface, p. vi]. Based on this phenomenon, Zipf developed the formula $ab^2 = k$, where a is the number of words occurring b times, and k is a constant.
It is observed that the equation holds quite well for the less frequently used words of the sample, which are far more numerous than the more frequently used words [30, p.47].
Explanation
Suppose that in a particular piece of writing the number of words occurring only once is 1440. In that case the numbers of words occurring 2, 3 and 4 times are likely to be 1440 ÷ 2², 1440 ÷ 3², and 1440 ÷ 4², that is 360, 160, and 90. Using this data and the formula, we can determine the value of k. We know that 1440 words have occurred only once, which means a = 1440 and b = 1. Therefore, k = ab² = 1440 × 1² = 1440. Using the value of k, we can find out how many words have occurred 5 times, 6 times or any other number of times. In this case, the formula will be a = k/b². Let us find out how many words have occurred 6 times. We know k = 1440 and b = 6. Putting these values in the equation, we get a = 1440/6² = 40. So around 40 words are likely to occur 6 times each. Note that the values will change with every piece of writing, literary or non-literary.
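The arithmetic of this example can be reproduced with a short Python sketch. The function name `words_occurring` is an illustrative choice, not something from the module; the figures are the ones used in the explanation above.

```python
def words_occurring(b, k):
    """Estimated number of different words occurring exactly b times,
    from Zipf's first law a * b**2 = k, i.e. a = k / b**2."""
    return k / b ** 2


# 1440 different words occur once, so k = a * b**2 = 1440 * 1**2.
k = 1440 * 1 ** 2
estimates = {b: words_occurring(b, k) for b in (2, 3, 4, 6)}
print(estimates)  # {2: 360.0, 3: 160.0, 4: 90.0, 6: 40.0}
```

For a real text, k would be taken from the observed number of words occurring once, and the estimates would be compared with the observed counts for b = 2, 3, 4 and so on.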
Verification
The manual verification of the law is highly laborious and time consuming. The best way to verify the law is to take the help of information technology as detailed under the second law.
Uses
The use of this law in library and information science (LIS) is still negligible, maybe because it is not generally taught in our LIS courses. It will be useful in identifying the style of writing of an author. The only case I know about its application is that of a student of the University of Calcutta who is trying to find out to what extent the writing of a noted Bengali scientist has been influenced by a foreign lady. The details will be known when the thesis is submitted and degree awarded.
Second Law --Definition
Zipf defined the second law as ‘The conspicuousness or intensity of any element of language is inversely proportionate to its frequency’. Using X for frequency and Y for conspicuousness (rank), the law can be mathematically expressed as
$$Y \propto \frac{1}{X} \;\Rightarrow\; XY = n,$$
where n is a constant [30].
Explanation
The words conspicuousness or intensity can be simply termed rank, and element can be simply taken to mean word. The definition can be stated in simpler terms: the rank of any word of a language is inversely proportional to the frequency of its usage. The lower the rank number (rank 1 being the most frequent word), the higher the frequency.
Verification
The verification of the law came through the use of Miles L Hanley’s Index of Words for James Joyce’s Ulysses. Zipf found that the rank-frequency word distribution ‘approximate the simple equation of an equilateral hyperbola: r × f = C’ [14, p.24], where r indicates rank, f frequency, and C is the constant. The Product column in Table 3 represents this constant. In the case of this law the constant is not a fixed number, but the products stay close to one another. A part of the result of Zipf’s experiment is reproduced below in Table 3.
Table 3 - Distribution of Words in James Joyce’s Ulysses
Rank (r) | Frequency (f) | Product (C)
10 | 2653 | 26530
20 | 1311 | 26220
30 | 926 | 27780
40 | 717 | 28680
50 | 556 | 27800
100 | 285 | 28500
A manual verification of the Law is highly laborious and time consuming. In the absence of Hanley’s Index of Words for James Joyce’s Ulysses Zipf’s experiment would have been really difficult. The advent of information technology has given us devices with which we can verify the Law without much difficulty. The steps involved in the verification are given below [24].
- i. Take a piece of writing in English containing not less than 5,000 words. The writing can be an article, a short story, a novelette, a part of a novel, or even a technical text.
- ii. Scan the selected writing with optical character recognition (OCR) software such as OmniPage Pro.
- iii. Save the file in a suitable software package such as Microsoft Word for Windows.
- iv. Check the file against the original to ascertain accuracy.
- v. Consider only the textual portion of the writing and remove the names of the authors, author affiliations, abstract, keywords, alpha-numeric expressions like 2nd and F10, alpha-symbolic expressions like au= and su=, abbreviations such as FDT and ISO, numbers written with digits (e.g. 324), serial numbering such as a), b) etc., formulas, punctuation marks, intra- and extra-textual references, tables, figures, and appendices.
- vi. The rationale behind these exclusions is as follows. The names of authors and author affiliations do not represent the author’s style of writing and as such cannot be used for word counting. Keywords are sometimes chosen by consulting a thesaurus, where the author has little choice, and are sometimes added by the editor; hence it is safer to exclude them. An abstract is the condensed version of an article and does not necessarily represent the normal style of writing of an author; moreover, the abstract is sometimes prepared by someone other than the author. Alpha-numeric and alpha-symbolic expressions, abbreviations, numbers written with digits, serial numbering with a, b, c, etc., and formulas are not words, hence they are excluded. References comprise certain fixed elements such as author, year, title of the article, and other bibliographical details which are not the creation of the author. Tables and figures are excluded for ease of sorting; moreover, at times a table may contain numerous YESes and NOes or Ys and Ns representing the answers of respondents. Appendices are usually not reflective of the style of writing of an author either. The textual part of the article, with the above exceptions, is the best part for judging the word-use pattern of an author.
- vii. In the case of hyphenated words, follow these rules.
  - If the hyphen joins a prefix, as in co-ordination, remove the hyphen and consider the word as coordination.
  - In other cases, such as short-term, remove the hyphen and consider the combination as two different words.
- viii. Convert the punctuation marks into spaces.
- ix. Convert all spaces into line breaks.
- x. Convert all upper-case letters into lower-case letters.
- xi. Make sure that the file now contains nothing but words, one per line.
- xii. Sort the file alphabetically and save it as a text file.
- xiii. Run the file through a small program that can count the frequencies of word occurrence. The program will take the text file as its input and reproduce the text file with the corresponding frequencies as its output.
- xiv. Convert the file into a table.
- xv. Sort the table according to descending frequencies.
- xvi. Add one column to the table.
- xvii. Compute rank × frequency.
- xviii. If the product of rank and frequency is found to be more or less the same, the law is verified.
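The counting and ranking steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the program referred to in the steps: the function name `rank_frequency_table` is hypothetical, the text is assumed to be already available as a plain string (OCR and manual clean-up are not covered), and every hyphen is treated as a word separator, which simplifies the prefix rule for hyphenated words.

```python
import re
from collections import Counter


def rank_frequency_table(text):
    """Return rows of (rank, word, frequency, rank * frequency),
    sorted by descending frequency."""
    # Lower-case the text and turn punctuation and spaces into separators.
    tokens = re.split(r"[^0-9a-z]+", text.lower())
    # Keep purely alphabetic tokens; this drops empty strings, numbers
    # and alpha-numeric expressions such as "2nd" or "f10".
    words = [token for token in tokens if token.isalpha()]
    counts = Counter(words)
    # Descending frequency; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]


# A toy sample only to show the shape of the output; a real test
# needs a text of at least 5,000 words.
sample = "The cat and the dog and the bird."
table = rank_frequency_table(sample)
for row in table:
    print(row)
```

On a sufficiently long text, the last value in each row should stay roughly constant over the middle ranks if the law holds. Words tied on frequency receive consecutive ranks here, which is one of several possible conventions.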
Uses
This law is taught in some LIS courses in India. It has been the subject of study in Master’s level dissertations as well as theses. Ray [22,23] in his thesis applied this law of Zipf to the words of Gitanjali. The ranking of words helps in generating automated abstracts as well as keywords.
Lotka's Law
Bradford’s law deals with the scatter of journal literature devoted to a particular subject wherefrom we get an idea of journal productivity as well. Zipf’s laws are devoted to the study of words from various angles using statistical methods. Lotka’s law studies the author productivity. In 1926, Alfred J. Lotka, a statistician of the Metropolitan Life Insurance Company, USA, became engrossed with the idea of determining, ‘if possible, the part which men of different calibre contribute to the progress of science’. For this purpose, he used the index of Chemical Abstracts for the years 1907-1916 and developed a listing of A and B names [i.e. the names starting with the letter A and B] and the corresponding number of papers each author produced. The same procedure was applied to Auerbach’s Geschichtstafeln der Physik till the year 1900 using complete coverage [16]. After studying the productivity of the authors, he was surprised to see that the productivity of the authors can be expressed with a simple equation.
Equation
The equation derived from the study is $x^n y = c$, where x stands for the number of contributions, y for the number of authors, and c is a constant. Lotka found the value of n to be 2. Since then, many have worked in this area. References to these works may be seen in Egghe [9,10] and Ravichandra Rao [20]; Egghe and Ravichandra Rao [8] have also developed a model to explain the distribution of productivity based on the fractional counting method. Lotka's law was derived using the simple counting method explained above.
Explanation
It has been observed that a large number of authors contribute a small number of papers. If you go through the author index of Indian Library Science Abstracts 1992-1999 or 2000-2005 [14], you will notice that the largest number of authors have contributed only 1 paper, fewer have contributed 2 papers, still fewer have contributed 3 papers, and so on. If you take a count you may find that around 100 authors have contributed 1 paper each, and only about one or two authors have contributed 10 papers or more. From the formula, the number of authors producing 1, 2, or more articles can be estimated as in Table 5, which takes c = 1024 and n = 2.
Table 5 Distribution of Papers according to the Number of Authors
No. of Papers (x) | No. of Authors (y)
1 | 1024
2 | 256
3 | 114
4 | 64
5 | 41
6 | 28
7 | 21
8 | 16
9 | 13
10 | 10
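The figures in Table 5 follow from the equation with n = 2 and c = 1024 (the number of authors with one paper). A small Python sketch, with the illustrative function name `lotka_expected_authors`, reproduces them after rounding to whole authors:

```python
def lotka_expected_authors(x, c, n=2.0):
    """Expected number of authors contributing x papers,
    from Lotka's law x**n * y = c, i.e. y = c / x**n."""
    return c / x ** n


# c = 1024 authors with a single paper, n = 2 as found by Lotka.
expected = [round(lotka_expected_authors(x, c=1024)) for x in range(1, 11)]
print(expected)  # [1024, 256, 114, 64, 41, 28, 21, 16, 13, 10]
```

Changing `n` shows how quickly or slowly the number of authors falls off as productivity rises.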
Verification
To verify the law, follow these steps.
i. Take the author index of an abstracting or indexing periodical. You can take the whole A to Z index if the number of entries is manageable. Otherwise you can take part of the index.
ii. Count the number of entries against each author, and write down the number against the author's name as in Table 6. If the data are entered into a computer, sorting on column 3 will be very easy.
iii. From the sorted data in column 3, find the number of authors who have contributed one article, two articles, and so on.
iv. Tabulate the data as in Table 5.
v. Taking the value of n as 2, check whether the data fit Lotka’s law. If they do not, try different values of n, which may be more than 2 or less than 2. By trial and error you may arrive at a value that brings the estimated figures quite close to the actual values. Otherwise you may follow Sen’s method [24] or Pao’s method [18] to find the value of n. Pao’s method involves a great deal of mathematical calculation and is suited to students with a very good mathematical background.
Table 6 – Author Productivity*
Authors | Entry No(s). of Contributions | No. of Contributions
Abbahu (G E P) | 2599 | 1
Abbas (S M) | 2738 | 1
Abbas Ibrahim | 0538 | 1
Abbulu (G E P) | 0917 | 1
Abdella (Woinshet) | 3036, 3037 | 2
Abdul Azeez (T A) | 1215 | 1
Abdul Jaleel (T) | 1119 | 1
Abdul Majeed Baba | 0595, 0956, 0929 | 3
Abdul Rashid | 0311, 0312, 0743, 1477, 1684 | 5
Abdur Rauf Meah (Md.) | 2511 | 1
Abid (Abdelaziz) | 2453 | 1
Abideen P (Sainul) | 0461 | 1
Abidi (Syed A H) | 2504 | 1
Abifarin (Abimbola) | 2233 | 1
Abraham (Deborah V H) | 0423 | 1
Abraham (J) | 0424, 0466, 0744, 1015, 1726, 2031, 3011 | 7
* Columns 1 and 2 reproduced from Indian Library Science Abstracts 1992-1999 [4]
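The tabulation and trial-and-error steps of the verification can be sketched in Python using column 3 of Table 6. This is a rough illustration only: it is neither Sen's nor Pao's method, the least-squares criterion and the grid of candidate values from 1.0 to 4.0 are assumptions made for the sketch, c is simply taken as the observed number of single-paper authors, and 16 authors are far too few for a meaningful test.

```python
from collections import Counter

# Column 3 of Table 6: number of contributions per author.
contributions = [1, 1, 1, 1, 2, 1, 1, 3, 5, 1, 1, 1, 1, 1, 1, 7]

# Steps iii and iv: how many authors contributed 1, 2, 3, ... papers.
observed = dict(sorted(Counter(contributions).items()))
print(observed)  # {1: 12, 2: 1, 3: 1, 5: 1, 7: 1}


def squared_error(n, observed):
    """Sum of squared differences between the observed author counts
    and the Lotka estimates c / x**n, with c = authors with one paper."""
    c = observed[1]
    return sum((y - c / x ** n) ** 2 for x, y in observed.items())


# Step v by trial and error: try n = 1.0, 1.1, ..., 4.0 and keep
# the value whose estimates lie closest to the observed counts.
candidates = [i / 10 for i in range(10, 41)]
best_n = min(candidates, key=lambda n: squared_error(n, observed))
print(best_n)
```

With a full author index the same code applies unchanged; only the `contributions` list grows.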
Use – It helps to determine the highly productive, moderately productive, and less productive authors in a subject.