
August 29, 2021

The Pareto distribution and Price's law

As detailed in the previous post, the fraction $f$ of the top authors that publish a fraction $v$ of all publications is independent of the total number of authors $N_0$. Of course, this result is incompatible with Price's law (that for $v = 0.5$, $f = 1/\sqrt{N_0}$). This issue has been discussed by Price and co-workers [1], but I will take here a slightly different approach.

I had assumed in my derivation that the domain of the distribution was unbounded above ($H = \infty$), and that the exponent $\alpha$ was higher than 1. One can relax these assumptions and check their effect on $f$ by:

  1. imposing a finite upper bound $H$ and
  2. setting $\alpha = 1$. Note that 2 also requires 1.

Role of the upper bound

In the finite $H$ case one must use the full expressions (containing $H$ and $L$) for the various quantities. In this section, we will continue to assume that $\alpha > 1$. Since $L$ acts everywhere as a scale factor for $x$ (and $H$), I will set it to 1 in the following. It is also reasonable to assume that the least productive authors have one publication (why truncate at a higher value?!). Consequently, all results will also depend on $H$, but presumably not explicitly on $N_0$, which is a prefactor for the PDF and should cancel out of all expectation calculations. It is, however, quite likely that $H$ itself will depend on $N_0$, since more authors will lead to a higher maximum publication number!

In my opinion, the most reasonable assumption is that there is only one author with $H$ publications, so that $N_0 \, p(H) = 1 \Rightarrow H \simeq (N_0 \alpha)^{\frac{1}{\alpha+1}}$, neglecting the normalization prefactor of $p(x)$.

The threshold number $x_f$ is easy to obtain directly from $S(x)$:

$$x_f = \left[ f + (1-f) H^{-\alpha} \right]^{-1/\alpha}$$
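The threshold formula can be checked numerically with a minimal sketch. It assumes that the survival function of the bounded distribution with $L = 1$ is $S(x) = (x^{-\alpha} - H^{-\alpha})/(1 - H^{-\alpha})$; the function names and the numerical values are illustrative choices, not taken from the original post.

```python
def survival(x, alpha, H):
    """Survival function S(x) of the Pareto law bounded to [1, H] (L = 1).

    Assumed form: (x^-alpha - H^-alpha) / (1 - H^-alpha).
    """
    return (x ** -alpha - H ** -alpha) / (1.0 - H ** -alpha)


def threshold(f, alpha, H):
    """Publication count x_f above which a fraction f of the authors lies."""
    return (f + (1.0 - f) * H ** -alpha) ** (-1.0 / alpha)


# Round trip: S(x_f) should give back f (arbitrary illustrative values).
alpha, H, f = 1.5, 50.0, 0.2
x_f = threshold(f, alpha, H)
print(x_f, survival(x_f, alpha, H))
```

The two limiting cases are a quick sanity check: $f = 1$ gives $x_f = 1$ (all authors) and $f = 0$ gives $x_f = H$.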

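From its definition, the fraction $v$ is given by: $v = \frac{\alpha}{\mu} \frac{1}{1 - H^{-\alpha}} \frac{1}{\alpha - 1} \left( x_f^{1-\alpha} - H^{1-\alpha} \right)$. Note that we need here the complete expression for the mean [2]:

$$\mu = \frac{\alpha}{\alpha - 1} L \, \frac{1 - H^{1-\alpha}}{1 - H^{-\alpha}}$$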
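
As an intermediate step (a sketch, with $L = 1$): once this mean is substituted into the expression for $v$, the prefactors $\alpha/(\alpha-1)$ and $1/(1 - H^{-\alpha})$ cancel, leaving

$$v = \frac{x_f^{1-\alpha} - H^{1-\alpha}}{1 - H^{1-\alpha}},$$

so that $v = 1/2$ corresponds to $x_f^{1-\alpha} = \left( 1 + H^{1-\alpha} \right)/2$.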

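Plugging $x_f$ and $\mu$ into the definition of $v$ and setting $v = 1/2$ yields:

$$f = f_\infty \, \frac{\left( 1 + H^{1-\alpha} \right)^{\frac{\alpha}{\alpha-1}} - 2^{\frac{\alpha}{\alpha-1}} H^{-\alpha}}{1 - H^{-\alpha}}, \quad \text{with } f_\infty = \left( \frac{1}{2} \right)^{\frac{\alpha}{\alpha-1}},$$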

and we assume that the upper bound is given by:

$$H = (N_0 \alpha)^{\frac{1}{\alpha+1}}.$$
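
The result for $\alpha > 1$ can be evaluated with a short sketch. It only transcribes the two formulas above; the function names and the sampled values of $N_0$ are illustrative assumptions, and this is not the code used for the figure.

```python
def upper_bound(n0, alpha):
    """H from N0 * p(H) = 1, neglecting the normalization of p(x)."""
    return (n0 * alpha) ** (1.0 / (alpha + 1.0))


def f_unbounded(alpha):
    """Unbounded limit f_inf = (1/2)^(alpha/(alpha-1)), valid for alpha > 1."""
    return 0.5 ** (alpha / (alpha - 1.0))


def f_half(n0, alpha):
    """Fraction f of top authors producing v = 1/2 of the output (alpha > 1)."""
    H = upper_bound(n0, alpha)
    e = alpha / (alpha - 1.0)
    numerator = (1.0 + H ** (1.0 - alpha)) ** e - 2.0 ** e * H ** -alpha
    return f_unbounded(alpha) * numerator / (1.0 - H ** -alpha)


# Illustrative scan over the number of authors for two exponents.
for n0 in (1e2, 1e4, 1e6):
    print(n0, f_half(n0, 1.16), f_half(n0, 2.0))
print("unbounded limits:", f_unbounded(1.16), f_unbounded(2.0))
```

For $\alpha = 2$ the formula reduces to $f = (1 + 3/H)/(4(1 + 1/H))$, which makes the approach to $f_\infty = 1/4$ from above easy to see.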

Exponent α=1

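Let us rewrite the PDF, CDF and survival function in this particular case:

$$p(x) = \frac{1}{1 - H^{-1}} \frac{1}{x^2}; \quad F(x) = \frac{1 - x^{-1}}{1 - H^{-1}}; \quad S(x) = 1 - F(x) = \frac{x^{-1} - H^{-1}}{1 - H^{-1}}$$

$$x_f = S^{-1}(f) = \frac{1}{f + (1-f) H^{-1}}$$

$$v = \frac{1}{2} = 1 - \frac{\ln(x_f)}{\ln(H)} \Rightarrow x_f = \sqrt{H} \quad \text{and, since } H = \sqrt{N_0}, \quad x_f = N_0^{1/4}$$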

Putting it all together yields $f = \frac{N_0^{1/4} - 1}{N_0^{1/2} - 1}$ and, in the high $N_0$ limit, $f \sim N_0^{-1/4}$, so the number of "prolific" authors is $N_p = f N_0 = N_0^{3/4}$, a result also obtained by Price et al. [1] using the discrete distribution. They also showed that other power laws (from $N_0^{1/2}$ to $N_0^{1}$) can be obtained, depending on the exact dependence of $H$ on $N_0$.
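
The $\alpha = 1$ result can be compared with its asymptote in a few lines. This is a minimal sketch assuming $H = \sqrt{N_0}$ and $x_f = N_0^{1/4}$ as derived above; the function name and the sampled values of $N_0$ are illustrative.

```python
def f_half_alpha1(n0):
    """Fraction f of top authors producing half the output when alpha = 1.

    Uses H = sqrt(N0) and x_f = N0**(1/4), as derived in the text.
    """
    return (n0 ** 0.25 - 1.0) / (n0 ** 0.5 - 1.0)


for n0 in (1e2, 1e4, 1e6, 1e8):
    f = f_half_alpha1(n0)
    # Columns: N0, exact f, asymptote N0^(-1/4), prolific authors N_p = f * N0.
    print(n0, f, n0 ** -0.25, f * n0)
```

Since $N_0^{1/2} - 1 = (N_0^{1/4} - 1)(N_0^{1/4} + 1)$, the exact expression is simply $f = 1/(N_0^{1/4} + 1)$, which makes the $N_0^{-1/4}$ asymptote explicit.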

Fraction $f$ of the most prolific authors that contribute $v = 1/2$ of the total output, as a function of the total number of authors, $N_0$, for various exponents $\alpha$. The unbounded limit $f_\infty$ ($H \to \infty$), calculated in the previous post, is also shown for $\alpha > 1$. With my choice for the relation between $N_0$ and $H$, this also corresponds to $N_0 \to \infty$. The particular value $\alpha = 1.16$ yields the 20/80 rule, but also the 0.6/50 rule shown as a solid black line. Note that the curve for $\alpha = 1$ is computed using a different formula than the others and does not reach a plateau: its asymptotic regime $f \sim N_0^{-1/4}$ is shown as a dotted line.

The graph above summarizes all these results: for $\alpha = 1$, $f$ reaches the asymptotic regime $f \sim N_0^{-1/4}$ very quickly ($N_0 \sim 100$). For $\alpha > 1$, $f$ leaves this asymptote and saturates at its unbounded limit $f_\infty$ ($H \to \infty$), calculated in the previous post. This regime change is very slow for $\alpha < 2$: the plateau is reached for $N_0 > 10^6$.

In conclusion, an attenuated version of Price's law is indeed obtained for $\alpha = 1$ (where it holds for any $N_0$) but also for reasonably low $\alpha > 1$, in particular for $\alpha = 1.16$ (of 20/80 fame), where it applies for any practical number of authors! As soon as $\alpha$ exceeds about 1.5, the decay is shallow and saturates quickly, so $f$ is relatively flat.


[1] Allison, P. D. et al., "Lotka's Law: A Problem in Its Interpretation and Application", Social Studies of Science 6, 269-276 (1976).
