August 29, 2021

The Pareto distribution and Price's law

As detailed in the previous post, the fraction \(f\) of top authors who publish a fraction \(v\) of all publications is independent of the total number of authors \(N_0\). This result is, of course, incompatible with Price's law, which states that for \(v=0.5\), \(f = 1/\sqrt{N_0}\). This issue has been discussed by Price and co-workers [1], but I will take a slightly different approach here.

In my derivation I had assumed that the domain of the distribution was unbounded above (\(H = \infty\)) and that the exponent \(\alpha\) was greater than 1. One can relax these assumptions and check their effect on \(f\) by:

  1. imposing a finite upper bound \(H\), and
  2. setting \(\alpha = 1\). Note that 2. also requires 1., since for \(\alpha = 1\) the mean diverges if \(H = \infty\).

Role of the upper bound

In the finite \(H\) case, one must use the full expressions (containing \(H\) and \(L\)) for the various quantities. In this section, we continue to assume that \(\alpha > 1\). Since \(L\) acts everywhere as a scale factor for \(x\) (and \(H\)), I will set it to 1 in the following. This is also a natural choice, since it is reasonable to assume that the least productive authors have one publication (why truncate at a higher value?!). As a consequence, the results depend on \(H\) (in addition to \(\alpha\)), but presumably not explicitly on \(N_0\), which is a prefactor of the PDF and should cancel out of all expectation calculations. It is, however, quite likely that \(H\) itself depends on \(N_0\), since more authors will lead to a higher maximum number of publications!

In my opinion, the most reasonable assumption is that there is only one author with \(H\) publications, so that \(N_0 p(H) = 1\). Neglecting the normalization prefactor \(1/(1-H^{-\alpha})\) of \(p(x)\), this gives \(N_0 \alpha H^{-\alpha-1} \simeq 1 \Rightarrow H \simeq (N_0 \alpha)^{\frac{1}{\alpha + 1}}\).
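A quick numerical sketch of this choice, in Python. The helper names `upper_bound_approx` and `upper_bound_exact` are hypothetical; the second one solves \(N_0 p(H) = 1\) with the full prefactor \(1/(1-H^{-\alpha})\) by simple fixed-point iteration (an assumption of this sketch, not part of the derivation), which shows how much the neglected prefactor matters.

```python
def upper_bound_approx(n_authors: float, alpha: float) -> float:
    """H from N0 * alpha * H**(-alpha - 1) = 1 (normalization prefactor neglected)."""
    return (n_authors * alpha) ** (1.0 / (alpha + 1.0))


def upper_bound_exact(n_authors: float, alpha: float, iterations: int = 100) -> float:
    """H from N0 * p(H) = 1, keeping the prefactor 1 / (1 - H**-alpha).

    Hypothetical helper: fixed-point iteration started from the approximate value;
    it converges quickly because H**-alpha is small for these parameters.
    """
    h = upper_bound_approx(n_authors, alpha)
    for _ in range(iterations):
        h = (n_authors * alpha / (1.0 - h ** -alpha)) ** (1.0 / (alpha + 1.0))
    return h


for alpha in (1.0, 1.16, 1.5, 2.0):
    for n0 in (1e2, 1e4, 1e6):
        approx = upper_bound_approx(n0, alpha)
        exact = upper_bound_exact(n0, alpha)
        print(f"alpha={alpha:<5} N0={n0:.0e}  H_approx={approx:10.2f}  H_exact={exact:10.2f}")
```

For \(\alpha = 1\) and \(N_0 = 100\), for instance, the full equation reduces to \(H^2 - H = 100\), whose root \((1+\sqrt{401})/2 \approx 10.51\) is close to the approximate value \(H = 10\).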

The threshold number of publications \(x_f\) is obtained directly by solving \(S(x_f) = f\):

\[x_f = \left [ f + (1-f) H^{-\alpha}\right ]^{-1/\alpha}\]
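As a sanity check, here is a minimal sketch with hypothetical helpers `survival` and `threshold` (with \(L = 1\), and the bounded survival function \(S(x) = (x^{-\alpha} - H^{-\alpha})/(1 - H^{-\alpha})\)). It verifies that \(S(x_f) = f\), that \(x_f\) runs from \(H\) (for \(f = 0\)) down to 1 (for \(f = 1\)), and that it reduces to the unbounded value \(f^{-1/\alpha}\) for very large \(H\).

```python
def survival(x: float, alpha: float, upper: float) -> float:
    """S(x) = (x**-alpha - H**-alpha) / (1 - H**-alpha) for the bounded Pareto law with L = 1."""
    return (x ** -alpha - upper ** -alpha) / (1.0 - upper ** -alpha)


def threshold(f: float, alpha: float, upper: float) -> float:
    """x_f = [f + (1 - f) H**-alpha]**(-1/alpha), the inverse of S."""
    return (f + (1.0 - f) * upper ** -alpha) ** (-1.0 / alpha)


alpha, upper = 1.16, 100.0
for f in (0.01, 0.2, 0.5):
    x_f = threshold(f, alpha, upper)
    print(f"f={f:<5} x_f={x_f:8.3f}  S(x_f)={survival(x_f, alpha, upper):.6f}")
```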

From its definition, \(v = \dfrac{1}{\mu} \displaystyle\int_{x_f}^{H} x\, p(x)\, dx\), which gives \(v = \dfrac{\alpha}{\mu} \dfrac{1}{1-H^{-\alpha}} \dfrac{1}{\alpha - 1} \left ( x_f^{1-\alpha} - H^{1-\alpha} \right )\). Note that here we need the complete expression for the mean [2]:

\[\mu = \dfrac{\alpha}{\alpha - 1} L \dfrac{1-H^{1-\alpha}}{1-H^{-\alpha}}\]
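The closed forms for \(\mu\) and \(v\) can be cross-checked numerically. The sketch below uses hypothetical helper names, \(L = 1\), the bounded PDF \(p(x) = \alpha x^{-\alpha-1}/(1-H^{-\alpha})\), and a composite Simpson rule in \(u = \ln x\). The quadrature method is my own choice for this illustration, picked because the integrands become smooth exponentials in that variable.

```python
import math


def pdf(x: float, alpha: float, upper: float) -> float:
    """Bounded Pareto PDF with L = 1: alpha x**(-alpha-1) / (1 - H**-alpha)."""
    return alpha * x ** (-alpha - 1.0) / (1.0 - upper ** -alpha)


def integrate_log(g, lo: float, hi: float, n: int = 2000) -> float:
    """Composite Simpson rule for the integral of g(x) dx, done in u = ln x (n must be even)."""
    a, b = math.log(lo), math.log(hi)
    step = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = math.exp(a + i * step)
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * g(x) * x  # dx = x du
    return total * step / 3.0


def mean_closed(alpha: float, upper: float) -> float:
    """mu = alpha / (alpha - 1) * (1 - H**(1-alpha)) / (1 - H**-alpha), with L = 1."""
    return alpha / (alpha - 1.0) * (1.0 - upper ** (1.0 - alpha)) / (1.0 - upper ** -alpha)


def share_closed(x_f: float, alpha: float, upper: float) -> float:
    """v = alpha / mu / (1 - H**-alpha) / (alpha - 1) * (x_f**(1-alpha) - H**(1-alpha))."""
    mu = mean_closed(alpha, upper)
    return (alpha / mu / (1.0 - upper ** -alpha) / (alpha - 1.0)
            * (x_f ** (1.0 - alpha) - upper ** (1.0 - alpha)))


alpha, upper, x_f = 1.5, 200.0, 10.0
mu_num = integrate_log(lambda x: x * pdf(x, alpha, upper), 1.0, upper)
v_num = integrate_log(lambda x: x * pdf(x, alpha, upper), x_f, upper) / mu_num
print(f"mu: closed={mean_closed(alpha, upper):.10f} numeric={mu_num:.10f}")
print(f"v:  closed={share_closed(x_f, alpha, upper):.10f} numeric={v_num:.10f}")
```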

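Plugging \(\mu\) into the expression for \(v\) gives \(v = \dfrac{x_f^{1-\alpha} - H^{1-\alpha}}{1 - H^{1-\alpha}}\). Setting \(v = 1/2\) yields \(x_f^{1-\alpha} = \left(1 + H^{1-\alpha}\right)/2\), and substituting this into \(f = S(x_f)\) gives: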

\begin{equation} f = f_{\infty} \dfrac{\left ( 1 + H^{1-\alpha}\right )^{\frac{\alpha}{\alpha - 1}} - 2^{\frac{\alpha}{\alpha - 1}}H^{-\alpha}}{1-H^{-\alpha}}, \quad \text{with } f_{\infty} = \left( \dfrac{1}{2} \right )^{\frac{\alpha}{\alpha - 1}},\end{equation}
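The closed form can be checked against the direct route: solve \(v = 1/2\) for \(x_f\), then evaluate \(f = S(x_f)\). In this hypothetical sketch, `f_closed` transcribes the equation above and `f_direct` follows the intermediate steps. For very large \(H\), both should approach \(f_\infty\).

```python
def f_closed(alpha: float, upper: float) -> float:
    """f for v = 1/2, transcribed from the closed-form expression (L = 1, alpha > 1)."""
    k = alpha / (alpha - 1.0)
    f_inf = 0.5 ** k
    numerator = (1.0 + upper ** (1.0 - alpha)) ** k - 2.0 ** k * upper ** -alpha
    return f_inf * numerator / (1.0 - upper ** -alpha)


def f_direct(alpha: float, upper: float) -> float:
    """Same quantity via x_f: v = 1/2 gives x_f**(1-alpha) = (1 + H**(1-alpha)) / 2, then f = S(x_f)."""
    x_f = ((1.0 + upper ** (1.0 - alpha)) / 2.0) ** (1.0 / (1.0 - alpha))
    return (x_f ** -alpha - upper ** -alpha) / (1.0 - upper ** -alpha)


for alpha in (1.16, 1.5, 2.0):
    f_inf = 0.5 ** (alpha / (alpha - 1.0))
    for upper in (10.0, 1e3, 1e6):
        print(f"alpha={alpha:<5} H={upper:8.0e}  f={f_closed(alpha, upper):.5f}  f_inf={f_inf:.5f}")
```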

and we assume that the upper bound is given by:

\begin{equation} H = (N_0 \alpha)^{\frac{1}{\alpha + 1}}. \end{equation}
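Combining this choice of \(H\) with the previous equation gives \(f\) as a function of \(N_0\) alone. The minimal sketch below (hypothetical helper `f_half`, valid for \(\alpha > 1\) only) tabulates \(f/f_\infty\), which is the factor by which \(f\) still exceeds its unbounded limit at a given number of authors.

```python
def f_half(n_authors: float, alpha: float) -> float:
    """f for v = 1/2 with H = (N0 alpha)**(1/(alpha+1)); valid for alpha > 1 only."""
    upper = (n_authors * alpha) ** (1.0 / (alpha + 1.0))
    k = alpha / (alpha - 1.0)
    numerator = (1.0 + upper ** (1.0 - alpha)) ** k - 2.0 ** k * upper ** -alpha
    return 0.5 ** k * numerator / (1.0 - upper ** -alpha)


exponents = range(2, 13, 2)  # N0 = 1e2, 1e4, ..., 1e12
print("alpha  " + "  ".join(f"N0=1e{e:<3}" for e in exponents))
for alpha in (1.16, 1.5, 2.0, 3.0):
    f_inf = 0.5 ** (alpha / (alpha - 1.0))
    ratios = [f_half(10.0 ** e, alpha) / f_inf for e in exponents]
    print(f"{alpha:<5}  " + "  ".join(f"{r:8.3f}" for r in ratios))
```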

Exponent \(\alpha = 1\)

Let us rewrite the PDF, CDF and survival function in this particular case:

\[p(x) = \dfrac{1}{1 - H^{-1}} \dfrac{1}{x^2}; \, F(x) = \dfrac{1- x^{-1}}{1 - H^{-1}} ; \, S(x) = 1 - F(x) = \dfrac{x^{-1}- H^{-1}}{1 - H^{-1}}\]

\[x_f = S^{-1}(f) = \dfrac{1}{f + (1-f) H^{-1}}\]

\[v = \dfrac{1}{2} = 1 - \dfrac{\ln(x_f)}{\ln(H)} \Rightarrow x_f = \sqrt{H} \quad \text{and, since } H = \sqrt{N_0}, \, x_f = N_0^{1/4}\]
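The expression for \(v\) follows from \(\mu = \ln H/(1 - H^{-1})\) and \(\int_{x_f}^{H} x\,p(x)\,dx = \ln(H/x_f)/(1 - H^{-1})\), whose ratio is \(\ln(H/x_f)/\ln H\). A short numerical sketch confirms this and the solution \(x_f = \sqrt{H}\). It uses hypothetical helper names and Simpson quadrature in \(\ln x\), which is my own choice for the check.

```python
import math


def pdf_alpha1(x: float, upper: float) -> float:
    """p(x) = 1 / ((1 - 1/H) x**2) for alpha = 1 and L = 1."""
    return 1.0 / ((1.0 - 1.0 / upper) * x * x)


def integrate_log(g, lo: float, hi: float, n: int = 2000) -> float:
    """Composite Simpson rule for the integral of g(x) dx, done in u = ln x (n must be even)."""
    a, b = math.log(lo), math.log(hi)
    step = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = math.exp(a + i * step)
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * g(x) * x  # dx = x du
    return total * step / 3.0


def share_numeric(x_f: float, upper: float) -> float:
    """v as the ratio of the output above x_f to the total output."""
    total = integrate_log(lambda x: x * pdf_alpha1(x, upper), 1.0, upper)
    top = integrate_log(lambda x: x * pdf_alpha1(x, upper), x_f, upper)
    return top / total


def share_closed(x_f: float, upper: float) -> float:
    """v = 1 - ln(x_f) / ln(H)."""
    return 1.0 - math.log(x_f) / math.log(upper)


n0 = 1e6
upper = math.sqrt(n0)   # H = sqrt(N0) when alpha = 1
x_f = math.sqrt(upper)  # claimed solution of v = 1/2, i.e. N0**(1/4)
print(share_numeric(x_f, upper), share_closed(x_f, upper))
```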

Substituting \(x_f = N_0^{1/4}\) and \(H = \sqrt{N_0}\) into \(f = S(x_f)\) yields \(f = \dfrac{N_0^{1/4} - 1}{N_0^{1/2} - 1}\) and, in the high-\(N_0\) limit, \(f \sim N_0^{-1/4}\). The number of "prolific" authors is therefore \(N_p = f N_0 \sim N_0^{3/4}\), a result also obtained by Price and co-workers [1] using the discrete distribution. They also showed that other power laws (from \(N_0^{1/2}\) to \(N_0^{1}\)) can be obtained, depending on the exact dependence of \(H\) on \(N_0\).
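A short sketch with hypothetical helper names compares the closed form with the direct route (\(H = \sqrt{N_0}\), \(x_f = \sqrt{H}\), \(f = S(x_f)\)). It also shows how quickly \(f\,N_0^{1/4}\) approaches 1 and \(N_p\) approaches \(N_0^{3/4}\).

```python
import math


def f_alpha1(n_authors: float) -> float:
    """Closed form f = (N0**(1/4) - 1) / (N0**(1/2) - 1) for alpha = 1, v = 1/2."""
    return (n_authors ** 0.25 - 1.0) / (n_authors ** 0.5 - 1.0)


def f_alpha1_direct(n_authors: float) -> float:
    """Same quantity step by step: H = sqrt(N0), x_f = sqrt(H), f = S(x_f)."""
    upper = math.sqrt(n_authors)
    x_f = math.sqrt(upper)
    return (1.0 / x_f - 1.0 / upper) / (1.0 - 1.0 / upper)


for n0 in (1e2, 1e4, 1e6, 1e8):
    f = f_alpha1(n0)
    print(f"N0={n0:.0e}  f={f:.4f}  f*N0^(1/4)={f * n0 ** 0.25:.4f}  "
          f"N_p={f * n0:.0f}  N0^(3/4)={n0 ** 0.75:.0f}")
```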

Fraction \(f\) of the most prolific authors who contribute \(v = 1/2\) of the total output, as a function of the total number of authors \(N_0\), for various exponents \(\alpha\). The unbounded limit \(f (H \rightarrow \infty)\), calculated in the previous post, is also shown for \(\alpha > 1\); with my choice of relation between \(N_0\) and \(H\), this also corresponds to \(N_0 \rightarrow \infty\). The particular value \(\alpha = 1.16\) yields the 20/80 rule, but also the 0.6/50 rule (shown as a solid black line). Note that the curve for \(\alpha = 1\) is computed using a different formula than the others and does not reach a plateau: its asymptotic regime \(f \sim N_0^{-1/4}\) is shown as a dotted line.
The graph above summarizes these results: for \(\alpha = 1\), \(f\) reaches the asymptotic regime \(f \sim N_0^{-1/4}\) very quickly (\(N_0 \simeq 100\)). For \(\alpha > 1\), \(f\) departs from this asymptote and saturates at its unbounded limit \(f (H \rightarrow \infty)\), calculated in the previous post. This crossover is very slow for \(\alpha < 2\): the plateau is only reached for \(N_0 > 10^6\).
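One way to quantify this crossover is the local exponent \(d\ln f / d\ln N_0\). Price's law (\(f = N_0^{-1/2}\)) corresponds to \(-1/2\), the \(\alpha = 1\) asymptote to \(-1/4\), and a plateau to 0. The sketch below evaluates this exponent from the two closed forms derived above. The helper names are hypothetical, and the central finite difference in \(\ln N_0\) is an assumption of this illustration.

```python
import math


def f_half(n_authors: float, alpha: float) -> float:
    """f for v = 1/2 with H = (N0 alpha)**(1/(alpha+1)); alpha = 1 uses its own closed form."""
    if alpha == 1.0:
        return (n_authors ** 0.25 - 1.0) / (n_authors ** 0.5 - 1.0)
    upper = (n_authors * alpha) ** (1.0 / (alpha + 1.0))
    k = alpha / (alpha - 1.0)
    numerator = (1.0 + upper ** (1.0 - alpha)) ** k - 2.0 ** k * upper ** -alpha
    return 0.5 ** k * numerator / (1.0 - upper ** -alpha)


def local_exponent(n_authors: float, alpha: float, step: float = 1e-3) -> float:
    """Central finite difference of ln f with respect to ln N0."""
    up = math.log(f_half(n_authors * math.exp(step), alpha))
    down = math.log(f_half(n_authors * math.exp(-step), alpha))
    return (up - down) / (2.0 * step)


for alpha in (1.0, 1.16, 1.5, 2.0):
    slopes = "  ".join(f"{local_exponent(10.0 ** e, alpha):+.3f}" for e in (2, 4, 6, 8))
    print(f"alpha={alpha:<5} local exponent at N0=1e2, 1e4, 1e6, 1e8: {slopes}")
```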
In conclusion, an attenuated version of Price's law is indeed obtained for \(\alpha = 1\) (where it holds for any \(N_0\), with \(f \sim N_0^{-1/4}\) instead of \(N_0^{-1/2}\)), but also for \(\alpha\) moderately above 1, in particular for \(\alpha = 1.16\) (of 20/80 fame), where it applies for any practical number of authors! As soon as \(\alpha\) exceeds about 1.5, the decay is shallow and saturates quickly, so \(f\) is relatively flat.


[1] Allison, P. D. et al., "Lotka's Law: A Problem in Its Interpretation and Application", Social Studies of Science 6, 269-276 (1976).
