August 28, 2021

The Pareto distribution and the 20/80 rule

In the previous post I mentioned Pareto's 20/80 rule. Here, I will discuss the Pareto distribution, with emphasis on how (and under what conditions) it gives rise to this result. I had some trouble understanding the derivation as presented in various sources, so I will go through it in detail.

The functional form of the Pareto distribution is a power law over an interval \((L,H)\) such that \(0<L<H\leq \infty\). I will use the notation of the Wikipedia page unless stated otherwise. Its probability density function (PDF) \(p(x)\) and cumulative distribution function (CDF) \(F(x)\) are (\(\alpha\) is real and strictly positive):

\[p(x) = \dfrac{\alpha}{1-(L/H)^{\alpha}} \dfrac{1}{x} \left ( \dfrac{L}{x} \right ) ^{\alpha}\quad ; \quad F(x) =  \dfrac{1-(L/x)^{\alpha}}{1-(L/H)^{\alpha}}\]

One often uses the complementary CDF (or survival function) defined as:

\[S(x) = 1 - F(x) = \dfrac{1}{1-(L/H)^{\alpha}}\left [ \left ( \dfrac{L}{x} \right )^{\alpha} - \left ( \dfrac{L}{H} \right )^{\alpha}\right ]\]
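As a sanity check, here is a minimal Python sketch of the three functions above. The function names and the convention of passing `H = math.inf` for the untruncated case are illustrative assumptions, not part of the Wikipedia notation.

```python
import math

def pareto_pdf(x, alpha, L, H=math.inf):
    """PDF p(x) on (L, H); assumes L <= x <= H and alpha > 0."""
    norm = 1.0 - (L / H) ** alpha      # equals 1 when H is infinite
    return alpha / norm * (L / x) ** alpha / x

def pareto_cdf(x, alpha, L, H=math.inf):
    """CDF F(x) on (L, H)."""
    norm = 1.0 - (L / H) ** alpha
    return (1.0 - (L / x) ** alpha) / norm

def pareto_survival(x, alpha, L, H=math.inf):
    """Survival function S(x) = 1 - F(x)."""
    return 1.0 - pareto_cdf(x, alpha, L, H)
```

For instance, `pareto_cdf(L, alpha, L, H)` and `pareto_cdf(H, alpha, L, H)` should return 0 and 1, and a finite-difference derivative of `pareto_cdf` should reproduce `pareto_pdf`.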

Note that the survival function is very similar to the PDF multiplied by \(x\): \(S(x) \simeq \dfrac{x}{\alpha} p(x)\), the difference being due only to the final truncation term, which vanishes for \(H = \infty\). However, this is only true for power laws, as one can easily check by writing \(p(x) = F'(x) = -S'(x)\) and solving the resulting ODE. We should therefore carefully distinguish between \(x p(x)\) (which is, for instance, the integrand to use for computing the mean of the distribution) and \(S(x)\), which "has already been integrated", so to speak.
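For reference, the ODE argument can be spelled out as follows. This is a sketch, assuming the relation holds exactly (no truncation term) and using the normalization \(S(L) = 1\):

\[S(x) = \dfrac{x}{\alpha} p(x) = -\dfrac{x}{\alpha} S'(x) \Rightarrow \dfrac{S'(x)}{S(x)} = -\dfrac{\alpha}{x} \Rightarrow S(x) = C x^{-\alpha} = \left ( \dfrac{L}{x} \right )^{\alpha}\]

so the only distributions with this property are power laws.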

Let us use this continuous model to describe the distribution of publications (neglecting for now its intrinsically discrete character). \(x\) stands for the number of publications by one author, bounded by \(L\) and \(H\). The number of authors with \(x\) publications is given by \(N_0 \, p(x)\), where \(N_0\) is the total number of authors.

  • The first question is: who are the top fraction \(f\) of most prolific authors (in Pareto's case, \(f = 0.2 = 20\%\))? More precisely, what is the threshold number of publications \(x_f\) separating them from the less prolific ones?
This is quite easy: if we go through the list of authors (ordered by increasing \(x\)), when we reach \(x_f\) we will have counted the lower fraction \(1-f\), so \(\int_{L}^{x_f} p(x) \text{d}x = F(x_f) = 1-f\). Thus, the survival function is \(S(x_f) = \int_{x_f}^{H} p(x) \text{d}x = f\) and we can simply invert this relation to get \(x_f =S^{-1}(f)\).
  • The second question is: how many publications did these top \(f\) authors contribute?
We need to count the authors again, but with an additional factor of \(x\), since there are \(N_0 \, p(x)\) authors with exactly \(x\) publications, for a total contribution of \(x \, N_0 \, p(x)\). The fraction \(v\) of publications contributed by the top \(f\) authors is then:
\[v = \dfrac{\int_{x_f}^{H} x \, N_0 \, p(x) \text{d}x}{\int_{L}^{H} x \, N_0 \, p(x) \text{d}x} = \dfrac{\int_{x_f}^{H} x \, p(x) \text{d}x}{ \mu}\]
where \(\mu\) is the mean of the distribution and \(N_0 \mu\) is the total number of publications.
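The two steps above can be sketched numerically for a finite \(H\). This is only an illustration: the function names (`threshold`, `first_moment`, `top_share`) are hypothetical, the inversion of \(S\) is done by hand from the expression of the survival function, and the integrals are estimated with a plain midpoint rule on a logarithmic grid rather than with a library routine.

```python
import math

def pdf(x, alpha, L, H):
    """Truncated Pareto PDF on (L, H), H finite."""
    norm = 1.0 - (L / H) ** alpha
    return alpha / norm * (L / x) ** alpha / x

def threshold(f, alpha, L, H):
    """Solve S(x_f) = f for x_f by inverting the survival function."""
    tail = (L / H) ** alpha
    return L * (f * (1.0 - tail) + tail) ** (-1.0 / alpha)

def first_moment(a, b, alpha, L, H, n=20000):
    """Midpoint-rule estimate of the integral of x p(x) over (a, b)."""
    du = (math.log(b) - math.log(a)) / n    # uniform step in u = ln(x)
    total = 0.0
    for i in range(n):
        x = a * math.exp((i + 0.5) * du)
        total += x * pdf(x, alpha, L, H) * x * du   # dx = x du
    return total

def top_share(f, alpha, L, H):
    """Fraction v of publications contributed by the top fraction f of authors."""
    x_f = threshold(f, alpha, L, H)
    return first_moment(x_f, H, alpha, L, H) / first_moment(L, H, alpha, L, H)

# Example with arbitrary illustrative parameters
print(top_share(0.2, alpha=1.5, L=1.0, H=1000.0))
```

Note that \(N_0\) never appears: it cancels in the ratio, exactly as in the formula for \(v\).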

In the simple case \(H = \infty\) (where a finite mean requires \(\alpha > 1\)), one has:
\[p(x) = \dfrac{\alpha}{x} \left ( \dfrac{L}{x} \right )^{\alpha}, \quad \text{with} \quad \mu = \dfrac {\alpha}{\alpha-1} L\]
\[f  = S(x_f) =\left ( \dfrac{L}{x_f} \right )^{\alpha} \Rightarrow x_f = L f^{-1/\alpha}\]
 
Plugging the above into the equation for \(v\) yields:
\[v = \dfrac{1}{\mu} \int_{x_f}^{\infty} x \, p(x) \text{d}x = \dfrac{\alpha}{\mu} \int_{x_f}^{\infty}  \left ( \dfrac{L}{x} \right ) ^{\alpha} \text{d}x = \left ( \dfrac{L}{x_f} \right ) ^{\alpha-1} =f^{\frac{\alpha-1}{\alpha}} \Rightarrow f = v^{\frac{\alpha}{\alpha-1}}\] 
Pareto's rule \(f=0.2\) and \(v=0.8\) requires \(\alpha \simeq 1.161\): a power law with this exponent will obey the rule, irrespective of the values of \(L\) and \(N_0\). Despite the neat coincidence in the established statement of the principle, there is absolutely no requirement that \(f+v=1\)! For instance, the same \(\alpha\) implies that, for \(v=0.5\), \(f \simeq 0.0067\), a result I have already used in the previous post.
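The closed-form relations for \(H = \infty\) are easy to evaluate directly. Below is a minimal Python sketch; the function names are hypothetical, and the expression for \(\alpha\) is simply the relation \(v = f^{\frac{\alpha-1}{\alpha}}\) solved for \(\alpha\) by taking logarithms.

```python
import math

def alpha_for_rule(f, v):
    """Exponent for which the top fraction f contributes a share v (H = infinity)."""
    return math.log(f) / (math.log(f) - math.log(v))

def share_of_top(f, alpha):
    """v = f^((alpha - 1) / alpha), valid for alpha > 1."""
    return f ** ((alpha - 1.0) / alpha)

def top_fraction_for_share(v, alpha):
    """f = v^(alpha / (alpha - 1)), valid for alpha > 1."""
    return v ** (alpha / (alpha - 1.0))

alpha = alpha_for_rule(0.2, 0.8)
print(round(alpha, 3))                     # 1.161
print(round(share_of_top(0.2, alpha), 3))  # 0.8
```

The same functions can be used to explore other \((f, v)\) pairs, for example by calling `top_fraction_for_share` with a different share \(v\) at fixed \(\alpha\).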
