In the previous post I mentioned Pareto's 20/80 rule. Here, I will discuss the Pareto distribution, focusing on how (and under what conditions) it gives rise to this result. I had some trouble following the derivation as presented in various sources, so I will go through it in detail.
The functional form of the Pareto distribution is a power law over an interval \((L,H)\) such that \(0<L<H\leq \infty\). I will use the notation of the Wikipedia page unless stated otherwise. Its probability density function (PDF) \(p(x)\) and cumulative distribution function (CDF) \(F(x)\) are given below, where the exponent \(\alpha\) is real and strictly positive:
\[p(x) = \dfrac{\alpha}{1-(L/H)^{\alpha}} \dfrac{1}{x} \left ( \dfrac{L}{x} \right ) ^{\alpha}\quad ; \quad F(x) = \dfrac{1-(L/x)^{\alpha}}{1-(L/H)^{\alpha}}\]
One often uses the complementary CDF (or survival function) defined as:
\[S(x) = 1 - F(x) = \dfrac{1}{1-(L/H)^{\alpha}}\left [ \left ( \dfrac{L}{x} \right )^{\alpha} - \left ( \dfrac{L}{H} \right )^{\alpha}\right ]\]
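As a quick sanity check, the three functions above can be coded directly. This is only a minimal sketch: the function names and the parameter values (\(\alpha = 1.5\), \(L = 1\), \(H = 100\)) are my own illustrative choices. It compares a finite-difference derivative of \(F\) with \(p\), and evaluates \(S\) at both ends of the interval.

```python
import math

def pareto_pdf(x, alpha, L, H=math.inf):
    """Density p(x) of the (possibly truncated) Pareto law on (L, H)."""
    norm = 1.0 - (L / H) ** alpha  # equals 1 when H is infinite
    return alpha / norm * (L / x) ** alpha / x

def pareto_cdf(x, alpha, L, H=math.inf):
    """Cumulative distribution function F(x)."""
    norm = 1.0 - (L / H) ** alpha
    return (1.0 - (L / x) ** alpha) / norm

def pareto_sf(x, alpha, L, H=math.inf):
    """Survival function S(x) = 1 - F(x)."""
    norm = 1.0 - (L / H) ** alpha
    return ((L / x) ** alpha - (L / H) ** alpha) / norm

# Illustrative parameter values (assumptions, not taken from the text).
alpha, L, H = 1.5, 1.0, 100.0
x, h = 3.0, 1e-6

# A central finite difference of the CDF should reproduce the PDF.
dF = (pareto_cdf(x + h, alpha, L, H) - pareto_cdf(x - h, alpha, L, H)) / (2 * h)
print(dF, pareto_pdf(x, alpha, L, H))

# S should go from 1 at x = L down to 0 at x = H.
print(pareto_sf(L, alpha, L, H), pareto_sf(H, alpha, L, H))
```

Leaving out the `H` argument gives the untruncated distribution, since `(L / math.inf) ** alpha` evaluates to zero.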
Note that the survival function is very similar to the PDF multiplied by \(x\): \(S(x) \simeq \dfrac{x}{\alpha} p(x)\); the two differ only by the truncation term, which vanishes as \(H \to \infty\). This proportionality holds only for power laws, however, as one can check by writing \(p(x) = F'(x) = -S'(x)\) and solving the resulting ODE. We should therefore carefully distinguish \(x p(x)\) (which is, for instance, the integrand used to compute the mean of the distribution) from \(S(x)\), which "has already been integrated", so to speak.
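For completeness, here is one way to spell out that ODE argument (my own write-up of the step, with \(C\) an integration constant). Assume the relation holds exactly, \(S(x) = \dfrac{x}{\alpha} p(x)\), and use \(p(x) = F'(x) = -S'(x)\):

\[\alpha \, S(x) = - x \, S'(x) \quad \Rightarrow \quad \dfrac{\mathrm{d}S}{S} = -\alpha \dfrac{\mathrm{d}x}{x} \quad \Rightarrow \quad S(x) = C \, x^{-\alpha}\]

Imposing \(S(L) = 1\) gives \(S(x) = (L/x)^{\alpha}\), which is the untruncated Pareto case. For finite \(H\), the definitions above give the exact relation

\[S(x) = \dfrac{x}{\alpha} p(x) - \dfrac{(L/H)^{\alpha}}{1-(L/H)^{\alpha}}\]

so the correction is a constant that vanishes as \(H \to \infty\).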
Let us use this continuous model to describe the distribution of publications (neglecting for now its intrinsically discrete character). Here \(x\) stands for the number of publications by one author, bounded by \(L\) and \(H\), and \(N_0\) is the total number of authors. The number of authors with between \(x\) and \(x + \mathrm{d}x\) publications is then \(N_0 \, p(x) \, \mathrm{d}x\).
- The first question is: who are the top fraction \(f\) of most prolific authors (in Pareto's case, \(f = 0.2\), i.e. 20%)? More precisely, what is the threshold number of publications \(x_f\) separating them from the less prolific ones?
- The second question is: how many publications did these top \(f\) authors contribute?
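Both questions can be explored numerically before working through the algebra. The sketch below rests on two assumptions of mine: the threshold is defined by \(S(x_f) = f\), and the publications contributed by authors between \(a\) and \(b\) are \(N_0 \int_a^b x \, p(x) \, \mathrm{d}x\), which I integrated in closed form for \(\alpha \neq 1\). The function names are hypothetical and the value \(\alpha = 1.16\) is only an illustrative choice.

```python
import math

def threshold(f, alpha, L, H=math.inf):
    """Value x_f such that S(x_f) = f, i.e. a fraction f of authors lies above it."""
    t = (L / H) ** alpha  # truncation term, zero when H is infinite
    return L * (f * (1.0 - t) + t) ** (-1.0 / alpha)

def partial_mean(a, b, alpha, L, H=math.inf):
    """Integral of x p(x) from a to b, in closed form.

    Assumes alpha != 1; for infinite H it also needs alpha > 1,
    otherwise the mean of the distribution diverges.
    """
    norm = 1.0 - (L / H) ** alpha
    prefactor = alpha * L ** alpha / norm
    return prefactor * (b ** (1.0 - alpha) - a ** (1.0 - alpha)) / (1.0 - alpha)

def top_share(f, alpha, L, H=math.inf):
    """Fraction of all publications contributed by the top fraction f of authors."""
    x_f = threshold(f, alpha, L, H)
    return partial_mean(x_f, H, alpha, L, H) / partial_mean(L, H, alpha, L, H)

# Illustrative values (assumptions): untruncated law, L = 1, top 20% of authors.
print(threshold(0.2, 1.16, 1.0))
print(top_share(0.2, 1.16, 1.0))
```

With these illustrative values the share comes out close to 0.8. The factor \(N_0\) cancels in the ratio, which is why it does not appear in `top_share`.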