Coherent discussion of linguistic diversity on global or regional scales requires a quantitative index of diversity. Unfortunately, quantitative measures of linguistic diversity are rarely employed in current linguistic research, and no established measure is widely used. Existing measures tend to be somewhat simplistic, such as numbers of languages or numbers of language groups.

A satisfactory linguistic diversity index must take into account several factors. Firstly, it must address some unit of analysis, such as a country, a continent or the Internet. Secondly, it should take into account the probability of finding speakers of any particular language. Thirdly, it should have a natural minimum of zero, for a completely homogeneous population, and no fixed maximum value.

A greater variety of languages should increase the value of the index, but as the proportion of a language group decreases, its contribution to diversity should also decrease. This way, countries with many language groups of roughly equal size (e.g. Tanzania) will show relatively high linguistic diversity, whereas countries with comparable numbers of languages, but with one or two dominant languages (e.g. the US) will show relatively lower linguistic diversity.
A measure that has these properties is the information-theoretic construct entropy. In statistical terms, entropy is a measure of variance. Entropy is calculated by taking the estimated proportion of the population that speaks each language, multiplying it by its natural logarithm, and summing the results over all languages in a given unit (country, region). The final index value is -2 times this sum; the negative sign is needed because each term is zero or negative, so that the index is zero for a completely homogeneous population and positive otherwise.
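
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used to produce the published figures: the function names and the sample populations are hypothetical, and the sketch assumes the sum is multiplied by -2 so that the index comes out non-negative, in line with the stated minimum of zero.

```python
import math


def proportions_from_counts(speaker_counts):
    """Convert speaker counts per language into population proportions."""
    total = sum(speaker_counts)
    if total <= 0:
        raise ValueError("total speaker count must be positive")
    return [count / total for count in speaker_counts]


def diversity_index(proportions):
    """Entropy-based diversity: -2 * sum(p * ln p) over all languages.

    Assumption: the factor is -2 (not +2), so the result is non-negative.
    Languages with p == 0 are skipped, since p * ln(p) tends to 0 as p -> 0.
    """
    return 2 * sum(-p * math.log(p) for p in proportions if p > 0)


# Hypothetical populations for illustration only (not Ethnologue data).
homogeneous = proportions_from_counts([1000])
balanced = proportions_from_counts([250, 250, 250, 250])
dominated = proportions_from_counts([940, 20, 20, 20])

for name, props in [
    ("homogeneous", homogeneous),
    ("balanced", balanced),
    ("dominated", dominated),
]:
    print(f"{name}: {diversity_index(props):.3f}")
```

With these made-up figures, the single-language population scores zero, and the population with four equally sized groups scores higher than the one with the same number of languages but a single dominant group, which is the behaviour the index is meant to have.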

Table 1 presents figures for this entropy-based diversity measure for different regions of the world, based on the 7,639 language population figures presented in the Ethnologue, and ordered from lowest to highest linguistic diversity.

