Is it reasonable to define and direct linguistic policies in digital space without having sufficient, accurate and precise indicators on the situation of languages and their progress?

Quite paradoxically, the world of networks, born and developed in universities, has in effect left the measurement of the language situation to marketing companies, whose aims differ from those of scientific publication and which are therefore not very concerned with documenting their methods.

The result has been disorder and confusion regarding the state of languages on the Internet, which can lead to disinformation. For example, while the proportion of Internet users who are English speakers has fallen from more than 80% in the year the Web was born to 35% today, the figures circulating in the media, against all evidence, remain stable at between 70% and 80%!

It is urgent that the academic world, along with public institutions, both national and international, regain its role in this area. There are clear signs that this change is finally occurring! For an update, consult the online proceedings of the meeting Multilingualism for Cultural Diversity and Participation of All in Cyberspace, organized by UNESCO with ACALAN (Academy of African Languages) and AIF (Intergovernmental Agency for Francophone Countries and Regions) and held in Bamako.

Until this process yields accurate, documented indicators that are updated as fast as new media develop, gaining a clear perspective on the situation and its trends remains extremely difficult.

Data on the proportion of Internet users in each language group

Global Reach has regularly supplied figures which admittedly come from multiple sources and are not consistent in terms of methodology, but at least they are known (Figure 1). Even if we ascribe only relative confidence to them (20% margin of error), they provide a reasonable perspective on the growth of Internet users by language group.
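To make the effect of such a margin concrete, the following minimal sketch shows how a 20% margin of error widens a single estimate into a range, and when growth between two estimates remains visible despite it. It assumes the margin is a symmetric relative error, which the text does not specify. The function names and the user counts are hypothetical and are not Global Reach data.

```python
def bounds(estimate, relative_error=0.20):
    """Return the (low, high) range implied by a symmetric relative margin of error."""
    return estimate * (1 - relative_error), estimate * (1 + relative_error)


def growth_is_distinguishable(earlier, later, relative_error=0.20):
    """Return True if the later range lies entirely above the earlier range."""
    _, earlier_high = bounds(earlier, relative_error)
    later_low, _ = bounds(later, relative_error)
    return later_low > earlier_high


# Hypothetical counts of Internet users (in millions) for one language group.
low, high = bounds(50.0)
print(f"An estimate of 50 million could lie between {low:.1f} and {high:.1f} million")
print("50 -> 80 million clearly a rise:", growth_is_distinguishable(50.0, 80.0))
print("50 -> 60 million clearly a rise:", growth_is_distinguishable(50.0, 60.0))
```

Under this assumption, only fairly large changes between two successive estimates can be read as real growth; smaller ones fall within the overlap of the two ranges.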

Data related to languages on the Web

Several approaches are in use concurrently, each with its own limitations:

  • extrapolating figures from search engines by language;
  • creating a sample of several thousand websites through a random selection of IP addresses (Wikipedia, 2005d), running language recognition engines on this sample of sites, and then generalizing the results;
  • publishing figures without revealing methodology;
  • using search engines to obtain the number of occurrences of a given word in a given sector of cyberspace, such as Web pages or discussion groups.

The last method is the one applied by FUNREDES and the Latin Union in Figure 2.
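As an illustration of the word-occurrence approach, the sketch below turns search-engine hit counts into a presence indicator relative to a reference language. It is a minimal sketch under stated assumptions, not the procedure actually used by FUNREDES and the Latin Union: the concepts, the hit counts, the `relative_presence` function, and the choice of a simple average over concepts are all hypothetical.

```python
from statistics import mean

# Hypothetical hit counts per concept: {concept: {language: occurrences}}.
# The numbers are invented for illustration; real counts would come from
# querying a search engine with the equivalent word in each language.
HITS = {
    "monday": {"en": 1000, "es": 100, "fr": 80},
    "water": {"en": 2000, "es": 150, "fr": 120},
    "library": {"en": 500, "es": 60, "fr": 50},
}


def relative_presence(hits, reference="en"):
    """Average, over concepts, each language's count divided by the reference count."""
    languages = {lang for counts in hits.values() for lang in counts}
    result = {}
    for lang in sorted(languages):
        ratios = [
            counts[lang] / counts[reference]
            for counts in hits.values()
            if lang in counts and counts.get(reference)
        ]
        if ratios:  # skip languages with no usable concept
            result[lang] = mean(ratios)
    return result


presence = relative_presence(HITS)
for lang, ratio in presence.items():
    print(f"{lang}: {ratio:.3f} of the reference language")
```

Averaging over several concepts is one way to dampen the bias of any single word, such as homographs shared between languages or terms that are unusually frequent in one culture. The result still depends on how the search engine counts and indexes pages.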

