MULTILINGUALISM ON THE NET
Mr Toru Nishigaki
Professor
Institute of Social Science
University of Tokyo
Japan
1. Introduction
What effect will the Internet have on natural languages in the 21st century? It is widely accepted that any new medium changes language. For example, the prevalence of TV has dramatically homogenized spoken accents over the past few decades. Young people tend to speak in almost the same way as TV announcers, and local dialects and accents survive mainly in the speech of older people.
This TV effect, however, is insignificant compared with the enormous effect of print media on languages over the past few hundred years. Widely circulated printed texts, especially newspapers, enabled millions of people who had never met each other to think about the same topics in the same language, thus creating a sort of community identity. This community identity was transformed into national identity, upon which in turn the nation-state was established, as discussed by the political scientist Benedict Anderson [1]. Print languages came to be acknowledged as standard national languages, for which dictionaries are compiled and lessons are given in schools. Other languages, meanwhile, gradually declined. Since the market economy requires any printing business to have a certain number of readers, print languages tend to be limited to so-called major languages spoken by millions of people. In short, the number of written languages on the earth decreased after the arrival of print media.
What, then, will the arrival of the Internet bring about? Roughly speaking, we can predict two distinct directions. The first is an English monopoly. The Internet originated in the United States, and at present most international correspondence clearly takes place in English. This is partly for the historical reason that the Internet developed as a communication tool for researchers in science and technology, whose common language is English. The general public now uses the Internet as well, but English is still the dominant language if one wants to look at foreign Web sites or send E-mail across state borders. There is therefore the possibility that, sooner or later in the 21st century, English will become the sole common language for international communication, accompanied by the inescapable decline of other languages. In this case the term globalization means the hegemony of a single English-based, United States-centered culture spreading all over the world.
On the other hand, we may expect a second direction in which the various cultures of different countries thrive and interact with each other, resulting in a fruitful and plural global culture. The globalization of the Internet, if a multilingual environment is realized, can be expected to bring about this plural culture.
One noteworthy feature of the Internet, as opposed to print media, is that one need not have many readers when writing texts for public consumption. As long as we have a system that transmits and displays entered texts correctly, we can expect to see diverse texts in various languages moving freely around the Internet. In Japan, for example, we presently have few printed texts in languages other than Japanese or English. From now on, however, we may have the chance to see abundant texts in any language flowing into Japan through the Internet, be they in Arabic, Hindi, or any other language.
Which of these two directions is taken depends largely upon the information processing technology for the Internet. In the last few years there has been noteworthy technological progress towards the second direction. A key issue is the development of an international character code system. The well-known Universal multiple-octet Coded character Set (UCS), authorized as ISO/IEC-10646-1 in 1993, offers a large number of characters for various languages. The current UCS is based on Unicode, which is a 16-bit character code system [2]. Notably, much de facto standard software, such as Web browsers (Internet Explorer, Netscape) and mailers (Outlook Express, etc.), has recently come to support the UCS. Users, once they have downloaded the necessary character fonts, are therefore able to exchange messages in diverse languages across state borders.
The end of this century is becoming a major turning point in the character of the Internet, as it changes from an English-monolingual to a multilingual environment. This new direction is clearly preferable, because proficiency in English can be expected of only a relatively small part of the world’s population. Nevertheless, many problems still face the development of a truly multilingual environment.
Three technological issues are involved in the realization of a multilingual environment on the Internet. The first is an international character code system, as stated above. Conventional character code systems differ from state to state. For example, Japan, China and Korea have independent code systems called JIS, GB and KS respectively, and within these systems the same Chinese (Han) characters have different codes. Obviously this causes great problems in international communication. The UCS of ISO/IEC-10646-1 defines universal codes for 38,885 characters in 25 scripts, namely Arabic, Armenian, Bengali, Bopomofo, Cyrillic, Devanagari, Georgian, Greek, Gujarati, Gurmukhi, Han, Hangul, Hebrew, Hiragana, Kannada, Katakana, Latin, Lao, Malayalam, Oriya, Phonetic, Tamil, Telugu, Thai, and Tibetan. There are 20,902 Chinese characters included in the UCS. However, critical comments are often heard in Japan and China, because the total number of Chinese characters is said to be far more than 50,000. In addition, such scripts as Ethiopic and Mongolian are still excluded from the current UCS. To include this wider range of characters and scripts, the UCS would need to be expanded into a code system of 32 bits or more. At the same time manufacturers would be required to make their products able to handle the UCS.
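As a rough illustration of the code conflict described above, the following Python sketch encodes one Han character under three legacy codecs and compares the results with its single Unicode code point. The choice of character, the codec names (`euc_jp`, `gb2312` and `euc_kr`, Python's standard codecs for encodings of the Japanese, Chinese and Korean national sets) and the helper names are assumptions made for this example. The last helper merely redoes the arithmetic behind the call for a wider code, taking 50,000 as an assumed lower bound for the full Han repertoire and ignoring reserved areas of the code space.

```python
# Illustrative sketch only: character, codec choices and helper names are assumptions.
HAN_CHAR = "\u4e2d"  # a common Han character shared by Chinese, Japanese and Korean

# Python standard codecs standing in for the national code systems named in the text.
LEGACY_CODECS = {
    "JIS (Japan)": "euc_jp",
    "GB (China)": "gb2312",
    "KS (Korea)": "euc_kr",
}


def legacy_codes(char):
    """Return the hex byte sequence of `char` under each legacy codec."""
    return {name: char.encode(codec).hex() for name, codec in LEGACY_CODECS.items()}


def unicode_code_point(char):
    """Return the single universal code point in U+XXXX notation."""
    return f"U+{ord(char):04X}"


def fits_in_16_bits(non_han, han):
    """Check whether a repertoire fits in the 2**16 positions of a 16-bit code."""
    return non_han + han <= 2 ** 16


if __name__ == "__main__":
    for name, code in legacy_codes(HAN_CHAR).items():
        print(f"{name}: {code}")
    print("UCS/Unicode:", unicode_code_point(HAN_CHAR))

    # Figures from the text: 38,885 characters in total, 20,902 of them Han.
    non_han = 38885 - 20902
    print("Current repertoire fits:", fits_in_16_bits(non_han, 20902))
    # Assumption: 50,000 as a lower bound for the full Han repertoire.
    print("With 50,000 Han characters:", fits_in_16_bits(non_han, 50000))
```

The three legacy byte sequences differ from one another, while the Unicode code point is the same wherever the text is read; this is the property a universal code system is meant to provide.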
The second issue is the input/output system. That is, we need the technology to input various characters using keyboards or touch panels and, conversely, to display them on screens or paper. This is an important point, since oriental characters are generally more complex than occidental ones. Input/output systems for Chinese characters were studied intensively during the 1980s, and we now have highly refined ones in Japan and China. In general the technological level of input/output systems in northern Asia is fairly satisfactory, including the Hangul handling systems in Korea. In southern Asia, on the other hand, we see much more variance. The levels of information technology in Singapore and Malaysia are advanced, followed by those of the Philippines, Thailand and Indonesia, but there are other countries whose levels are still unsatisfactory. The inherent complexity of their scripts often hinders the rapid development of text processing technology.
The case of India is especially worth mentioning. Highly developed information industries and excellent engineers are already found in some cities of India, indicating its high potential in this technological field. Even so, the language situation of India, with as many as 18 official languages, is too complex for a satisfactory multilingual environment on the Internet to be realized easily [3]. It will be difficult to achieve a simple and easy-to-use multilingual environment on the Internet in southern Asia, where many nations and languages are intermingled.
The third issue is translation-support technology, that is, technology that uses computers to help people understand and compose foreign texts quickly. It is probably the hardest of the technologies needed for a multilingual environment on the Internet. The ability to handle diverse characters would be of little use if one could not understand foreign texts at all. For example, the inflow of Arabic texts into Japan could hardly promote international communication unless at least a certain number of Japanese people could easily grasp their meaning. One may therefore rightly expect that technology for translating a foreign text automatically by computer would be of great help.
It is well known, however, that so-called machine translation remains a dream technology with few practical applications. During the 1980s a great deal of research was done in the field of artificial intelligence with the aim of realizing machine translation, but without much success. This is mainly because computers find it hard to grasp the ever-changing contexts in which human languages are used [4]. Nevertheless, computers can certainly assist human beings in understanding and composing foreign texts. With vocabularies and grammatical knowledge stored in computers and retrieved interactively on demand, even a beginner in a foreign language may quickly grasp the general idea of a text or write simple compositions. What is important in the Internet age is not the perfect translation of each sentence but an improved ability to communicate across different languages. Interactive translation-support technology is expected to become indispensable for this purpose.
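The idea of storing vocabulary and retrieving it on demand can be sketched in a few lines of Python. The glossary entries, the sample sentence and the function name below are hypothetical and chosen only for illustration; a real translation-support system would hold far more vocabulary together with grammatical knowledge.

```python
import re

# Hypothetical Indonesian-to-English glossary used only for this illustration.
GLOSSARY = {
    "saya": "I",
    "membaca": "read",
    "buku": "book",
}


def gloss(sentence, glossary=GLOSSARY):
    """Pair each word with its glossary entry, or None if the word is not stored."""
    words = re.findall(r"\w+", sentence)
    return [(word, glossary.get(word.lower())) for word in words]


if __name__ == "__main__":
    for word, meaning in gloss("Saya membaca buku baru."):
        print(f"{word:10} {meaning if meaning else '(not in glossary)'}")
```

A word-by-word gloss of this kind is not a translation, but it may be enough for a beginner to grasp the general idea of a sentence, and the words it fails to find show where the reader still needs help.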
3. Language/Power Forum – An Experiment of Multilingualism
Global academic efforts are indispensable for the realization of multilingualism on the Internet. We would like to introduce here an experimental online forum called Language/Power (L/P), which we are running at the Institute of Social Science, University of Tokyo.
The L/P forum is an interdisciplinary academic forum whose participants specialize in a variety of fields such as sociology, politics, economics, law, computer science, religious studies, anthropology, linguistics and literature. Among them are engineers and journalists as well as researchers. The discussion centers on how people of different nationalities and languages can constitute an online community. One example of a discussion theme is "online communities and the individual, language and state", which focuses on the relation between the individual and the state in the 21st century. The forum name Language/Power symbolizes that language is always related to questions of social power.
Everybody is welcome to follow the discussions of the L/P forum on the Web page at the following address, although it is basically a closed forum in which only invited participants may express their views:
http://lp.iss.u-tokyo.ac.jp/
A distinctive feature of this Web page is that its contents are displayed in five languages: Japanese, English, Chinese, Korean and Indonesian. We plan to add French, German and others in due course. In order to display various characters side by side on the screen, we use Unicode (UTF-8), which corresponds to the UCS of ISO/IEC-10646-1. One therefore needs a Web browser that supports Unicode, such as Internet Explorer 4.0.
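To show why a single encoding allows several scripts to sit side by side, the following Python sketch puts the word for "language" in each of the five forum languages into one text and encodes it as UTF-8. The sample words, the dictionary and the function names are assumptions made for this example and are not taken from the L/P forum pages.

```python
# Illustrative sketch only: sample words and helper names are assumptions.
SAMPLES = {
    "Japanese": "言語",
    "English": "language",
    "Chinese": "语言",
    "Korean": "언어",
    "Indonesian": "bahasa",
}


def utf8_page(samples):
    """Join all samples into one text and encode it as a single UTF-8 byte string."""
    text = "\n".join(f"{lang}: {word}" for lang, word in samples.items())
    return text.encode("utf-8")


def utf8_lengths(samples):
    """Return the number of UTF-8 bytes each sample word occupies."""
    return {lang: len(word.encode("utf-8")) for lang, word in samples.items()}


if __name__ == "__main__":
    page = utf8_page(SAMPLES)
    # Decoding with the same encoding recovers every script intact.
    print(page.decode("utf-8"))
    for lang, size in utf8_lengths(SAMPLES).items():
        print(f"{lang}: {size} bytes")
```

Latin letters take one byte each in UTF-8, while the Han, kana and Hangul characters used here take three bytes each, so the same byte stream can carry all five languages without switching code systems.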
The participants of the forum send messages for discussion to a moderator by E-mail, and the moderator displays the messages on the Web page after translating them into other languages. We make use of translation-support software in this process, but our staff members often modify the outputs of machine translation to improve translation quality.
The L/P forum is thus an experiment in multilingual international communication on the Internet, as well as an interdisciplinary discussion of 21st century multilingualism. It offers a sharp contrast to conventional arguments that criticize the monopoly of English on the Internet yet have no choice but to be made in English in order to reach an international audience. In the L/P forum, on the other hand, people are able to search for solutions to the problem in a multilingual environment. Concrete technological issues as well as theoretical ones are addressed, in order to find ways of bringing machine translation software closer to the needs of users and so achieve effective translation for multilingual communication.
4. Summary and Conclusion
The language situation on earth will change greatly in the 21st century and will have enormous effects on cultures and societies, as the globalization promoted by the Internet gathers pace. If English continues to be the sole language of international communication, those proficient in English will tend to control a cyberspace from which most people of non-English speaking countries are excluded. There is a possibility that the whole earth will be covered by the culture of English speaking countries, especially that of the United States. Despite that, the recent development of information technology is bringing about an opposite situation, where a variety of languages circulate in cyberspace, thus opening the way for a fruitful world culture of intensive linguistic exchange.
The key technological issues for realizing a multilingual environment on the Internet are the development of an international character code system, character input/output systems and translation-support systems. A 16-bit international character code system, the UCS, has already been authorized by ISO/IEC, but further efforts are required to expand it to a 32-bit code. As for character input/output systems, more efforts are needed in and for developing countries such as those in parts of southern Asia. Computer-supported translation technology, finally, is expected to contribute significantly to multilingual communication on the Internet.
Everybody on earth ought to be able to participate in the Internet society of the 21st century. To achieve this goal, it is important to enable everybody to send messages in his or her own mother tongue. The more texts in different languages are exchanged on the Internet, the more people become interested in foreign languages. In short, a multilingual environment on the Internet can be expected to have a beneficial effect on foreign language education, which will in turn promote deep cultural interchange.
Print media have partitioned the earth into independent nation-states each with its own national language. On the other hand, new electronic media like the Internet are now interconnecting those different languages with each other, in the hope of creating something which may be called a new global culture.
REFERENCES