ROLE OF LANGUAGE ENGINEERING IN SUPPORTING
MULTILINGUAL ASPECTS IN CYBERSPACE

Mr Adeeb Ghonaimy
Director
Egyptian Universities Network
Arab Republic of Egypt

 

1. INTRODUCTION

The network-centered information society is just starting to evolve. Information and knowledge form one of the basic ingredients of such a society. However, if that information and knowledge do not permeate the texture of the global society and are not assimilated by the different societies worldwide, they will have a limited impact on the global community. The convergence of computer and telecommunication technologies is changing the way we work, communicate with each other, conduct business transactions and make use of different services. The profound impact on education, learning and the acquisition of knowledge needs no emphasis. Presenting information as visual images, sound and natural language, either as text or speech, is gradually becoming the norm. The Internet is creating a global platform on which a worldwide forum is evolving, one that is becoming vital to economic, social and political success. However, all these developments create problems. For example, much of the information may be accessible only to the computer literate and to those who understand English. Also, because of the vast amount of information available, it is becoming hard to identify and select what is both relevant and sufficiently credible. The development of interfaces and components that help users identify relevant information and then present it to them in the most appropriate manner, according to the information content and their cultural and linguistic backgrounds, still lags behind the rate of information growth. Language engineering is an endeavor in which language technologies are integrated and embedded into language-enabled services and products to support business in a global context and to facilitate interpersonal communication across languages [LINGLINK, 1997].

 

2. THE ROLE OF LANGUAGE ENGINEERING

Language technology will help in designing and implementing the systems needed to deal effectively with information and knowledge in a number of ways. Speech recognition will let us interact with a number of devices in our own native language. Information could also be presented by generating speech. Understanding requests and browsing the vast amount of knowledge available are essential capabilities that can alleviate the problem of information overload and ensure that the relevant knowledge is accessed. It will also be possible to generate and present information in different languages through automated machine translation [Ghonaimy, 1998].

Language engineering will be essential for supporting global business in general and electronic commerce in particular. The success of any business will depend on the quality of information about its customers, its competitors and the market in general. The needed information has to be identified, extracted and presented in natural language, either as text or speech. Language engineering aims to deliver the right information at the right time and in the language of the recipient. Automated translation together with document management will improve the quality of service in a global marketplace [LINGLINK, 1997]. Success in globalization therefore requires an emphasis on localization. Some companies interested in international markets organize their effort into a number of activities, which can be summarized as follows [Antaki, 1998]:

1. Developers organize a product so that linguistic components can be modified easily (internationalization).

2. Translators make the product available in different target languages.

3. Editors review each version to ensure that culture-specific items are not missed.

4. Marketing division takes care of localization to adapt the product to the local market.
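The first activity in the list above, organizing a product so that its linguistic components can be modified easily, is often realized by keeping all user-visible strings in a message catalog that is separate from the program logic. The following is a minimal sketch of that idea in Python, not a description of any particular company's process. The `MESSAGES` table, its contents and the `translate` function are hypothetical names introduced only for illustration.

```python
# Hypothetical message catalog: every user-visible string lives here,
# keyed by message id and language code, not inside the program logic.
MESSAGES = {
    "greeting": {
        "en": "Welcome, {name}!",
        "fr": "Bienvenue, {name} !",
        "ar": "مرحبا، {name}!",
    },
    "farewell": {
        "en": "Goodbye.",
    },
}

DEFAULT_LANGUAGE = "en"  # assumed fallback language for this sketch


def translate(message_id: str, language: str, **params: str) -> str:
    """Return the message in the requested language, falling back to the default."""
    variants = MESSAGES[message_id]
    template = variants.get(language, variants[DEFAULT_LANGUAGE])
    return template.format(**params)


if __name__ == "__main__":
    print(translate("greeting", "fr", name="Amal"))
    # No Arabic entry exists for "farewell" yet, so the English text is used.
    print(translate("farewell", "ar"))
```

With this separation, translators and editors (activities 2 and 3) only touch the catalog entries, and adding a language does not require changes to the code that calls `translate`.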

 

3. THE MULTILINGUAL INTERNET

The Internet is now dominated by the English language. However, given the prospect of widespread use in the near future, it is becoming essential to consider its multilingual nature. Some efforts are under way in that respect, e.g. the Babel project, a joint initiative between Alis Technologies and the Internet Society [Babel, 1998]. The project considers the world’s 20 main languages and is trying to study the actual distribution of languages on the Internet. Although Internet penetration is not high at the moment in some regions, this is expected to change radically in the future. As an example, results of the Internet Domain Survey for Arab countries for January 1998 are shown in Table 1.

Efforts are now concentrating on developing tools and standards that enable the creation of websites in several languages, or in at least one non-Western language. An Internet standard (RFC 2070) based on Unicode now supports HTML documents in practically every language. In particular, the features it deals with include markup of bi-directional text, i.e. text in which left-to-right and right-to-left scripts are mixed, and control of cursive joining behavior in contexts where the default behavior is not appropriate. Also, HTTP, the hypertext transfer protocol in use since 1990, is being internationalized. The relevant aspects are character set labeling, which ensures correct document interpretation, and language negotiation, which a site uses to provide documents in the user’s preferred language.
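The two HTTP aspects mentioned above can be illustrated with a short Python sketch. It assumes the client states its preferences in an `Accept-Language` request header with optional `q` weights, and that the server labels the character set in the `Content-Type` response header. The function names are hypothetical, and the sketch deliberately ignores the `*` wildcard and other details of the full HTTP rules.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language value into (language, quality) pairs, best first."""
    preferences = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        language, _, params = item.partition(";")
        quality = 1.0  # a missing q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        preferences.append((language.strip().lower(), quality))
    # sorted() is stable, so equal weights keep the order the client sent.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def negotiate_language(header: str, available: list[str], default: str) -> str:
    """Pick the best language the site can serve for the given header."""
    for language, quality in parse_accept_language(header):
        if quality <= 0:
            continue  # q=0 means the client does not want this language
        if language in available:
            return language
        primary = language.split("-")[0]  # e.g. "ar-eg" falls back to "ar"
        if primary in available:
            return primary
    return default


def content_type_header(charset: str = "utf-8") -> str:
    """Build a Content-Type value that labels the document's character set."""
    return f"text/html; charset={charset}"


if __name__ == "__main__":
    site_languages = ["en", "fr", "ar"]  # hypothetical set of translations
    print(negotiate_language("ar-EG, fr;q=0.8, en;q=0.5", site_languages, "en"))
    print(content_type_header())
```

A site that holds English, French and Arabic versions of a page would call `negotiate_language` once per request and then send the chosen version together with the character set label, so that the browser interprets the bytes correctly.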

 

 

Table 1. Internet Domain Survey, Arab Countries, January 1998

Domain        Hosts   All Hosts   Duplicate Names   Level 2 Domain   Level 3 Domain
ku             4057        4749               692                9             2925
eg             2013       16930             14917                7              191
ae             1940        1955                15                8               56
lb             1134        1377               243                7               59
om              670         671                 1               16               20
ma              431         463                32                5              339
bh              338         339                 1                2                4
jo              249         249                 0                5               13
qa              189         191                 2                5                9
tn               69          69                 0               10               68
sa               37          37                 0                4                6
dz               16          17                 1                1               16
ye               10          10                 0                7                6
ly                1           2                 1                1                1
sy, sd, iq        0
Total         11154       27059

 

4. MULTILINGUAL MACHINE TRANSLATION ISSUES

There are many situations in which multilingual translation is needed. Translation from many languages into a single language will be required by large information gathering and processing organizations. Translation from a single language into many languages will be required in the context of foreign trade when operation and other manuals for industrial equipment need to be translated into the language of the countries where the equipment is to be marketed.

Multilingual interlingual machine translation systems translate between a number of languages. In this approach, a universal, language-independent representation of text, known as an interlingua, is developed. The translation process is thereby reduced to two phases: the analysis phase, from the source language to the interlingua, and the generation phase, from the interlingua to the target language.
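The two-phase structure can be sketched with a toy example in Python. It is a word-for-word illustration under strong simplifying assumptions: the concept identifiers, the tiny lexicons and all function names are hypothetical, and real analysis and generation must also handle syntax, semantics and discourse. The sketch also shows a commonly cited motivation for the approach: for n languages, direct translation needs a translator for each of the n(n-1) ordered language pairs, whereas an interlingua needs only one analyzer and one generator per language, i.e. 2n modules.

```python
# Toy analysis lexicons: source language -> {word: language-independent concept}.
# The concept names stand in for the interlingua and are purely illustrative.
ANALYSIS = {
    "en": {"book": "BOOK", "water": "WATER"},
    "fr": {"livre": "BOOK", "eau": "WATER"},
    "es": {"libro": "BOOK", "agua": "WATER"},
}

# Generation lexicons are the inverse mappings: concept -> word.
GENERATION = {
    language: {concept: word for word, concept in lexicon.items()}
    for language, lexicon in ANALYSIS.items()
}


def analyze(words: list[str], source: str) -> list[str]:
    """Analysis phase: map source-language words to interlingua concepts."""
    return [ANALYSIS[source][word] for word in words]


def generate(concepts: list[str], target: str) -> list[str]:
    """Generation phase: map interlingua concepts to target-language words."""
    return [GENERATION[target][concept] for concept in concepts]


def translate(words: list[str], source: str, target: str) -> list[str]:
    """Translate by going through the interlingua, never directly between languages."""
    return generate(analyze(words, source), target)


def module_counts(n: int) -> tuple[int, int]:
    """Return (direct pairwise translators, interlingua modules) for n languages."""
    return n * (n - 1), 2 * n


if __name__ == "__main__":
    print(translate(["livre", "eau"], "fr", "es"))
    print(module_counts(4))
```

Adding a further language to this sketch means adding one entry to `ANALYSIS`; no existing language pair has to be revisited, which is the practical appeal of the design.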

For successful machine translation, detailed knowledge of the languages is required at many levels: lexicon, syntax, semantics and discourse. It is very difficult to provide such linguistic knowledge for an entire language. However, if we consider only the language of a particular domain, much of this knowledge can be obtained. The variety of language used in a given science or technology is not only much smaller than the whole language, but is also more clearly systematic in structure and meaning. Linguists and computer scientists therefore co-operate to study the properties of such specialized languages, which are called sublanguages or controlled languages. Each sublanguage has a distinctive grammar, even though it is related to the grammar of the full standard language. Also, the theoretical problem of relating linguistic form to communicative function comes into sharper focus when individual sublanguages are examined [Grishman, 1986].

Sometimes it is essential to distinguish between two translation activities. The first is called localization (e.g. that used for computer manuals for end users), where it is important to adapt certain parts of the content, and perhaps the style of presentation, to a certain cultural and linguistic environment. The second is called diffusion translation, where the objective content must be strictly rendered in another language without addition or omission [Boitet, 1998].

A brief outline will now be given for efforts related to Arabic/English translation. AppTek started an English-Arabic translator named TranSphere that uses Lexical Functional Grammar together with a general dictionary having 100,000 words. A number of domain-specific dictionaries have also been developed. This system could be either standalone or part of an integrated system.

Sakhr, within Al-Alamiah Group [Sakhr, 1998], is also developing a translation scheme for Arabic-English and English-Arabic. It makes use of previously developed tools: morphological and syntactic analyzers, electronic dictionaries and semantic support, together with a number of other development tools. It is also developing a general platform for dealing with Arabic computation [Ali, 1998].

IBM Egypt is also developing Machine Aided Human Translation schemes. ALIS, Inc. offers a solution that integrates core language-handling technology and translation products [Alis, 1998].

The Electronics Research Institute in Egypt, in co-operation with the European Community, is developing a medical text translation system for English-Arabic and for Arabic to English and German. This is done in the framework of CAT2 (Computer Assisted Translation), which uses an interlingua approach [Nour, 1998]. Some work related to ambiguity in Arabic language processing is also being carried out [Abed, 1998].

PRINCITRAN: In this system a large-scale lexicon is to be constructed for an interlingual machine translation system for Arabic, English, Korean and Spanish [Dorr, 1995]. For Arabic, the starting point was the use of the Alpnet bilingual Arabic-English online dictionary. Then automatic mappings between English glosses from Alpnet into LDOCE (Longman’s Dictionary of Contemporary English) codes were performed. The codes were then converted into thematic grids which were then exhaustively hand-verified.

 

5. ELECTRONIC COMMERCE AND MULTILINGUALITY

Global market opportunities are now increasing rapidly and will rely heavily on the facilities introduced by the Internet. Thus, the advent of global electronic commerce will add an economic dimension to cultural and linguistic issues. In international trade, companies that adopt a multilingual, multicultural approach are expected to gain a competitive advantage over their monolingual, monocultural competitors.

Regarding business-consumer relations, the following language issues can be stated. The seller of goods or services must be able to publish information in the language and cultural conventions of the customers (multilingual electronic publishing). The buyer of goods or services must be able to find, understand and compare information in his or her own language (multilingual information retrieval). Both buyer and seller must be able to interact naturally and effectively in a common language or across different languages [Urquhart, 1997].

 

6. LANGUAGE PRESERVATION AND DIVERSITY

It is possible that, of the 5,000 to 6,000 languages spoken in the world today, only a few hundred will survive a century from now. The pressure on languages can take different forms: economic, social, cultural, etc. Usually, the people directly affected are minorities, but their languages represent the linguistic diversity that has developed over the course of human history. Some linguists argue that language endangerment is serious, with great humanistic and scientific consequences. Can new developments in the information age help in preserving some of these languages and thus save language diversity [Woodbury, 1998], [Comrie, 1998]?

 

7. REFERENCES

1. Abed, E.M.; Hamada, S.; and Hegazi, N.H. "Ambiguities in Arabic Language Processing". The First Conference on Language Engineering, Cairo, March 1998, pp. 263-271.

2. Ali, N. "New Paradigm for Arabic Computation". The First Conference on Language Engineering, Cairo, Egypt, March 1998, pp. 24-28.

3. Alis, 1998 [http://www.alis.com].

4. Antaki, N.A. "Localization = Multilingual + Multicultural" [http://www.princeton.edu/~naantaki/WRI353/localization.html].

5. Babel, 1998 [http://babel.alis.com:8080/].

6. Boitet, C. "Human-Aided Machine Translation", in Survey of the State of the Art in Human Language Technology, Chapter 8 (Multilinguality), edited by Zaenen, A., 1998 [http://www.cse.ogi.edu/CSLU/HLTsurvey/ch8node2.html].

7. Comrie, B. "Language Diversity", 1998 [http://www.lsdac.org/Comrie.html].

8. Dorr, B.J.; Garman, J.; and Weinberg "From Syntactic Encoding to Thematic Roles: Building Lexical Entries for Interlingual MT". Machine Translation, Vol. 9, Nos 3-4, 1994/1995, pp. 221-250.

9. Ghonaimy, M.A.R. "Language Engineering Scope and Basic Concepts". The First Conference on Language Engineering, Cairo, March 1998, pp. 1-22.

10. Grishman, R. and Kittredge, R. (Eds) "Analyzing Language in Restricted Domains: Sublanguage Description and Processing". Lawrence Erlbaum Associates, Publishers, 1986.

11. LINGLINK team at Anite Systems, on behalf of the participants of the Telematics Application Program. Anite Systems, 151, rue des Muguets, L-2167 Luxembourg, 1997.

12. Nour, M. "Comprehensive Criteria for Evaluating the Quality of Machine Translation Systems". The First Conference on Language Engineering, Cairo, Egypt, March 1998, pp. 272-281.

13. Sakhr, 1998 [http://www.Sakhr.com].

14. Urquhart, I. "Language Engineering and Electronic Commerce", 1998 [http://www2.echo.lu/langeng/en/reps/ecom/ecom.html].

15. Woodbury, A. "Endangered Languages", 1998 [http://www.lsadc.org/Woodbury.html].