This is an online version of the December 2001 Technology department, which Bertram Bruce edits for the International Reading Association's Journal of Adolescent & Adult Literacy. This department is "reprinted" regularly in Reading Online, and ROL readers are invited to browse the full listing of available columns.

The authors and editor welcome comments on this column, which can be posted to our Online Communities.



Internationalized Domain Names


Marc D. Wielansky








Editor's Message

I teach courses on topics such as literacy in the information age and learning technologies. One of the exciting aspects for me is that I learn so much from the students. One works with nonprofit groups to show them how they can deliver radio programs through the Internet and in turn has shared that expertise with his fellow students and me. Another manages electronic bulletin board systems and was able to lead a class discussion on how they work and the relative advantages of each. Others have extensive experiences with music on the Internet, the various digital divides, e-commerce, art on the Web, online fan clubs, free speech issues, and other topics currently reshaping literacy practices.

This month, one of those students, Marc Wielansky, reports on his own investigation of what may appear at first to be an arcane topic--the internationalization of domain names on the Internet. To many people who are only beginning to appreciate the potential of the Internet, domain names are just one of many strange concepts that fall in the realm of difficult and seemingly irrelevant technical discussions.

But domain names are the medium through which people access any resource on the Web. They are used to identify Web sites uniquely so that a user can easily explore Web pages around the world. Without them, we would need to use a long sequence of numbers to specify a Web site. The domain naming system has become a significant factor in emerging literacy practices. For example, the use of alphabetic characters means that the International Reading Association can be found on the Web through its domain name, reading.org, instead of by specifying the sequence of numbers in its IP address, 24.104.0.35. This system makes the Web more accessible and easier to use, but only if you use Latin characters. Most people in the world use other writing scripts, so even the first step of Web access, getting to a Web page, requires them to use a writing script other than that of their own language.

Expanding the domain naming system to accommodate additional scripts would significantly improve access for those literate in Chinese, Arabic, and most other languages. Even most European languages use characters beyond the basic Latin alphabet, such as those carrying the German umlaut, the Spanish tilde, or the French cedilla. Yet, expanding in this way poses challenges to the inherent open structure of the Internet, to its ease of use for those accustomed to Latin-alphabet-only domain names, and to corporate interests. These are just a few of the issues Marc addresses this month.

Bertram C. Bruce





Issue:
Multilingual Access to the Internet

As with any major technological advance, the mass adoption of Internet technologies does not come without its own problems. Educational institutions have begun to take notice of social issues including concepts of the digital divide and information literacy. Arguments have been made that the Internet intensifies rather than decreases the differences between the haves and the have-nots of modern society. As a result, many philanthropic organizations have devoted themselves to addressing these newfound problems.

Although many of these problems are unavoidable, some are not. Many popular sources acknowledge the Internet as a global phenomenon, referring to the great number of users now and in the near future. As of year-end 2000 there were 374.9 million Internet users, and by 2002 this number is estimated to become 490 million (Pastore, 2001). However, most of the world still does not use the Internet. Although access to the Internet infrastructure may exist for many of these people, there is one major limitation to access: their language.

While Web pages and other Internet media can be created in almost any existing language, the addresses used to reach them are limited to English-based ASCII characters. This is due to the very technology that was designed to make the Internet more accessible: the domain naming system (DNS). Because phrases are generally easier to remember than series of numbers, the DNS was created to associate registered domain names with IP addresses. However, as helpful as this system was designed to be, it is limited because its original designers used the ASCII character set, restricting the DNS to only English characters. The original design was never intended for users of any other character set.

The international nature of the modern Internet calls for the development of a new DNS standard. There are multiple reasons for the development of a DNS that supports international domain names (IDNs). First of all, it allows the Internet to be the truly global phenomenon that so many people describe it to be. According to one prediction by Computer Economics, by 2005 there will be approximately 198 million nonnative English-speaking users on the Internet, compared with 148 million native English-speaking users (Pastore, 2001). In other words, there will be about 34% more nonnative English-speaking users than native English-speaking users (roughly 57% versus 43% of the combined total, a gap of about 14 percentage points). As the number of international users continues to grow, the use of ASCII characters alone for identifying Web sites will become unreasonable. Restricting addresses to English characters creates an entry barrier for those who may be unfamiliar with the English character set, even for sites created entirely in other languages. In addition, it limits the site itself by forcing it to register a domain name based on English characters.

Providing increased access for those who speak languages with non-Roman characters allows users from multiple-language backgrounds to tap into the power of the Internet, instantly increasing the level of technological literacy for millions of people worldwide. While it is important to note that the sheer number of character sets makes it impossible to recognize all of them, the majority of the world's character sets should be represented. The creation of IDNs will radically change the way these people use the Internet by allowing them access to the Web, e-mail addresses, FTP sites, and any other network applications based on network addresses in the same manner as users of the ASCII character set.

In addition, the internationalization of domain names adds to the localization of sites. It gives people a choice in the language of their domain name so that they can address a certain audience. Creating a domain name in a specific language can reflect cultural elements onto a Web site. This helps individuals and businesses connect with their target markets and audiences. It also preserves cultural and linguistic integrity by allowing online identities, such as names and brands of products and organizations, to be represented in natural language and native script. Although the country code top-level domains also provide some localization, they are limited to national borders. International domain names, on the other hand, can appeal to people across the globe.

The current domain naming system

In order to understand the challenges involved in rethinking the DNS, it is necessary to understand how the current system works. Only then can the several options for modifying or upgrading the system be analyzed according to their technical plausibility and effectiveness.

The DNS is one of the most crucial elements of the Internet. It provides an efficient method of matching domain names to IP addresses in a manner that is transparent to most users. It is based on a relatively simple concept: Each domain is responsible for providing its own DNS servers for its sites. When a user wishes to access a site, his or her Web browser first queries a directory for the top-level domain (such as .com or .net). This directory lists the DNS server responsible for the domain named in the URL the user requested, and that server in turn supplies the matching IP address. For example, when a user wants to access the site located at http://www.lis.uiuc.edu, the user's browser goes through several steps before loading the Web page. First, it recognizes uiuc.edu as the domain name. Next, it queries the directory for that domain to find the address of the associated DNS server. After that, it queries that DNS server (in this case DNS1.CSO.UIUC.EDU) for the correct IP address. Only then does the browser connect to its final destination, located at the IP address 128.174.4.10, in order to download the Web page. It is important to note that ISPs often maintain DNS tables of their own in order to speed up this process for commonly accessed sites, but continue to use this process for uncommon or unvisited sites (Topping, 2000).
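The lookup steps in this example can be sketched as a toy program. This is only an in-memory illustration, not how a real resolver is written: the two tables and the function name resolve are assumptions made for the sketch, and only the host name, the DNS server name, and the IP address come from the example above.

```python
# Toy, in-memory model of the lookup steps described in the column.
# The table layout and function name are hypothetical; a real resolver
# sends network queries instead of reading dictionaries.

# Step 1: the directory for a top-level domain lists the DNS server
# responsible for each registered domain.
TLD_DIRECTORY = {
    "edu": {"uiuc.edu": "DNS1.CSO.UIUC.EDU"},
}

# Step 2: each domain's own DNS server maps its host names to IP addresses.
DNS_SERVERS = {
    "DNS1.CSO.UIUC.EDU": {"www.lis.uiuc.edu": "128.174.4.10"},
}


def resolve(hostname):
    """Return the IP address for hostname by following the two lookups."""
    hostname = hostname.lower()          # domain names are case insensitive
    labels = hostname.split(".")
    tld = labels[-1]                     # e.g. "edu"
    domain = ".".join(labels[-2:])       # e.g. "uiuc.edu"
    dns_server = TLD_DIRECTORY[tld][domain]
    return DNS_SERVERS[dns_server][hostname]


print(resolve("www.lis.uiuc.edu"))  # 128.174.4.10
```

A name that is missing from either table raises a KeyError here, which loosely corresponds to a failed lookup.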

Once the need for domain names in international characters was recognized, several Internet organizations began working on the technical problems it poses. (I'll discuss this later under Solving the technical problems.)

Implementing a new system

Most of the aftereffects of implementing IDNs will result from the lack of a regulatory body for the Internet. However, the complexity of the modern Internet has made it virtually impossible to regulate. The Internet is built on protocols that are completely open, and no central body is responsible for maintaining its structure. Each of the hundreds of organizations worldwide is responsible for providing its part of the connectivity among the thousands of networks that make up the Internet. The openness of its design has allowed the Internet to take on an organic growth pattern as organizations have added technologies that have become standards, but the process has always been slow.

While some official standards-making bodies exist, each lacks sufficient power to enforce the new standards as they are developed. As a result of the lack of a single responsible body for worldwide standards enforcement, it is extremely difficult, if not impossible, to impose a new worldwide standard or to change an existing one. In the past, most new Internet standards have followed a policy of backwards compatibility so that adaptation was not necessary. Otherwise, a new standard would break the Internet for anyone who chose not to follow it.

Currently the Internet Corporation for Assigned Names and Numbers (ICANN) assumes most of the responsibility for "IP address space allocation, protocol parameter assignment, domain name system management, and root server system management functions" (Internet Corporation for Assigned Names and Numbers, 2001). Many other corporations have taken liberties as well. The role of i-DNS.net in creating IDN technology is a perfect example of corporate standards making. They have provided an active solution for IDN resolution since December 1999. Many governments also take liberties to make standards for the Internet. For example, the Chinese government has set up the China Internet Network Information Center and administrative divisions within the government to regulate standards in China. The Chinese government has even gone as far as attempting to regulate the use of the Chinese language on the Internet, believing it to be under its jurisdiction (Wang, 2001).

Domain disputes

The most immediate problem that will result from a finalization of the IDN standard will be the resolution of the names registered by the many organizations that have already begun to register IDNs under the incomplete standards in existence today. Although there seems to be some organization to this process, in most cases it is merely a false impression. Because there are no standards yet, it is impossible for any organization to guarantee rights to any IDN. For example, VeriSign Global Registry Services is currently "the exclusive provider of registry services for the .com, .net, and .org top-level domains (TLD)" for over 70 domain name registrars (VeriSign Global Registry Services, 2001a). So, one could reasonably assume that registering an IDN with VeriSign or one of its affiliates would be secure. However, this is not the case. As with any other organization involved in resolving the IDN issue, VeriSign is still in the testing phase of development. Therefore, it is impossible for them to guarantee a registry of any domain name because there is no accepted standard.

In recent years when domain registration disputes have arisen, the rights and responsibilities of registering a domain name have evolved differently in the courts of various countries. As mentioned earlier, a domain name is essentially just a phrase that serves as a reference to a specific address in the Internet. However, as the Internet has developed into a commercial entity, that name has come to signify much more. Domain names have reflected trademarks, brand names, corporate names, and marketing slogans.

Trademark rights have been a longstanding issue for domain names on the Internet, but many countries do not even recognize trademark or branding rights for domain names. Of the more than 240 countries heavily involved in the Internet, only 14 have adopted the Uniform Domain Dispute Resolution Policy (Lemanski-Valente & Majka, 2000). This policy and others like it allow trademark owners the right to have domains transferred if they can prove the following:

1. The registrant's domain name is identical or confusingly similar to a trademark or service mark in which the complainant has rights.

2. The registrant has no rights or legitimate interests in respect of the domain name.

3. The domain name has been registered and is being used in bad faith. (Internet Corporation for Assigned Names and Numbers, 2000)


As the number and complexity of domain names increase, the likelihood of a domain dispute increases as well. This could occur as a coincidence, but would more likely occur as cybersquatting or cybercloning. Cybersquatting occurs when a domain is registered by a third party in bad faith. Cybercloning involves registering a domain name that is the same as or similar to an existing brand name, effectively taking advantage of that brand image. The dot-com eBay encountered this problem when it attempted to expand internationally. Although it preferred the country-specific name ebay.fr, it was forced to accept ebayfrance.com because the former name was already registered. In an attempt to prevent these occurrences and avoid disputes, many organizations register numerous domain names for themselves based on their various brands and identities.

The introduction of multiple character sets to this situation would only increase the number of domain disputes and make resolution even more difficult. The new domain names present lucrative opportunities for those willing to commit cybersquatting or cybercloning. It also would be even more difficult to detect cases because of the character recognition barriers. Therefore, any organization that is global, or ever plans to be, would have to register its name(s) in every language available. This could put a great time and cost burden on those organizations that have already spent tremendous time and effort developing an online presence. For example, The Coca-Cola Company has built a strong brand image worldwide involving multiple languages. Therefore it would need to register its brand name(s) in all of these languages. However, there is no guarantee that these rights would be granted.

Future implications

The introduction of domain name usage in multiple character sets is a necessary step in the true globalization of the Internet. However, the development and implementation of this phase proves to be a rather difficult step in its evolution. Initial exploration of the idea might suggest that completing this task could be easy, although this is certainly not the case. Short-term and long-term solutions may be implemented, but the selection of any of these technologies will have many implications that go far beyond the implementation of a technical standard. Over the years, the Internet has developed from a technology into a social construct in which any major change raises serious political, legal, and social issues. In the history of the Internet, there has never been a case in which the fundamental technology was changed. In the past, technologies designed to enhance the Internet have been allowed to develop independently and yet have managed to work together in the end. The decentralized structure of the Internet has served as its strength. As the Internet develops in the future, its decentralization may also prove to be a major weakness. Important lessons can be learned from exploring the implementation of an internationalized DNS. This case is one of the first ever to address a fundamental change to the Internet and will not be the last. It is important to learn how to correctly implement similar fundamental changes, because the Internet will continue to require modifications as it evolves as a tool.

Solving the technical problems

The internationalized domain name working group of the Internet Engineering Task Force (IETF) has assumed the responsibility for evaluating the standards for IDNs that are developed by the multiple organizations involved (Internet Engineering Task Force, 2000). The first step to internationalizing domain names is to develop the new technology necessary for handling non-ASCII characters. There are three major views on possible solutions to the IDN problem:

1. Upgrade the DNS and every application that uses it in order to support international characters.

2. Modify the current domain naming conventions to handle international characters without affecting the current system.

3. Develop a new system that would be a hybrid of the two that would determine if the domain name that is sent to a DNS server is ASCII based or international. Then it would forward the address accordingly to a traditional or internationalized server.
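The routing decision at the heart of the third option can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name choose_server and the two return labels are hypothetical, and a real hybrid system would forward the query over the network rather than return a string.

```python
def choose_server(domain_name):
    """Decide which kind of server should handle a name (hypothetical sketch).

    A name made up only of ASCII characters goes to a traditional DNS server;
    anything else goes to an internationalized server.
    """
    is_ascii = all(ord(ch) < 128 for ch in domain_name)
    return "traditional" if is_ascii else "internationalized"


print(choose_server("www.lis.uiuc.edu"))   # traditional
print(choose_server("b\u00fccher.com"))    # internationalized (contains u-umlaut)
```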

Before selecting any of these options, however, it is important to recognize several requirements created by the IETF that must be met in order for a solution to be successful. First, solutions must be globally compatible. A person using language A must be able to access a computer in language B. Second, it must work for all applications, not just the Web. Finally, it must work through standards to preserve the globally unique naming system currently in place (Wenzel & Seng, 2001).

At first glance, one solution would be to upgrade the DNS to support Unicode rather than ASCII. Unicode is a widely accepted character set standard that incorporates most widely used modern languages, including Latin-based and East Asian languages. Because it works essentially the same way that ASCII does, it could work with the DNS in a similar manner. However, there are two major problems with doing this. First, Unicode uses a much larger character set and therefore uses more bits to communicate each character. As a result, domain names will be limited to fewer characters than the 63 allowed with the current domain name size limits. Second, and more important, changing the character coding of domain names would require every system connected to the Internet to recognize this change. Because there is no regulatory body to enforce this, the capital requirements necessary to convert every system would prevent it from happening on a large scale. Consequently, this solution would break the current system, preventing anyone who did not switch to Unicode from using the DNS. This solution may be possible in the long run, but it is ineffective for the immediate future.
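The first problem, the shrinking name length, can be made concrete with a little arithmetic. The sketch below assumes UTF-8 as the way Unicode characters are stored (the column does not name a specific encoding) and treats the 63-character limit as 63 bytes, which is what it amounts to when every ASCII character takes one byte. The names MAX_NAME_BYTES and max_characters are hypothetical.

```python
# Assumption: the 63-character limit is a 63-byte limit, and Unicode text
# is stored as UTF-8. Other encodings would give different counts.
MAX_NAME_BYTES = 63


def max_characters(sample_char):
    """How many copies of sample_char fit within the byte limit in UTF-8."""
    bytes_per_char = len(sample_char.encode("utf-8"))
    return MAX_NAME_BYTES // bytes_per_char


print(max_characters("a"))        # 63  (plain ASCII letter, 1 byte)
print(max_characters("\u00fc"))   # 31  (u-umlaut, 2 bytes)
print(max_characters("\u65e5"))   # 21  (a common Chinese/Japanese character, 3 bytes)
```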

A second solution would be to label the character sets used when submitting a query to the DNS. This method allows systems to continue using any character sets, so it involves little adaptation on the user's end. However, it would require all resolvers to be able to recognize all character sets. According to the working group, this is unacceptable because "the number of [character sets] would probably have to be limited and never expand" and "mapping of characters between charsets would have to be exact and not change over time." This could restrict certain users from accessing sites and would also add to the DNS's complexity. Although this idea was discouraged by the IETF, it's another possible long-run solution (Hoffman, 2000).

The solution, therefore, must remain within the boundaries of the ASCII character set. As a result, the only possible solution for IDN resolution would be to somehow map the international characters onto an ASCII character set. This is done through a process termed ASCII-compatible encoding or ACE. There were many proposed methods for the format of ACE, but row-based ACE (RACE) seems to be the most widely supported. RACE involves a "two-step algorithm that first compresses the name part, then converts the compressed string into an ACE." Although ACE is compatible with the existing ASCII-based DNS, there still are some drawbacks. First of all, the 63-character limit for a domain name is reduced to fewer than 20 characters in some languages. Because the domain names are encoded, there is also a potential for future problems with managing domain names. Despite these drawbacks, this method seems to be the best short-term solution (Hoffman, 2000).
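To show what an ASCII-compatible encoding looks like in practice, here is a minimal sketch. RACE itself is not available in Python's standard library, so the example swaps in a different ACE scheme, Punycode, through Python's built-in "idna" codec; the idea (international name in, ASCII-only name out, reversible) is the same, but the encoded strings and the marker prefix differ from what RACE would produce. The helper names to_ace and from_ace are hypothetical.

```python
def to_ace(domain_name):
    """Encode a domain name into an ASCII-only form (Punycode, not RACE)."""
    return domain_name.encode("idna").decode("ascii")


def from_ace(ace_name):
    """Recover the original characters from the ASCII-only form."""
    return ace_name.encode("ascii").decode("idna")


name = "b\u00fccher.com"           # "bucher.com" written with u-umlaut
encoded = to_ace(name)
print(encoded)                      # xn--bcher-kva.com
print(from_ace(encoded) == name)    # True
print(to_ace("reading.org"))        # reading.org (ASCII names pass through unchanged)
```

Because the encoded form uses only ASCII characters, existing DNS servers can store and look it up without modification, which is the property that makes ACE attractive as a short-term solution.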

Two companies are currently using the RACE method of encoding to provide IDN resolution. VeriSign Global Registry Services (2001b) has developed a testbed to test the resolution of IDNs on the .com, .net, and .org top-level domains. It registers international characters in their RACE-encoded form with the prefix bq-- added to signal that the domain name is international. For example,

[Japanese characters].com

will be registered as bq--3b7vcv67.com. VeriSign requires that the IDN be converted to ASCII before it is queried on its DNS. On the other hand, i-DNS.net International (2001) has a fully functional solution in which domain names can be registered in their original languages complete with the TLD in their original language. i-DNS is even capable of recognizing character direction for languages that are written right to left, such as Hebrew and Arabic. i-DNS servers can do RACE conversion on the DNS side; however, the user must access an i-DNS server for this service. Therefore, the Hebrew address

[Hebrew characters]

could be sent to an i-DNS server, and the Web browser would resolve it in the same way that it would resolve www.info.gov.il. Both companies have been successful so far in the process and plan to continue testing their systems and solutions. According to their Web sites, both are working with the IETF and will support whichever standard is approved.

It is possible that neither company's plans will ever be put in place. Although both have provided solutions that correctly encode characters into ASCII, there is much more involved in IDN character conversion than these two companies represent. Case sensitivity is a significant issue when dealing with any form of text. The current domain naming system is case insensitive to ASCII characters. This has been a great advantage for those who use the Internet. This feature works only because case conversion for ASCII characters is simple; everything has a one-to-one match. Not all languages are this simple. Each character set has special properties that must be considered in developing a DNS that supports international characters. For example, some Asian languages require spelling changes as the case changes, and many Middle Eastern languages have optional characters such as vowels. It is important for any international DNS to be able to resolve the domain name regardless of these issues. In addition, many character sets have common characters that may be represented differently after conversion to ASCII. Chinese, Japanese, and Korean all have similar characters, as do Latin, Cyrillic, and Greek. For these reasons, RACE might not be a permanent solution; one of the other two solutions mentioned may be more useful in the long run.
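Both difficulties can be seen in a short sketch. The examples are my own illustrations rather than cases taken from the working group documents: the German sharp s stands in for a character whose case conversion is not one-to-one, and the Latin and Cyrillic letter "a" stand in for look-alike characters from different scripts. The function name survives_case_round_trip is hypothetical.

```python
def survives_case_round_trip(text):
    """True if upper-casing and then lower-casing returns the original text."""
    return text.upper().lower() == text


ascii_name = "reading"
german_name = "stra\u00dfe"   # German word for "street", spelled with the sharp s

print(survives_case_round_trip(ascii_name))    # True: ASCII case maps one-to-one
print(survives_case_round_trip(german_name))   # False: upper() gives "STRASSE"

# Look-alike characters: these two letters are drawn almost identically
# but are different characters, so names containing them are different names.
latin_a = "a"
cyrillic_a = "\u0430"
print(latin_a == cyrillic_a)                   # False
```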





Web Page of the Month

Consult the Internet Engineering Task Force internationalized domain names working group's Web site for continuing coverage of the development and implementation of international domain names.


References

Hoffman, P. (2000, July 11). Comparison of internationalized domain name proposals. Retrieved May 1, 2001, from http://www.i-d-n.net/draft/draft-ietf-idn-compare-01.txt

i-DNS.net International. (2001). Samples. Retrieved May 1, 2001, from http://www.idns.net/tech/samples/index.html

Internet Corporation for Assigned Names and Numbers. (2000). ICANN uniform domain name dispute resolution policy. Retrieved May 1, 2001, from http://www.icann.org/udrp/udrp-policy-24oct99.htm

Internet Corporation for Assigned Names and Numbers. (2001). About ICANN. Retrieved May 1, 2001, from http://www.icann.org/general/abouticann.htm

Internet Engineering Task Force. (2000, June 20). Internationalized domain name (idn) charter. Retrieved May 1, 2001, from http://www.ietf.org/html.charters/idn-charter.html

Lemanski-Valente, K., & Majka, T. (2000, October). An overview of international Internet domain registrations. E-Commerce, 17(5), p. 1.

Pastore, M. (2001, July 2). The world's online populations. Retrieved May 1, 2001, from http://cyberatlas.Internet.com/big_picture/geographics/article/0,1323,5911_151151,00.html

Topping, S. (2000, December). The multilingual domain name race: On your mark..get set..WAIT! BizWonk, Inc. Retrieved May 1, 2001, from http://www-106.ibm.com/developerworks/unicode/library/u-domains.html

VeriSign Global Registry Services. (2001a). About VeriSign GRS. Retrieved May 1, 2001, from http://www.verisign-grs.com/aboutus/

VeriSign Global Registry Services. (2001b). General information paper on multilingual domain name resolution. Retrieved May 1, 2001, from http://www.verisign-grs.com/multilingual/Gen_Info_Paper.pdf

Wang, J. (2001, January). The Internet and e-commerce in China: Regulations, judicial views, and government policies. The Computer & Internet Lawyer, 18(1), p. 12.

Wenzel, Z., & Seng, J. (2001, May 23). Requirements of internationalized domain names. Retrieved May 1, 2001, from http://www.i-d-n.net/draft/draft-ietf-idn-requirements-07.txt





Guest Author

Wielansky is a student at the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. Mail: 1717 N. Dayton Street, Apt. 303, Chicago, IL 60614, USA. E-mail: mwielansky@yahoo.com.


Reader comments on this column are welcome. Please send ideas for future discussion, sites for consideration, literacy and student Web pages for sharing, and Glossary suggestions to the Technology Department editor. E-mail: chip@uiuc.edu. Mail: Bertram C. Bruce, Graduate School of Library & Information Science, University of Illinois at Urbana-Champaign, 501 East Daniel Street, MC 493, Champaign, IL 61820, USA.





Citation: Wielansky, M.D. (2001/2002, December/January). Internationalized domain names. Journal of Adolescent & Adult Literacy, 45(4). Available: http://www.readingonline.org/electronic/elec_index.asp?HREF=/electronic/jaal/12-01_Column/index.html



Reading Online, www.readingonline.org
Published December 2001 in the Journal of Adolescent & Adult Literacy
Posted simultaneously in Reading Online
© 2001 International Reading Association, Inc. ISSN 1096-1232