IDN - Domain Names with Accents and Umlauts
Since 1 March 2004, it has been possible at SWITCH to register .ch and .li domain names with accents and umlauts (so-called IDN). Internationalized Domain Names (IDN) is the international term for domain names containing accented characters.
The new characters
These are taken from the Latin-1 Supplement table of the Unicode character set, part of which has now been released for registration. The characters involved are the following:
These characters are now available in addition to the 37 characters from the ASCII character set that have been permitted to date (the letters a to z, the digits 0 to 9 and the hyphen).
On 1 December 2005, the following character from the Latin Extended-A table of the Unicode character set was also introduced:
We are keeping a constant eye on the situation to see whether there is any need for a further extension of the available characters. At the moment, our top priority is to ensure the correct functioning of the registration system and the DNS (Domain Name System). This is why any additional characters, apart from those set out above, will only be introduced on a gradual basis.
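To illustrate the character sets mentioned above, the following Python sketch checks which characters of a name go beyond the 37 ASCII characters and which Unicode block they come from. The names `ASCII_ALLOWED`, `unicode_block` and `non_ascii_characters` are hypothetical. Block membership alone does not mean that SWITCH permits a character; the actual list of permitted characters is not reproduced in this sketch.

```python
import string

# The 37 ASCII characters permitted before IDN: letters a-z, digits 0-9, hyphen.
ASCII_ALLOWED = frozenset(string.ascii_lowercase + string.digits + "-")


def unicode_block(ch: str) -> str:
    """Return a rough Unicode block name for one lower-case character."""
    cp = ord(ch)
    if ch in ASCII_ALLOWED:
        return "ASCII"
    if 0x0080 <= cp <= 0x00FF:
        return "Latin-1 Supplement"
    if 0x0100 <= cp <= 0x017F:
        return "Latin Extended-A"
    return "other"


def non_ascii_characters(label: str) -> dict[str, str]:
    """Map every character outside the 37 ASCII characters to its block."""
    return {ch: unicode_block(ch) for ch in label.lower() if ch not in ASCII_ALLOWED}


print(len(ASCII_ALLOWED))              # 37
print(non_ascii_characters("bücher"))  # {'ü': 'Latin-1 Supplement'}
```

A name for which `non_ascii_characters` returns an empty dictionary needs no IDN conversion at all.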
Domain names now up to 63 characters long
Since 1 March 2004, domain names may be up to 63 characters long (the previous maximum was 24 characters). When calculating the length of a domain name, only the characters between www. and .ch or .li count. The decisive number of characters, however, is that of the so-called ACE string for the domain name (see below). For names without umlauts or special characters, this is simply the length of the name itself; for names with umlauts and special characters, the ACE string is longer because of the encoding algorithm used. Some examples:
www.switch.ch is 6 characters long,
www.ethz.ch is 4 characters long,
www.bücher.ch, however, is not 6 but 13 characters long, since its corresponding ACE string is xn--bcher-kva.ch.
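The length calculation in these examples can be reproduced with Python's built-in `idna` codec, which implements the older IDNA 2003 rules (Nameprep plus Punycode plus the `xn--` prefix). The helper name `counted_length` is hypothetical, and the sketch assumes the name is the label directly in front of .ch or .li (sub-domains are not handled).

```python
def counted_length(domain: str) -> int:
    """Length of the name between 'www.' and '.ch'/'.li', measured on its ACE form."""
    name = domain.lower()
    if name.startswith("www."):
        name = name[len("www."):]
    label = name.split(".")[0]  # e.g. "bücher" from "bücher.ch"
    # The "idna" codec leaves ASCII labels unchanged and converts
    # non-ASCII labels into their "xn--..." ACE form.
    ace = label.encode("idna")
    return len(ace)


for domain in ["www.switch.ch", "www.ethz.ch", "www.bücher.ch"]:
    print(domain, counted_length(domain))
# www.switch.ch 6
# www.ethz.ch 4
# www.bücher.ch 13
```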
There were essentially two options for introducing internationalized domain names (IDN). The first was to adapt the Domain Name System (DNS) so that Unicode characters could be used directly. This was felt to be too drastic a measure, so the second option was chosen: an algorithm was defined that specifies how a Unicode string is converted into a permitted ASCII domain name. This ACE string (ACE stands for ASCII Compatible Encoding) is then entered into the DNS. With IDN, for the very first time, the entry in the DNS is no longer identical to the domain name.
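The conversion in both directions can be sketched with Python's built-in `idna` codec: the Unicode name is turned into the ACE string stored in the DNS, and the ACE string is turned back into the name shown to the user. The function names `to_ace` and `from_ace` are hypothetical. Note that this codec follows IDNA 2003; the newer IDNA rules (RFC 5890 to 5894) differ in some details and are implemented, for example, by the third-party `idna` package.

```python
def to_ace(domain: str) -> str:
    """Unicode domain name -> ASCII Compatible Encoding, as entered in the DNS."""
    return domain.encode("idna").decode("ascii")


def from_ace(ace: str) -> str:
    """ACE string from the DNS -> Unicode domain name for display."""
    return ace.encode("ascii").decode("idna")


ace = to_ace("bücher.ch")
print(ace)            # xn--bcher-kva.ch
print(from_ace(ace))  # bücher.ch
```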
Name Preparation, Punycode
A number of requirements have to be met before a Unicode string can be converted into an ACE string. This is done by the so-called "Nameprep" procedure, which ensures that the string contains no inadmissible characters. An umlaut that is made up of two characters (a base letter followed by a combining diaeresis) has to be replaced by the corresponding single character, e.g. a + ¨ = ä. This process is referred to as "normalization". In addition, all upper-case letters are converted into lower-case letters. This is known as "case mapping" or "case folding".
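The two Nameprep steps described above, case folding and normalization, can be approximated with Python's standard `unicodedata` module. This is a simplified sketch with a hypothetical function name: the real Nameprep procedure (RFC 3491) also removes certain characters, rejects prohibited ones and checks right-to-left text, none of which is done here.

```python
import unicodedata


def nameprep_sketch(label: str) -> str:
    """Simplified Nameprep: case folding followed by NFKC normalization."""
    folded = label.casefold()                      # case mapping: "B" -> "b"
    return unicodedata.normalize("NFKC", folded)  # normalization: "u" + U+0308 -> "ü"


decomposed = "BU\u0308CHER"     # "BÜCHER" written with a combining diaeresis
print(len(decomposed))          # 7
prepared = nameprep_sketch(decomposed)
print(prepared, len(prepared))  # bücher 6
```

After this step, the decomposed and the precomposed spelling of the name produce the same string, so both lead to the same ACE string.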
If the string still contains non-ASCII characters after name preparation, it is converted with Punycode and the prefix xn-- is placed in front of the result. Punycode takes the non-ASCII characters out of the domain name, records their positions, and appends them in encoded form to the end of the remaining ASCII characters, separated by a further hyphen. An example:
The domain name and the entry in the DNS are two different things with IDN.
bücher.ch is the domain name,
xn--bcher-kva.ch is the ACE string, and it is this string that is entered in the DNS.
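The Punycode step behind this example is specified in RFC 3492. The following Python sketch follows the encoder described there (without overflow checks, which Python's unlimited integers make unnecessary) and reproduces the bcher-kva part of the ACE string; all function names are hypothetical. In practice one would simply use Python's built-in `punycode` codec, which produces the same result.

```python
# Parameter values fixed by RFC 3492 for Punycode.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Bias adaptation function (RFC 3492, section 6.1)."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def encode_digit(d: int) -> str:
    """Map 0..25 to 'a'..'z' and 26..35 to '0'..'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def punycode_encode(label: str) -> str:
    """Encode one label with Punycode (without the 'xn--' prefix)."""
    code_points = [ord(ch) for ch in label]
    # 1. Copy the basic (ASCII) characters in order, followed by a hyphen.
    output = [ch for ch in label if ord(ch) < 0x80]
    basic_count = handled = len(output)
    if basic_count > 0:
        output.append("-")

    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    # 2. Insert the non-ASCII characters in ascending code-point order,
    #    encoding code point and position together as one number (delta).
    while handled < len(code_points):
        m = min(cp for cp in code_points if cp >= n)
        delta += (m - n) * (handled + 1)
        n = m
        for cp in code_points:
            if cp < n:
                delta += 1
            elif cp == n:
                # Write delta as a variable-length base-36 number.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(encode_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(encode_digit(q))
                bias = adapt(delta, handled + 1, handled == basic_count)
                delta = 0
                handled += 1
        delta += 1
        n += 1
    return "".join(output)


print(punycode_encode("bücher"))  # bcher-kva
```

Prefixing the result with xn-- gives the ACE string xn--bcher-kva shown above.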
For technical reasons, the character string produced by the algorithm is several characters longer than the domain name itself. The domain name "www.bücher.ch" is six characters long; the corresponding ACE string, however, is 13 characters long.
bücher.ch is the domain name, which must be at least three characters long (counting the part before .ch),
xn--bcher-kva.ch is the DNS entry, which may be at most 63 characters long (again counting the part before .ch).
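Both length rules can be checked together, as in the following Python sketch. The function and constant names are hypothetical, the sketch assumes the name is passed without .ch or .li, and it uses the built-in `idna` codec, which already rejects labels whose ACE form exceeds 63 characters.

```python
MIN_NAME_LENGTH = 3   # domain name, counted in Unicode characters
MAX_ACE_LENGTH = 63   # ACE string, counted in ASCII characters


def check_name(name: str) -> str:
    """Return the ACE string for a name (without .ch/.li) or raise ValueError."""
    if len(name) < MIN_NAME_LENGTH:
        raise ValueError(f"{name!r} is shorter than {MIN_NAME_LENGTH} characters")
    try:
        ace = name.lower().encode("idna").decode("ascii")
    except UnicodeError as exc:  # raised e.g. when the ACE label is too long
        raise ValueError(f"{name!r} cannot be converted: {exc}") from exc
    if len(ace) > MAX_ACE_LENGTH:
        raise ValueError(f"ACE string {ace!r} is longer than {MAX_ACE_LENGTH} characters")
    return ace


print(check_name("bücher"))  # xn--bcher-kva
```

A name with many umlauts can therefore be rejected even though its Unicode form has far fewer than 63 characters.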
- RFC 3492 Encoding Scheme (Punycode)
- RFC 5890 IDNA (Internationalized Domain Names for Applications): Framework
- RFC 5891 IDNA: Protocol
- RFC 5892 IDNA: Unicode Code Points
- RFC 5893 IDNA: Right-to-Left Scripts
- RFC 5894 IDNA: Background, Explanations, and Rationale
Programs with IDN capability
The following list of programs with IDN capability (web browsers and e-mail applications) is not exhaustive. If you have any questions or encounter any problems, please contact the manufacturers directly.
- Microsoft Internet Explorer 7 and above
- Microsoft Internet Explorer 5.0 and 6.0 for Windows with i-Nav plug-in from VeriSign
- Firefox from version 0.8
- Opera from version 7.11
- Safari from version 1.2 (Mac OS X 10.3)
- Konqueror (Linux, KDE 3.2 and above, with GNU IDN Library)
- MS Outlook and Outlook Express for Windows with i-NavOutlook plug-in from VeriSign
- Apple Mail from version 3.2 (Mac OS X 10.5)
SWITCH does not guarantee that domain names with umlauts and accents as per Annex 2 of the GTC are suitable for use in conjunction with programs such as browsers and e-mail applications and does not accept any liability in this respect.
Not all of the new characters are available on every keyboard. In Switzerland alone, four different standards apply, each with its own keyboard layout. It can, however, be assumed that a domain name containing special Swedish characters will have been registered by someone with the corresponding keyboard. The same holds true for Spanish or German characters, for example. The following table provides explanations and input aids for typing in all the new characters.
On a PC, individual characters can also be entered via their numeric code from the DOS character set (code page) on the numeric keypad. To do this, activate "Num Lock", then hold down the "Alt" key and type "133", for example. When you release the "Alt" key, the character à appears. On a notebook, first activate "Num Lock" with the "Fn" key, then likewise hold down the "Alt" key while entering "133" on the numeric keypad.
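Which character a given Alt code produces depends on the code page in use. The following Python sketch, with a hypothetical function name, assumes the usual Windows convention for Western European systems: a number without a leading zero is looked up in the DOS code page (here assumed to be cp850), a number with a leading zero in the Windows code page (here assumed to be cp1252). The actual code pages depend on the system's language settings.

```python
def alt_code_character(digits: str, oem: str = "cp850", ansi: str = "cp1252") -> str:
    """Character produced by Alt + <digits> on the numeric keypad.

    Without a leading zero the number is looked up in the DOS (OEM) code page,
    with a leading zero in the Windows (ANSI) code page.
    """
    value = int(digits)
    if not 0 <= value <= 255:
        raise ValueError("this sketch only covers codes 0 to 255")
    codepage = ansi if digits.startswith("0") else oem
    return bytes([value]).decode(codepage)


print(alt_code_character("133"))   # à  (DOS code page)
print(alt_code_character("0224"))  # à  (Windows-1252)
```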
Macintosh users have the "character palette" available to them for entering characters from different character sets.