IDN - Internationalized Domain Names

Domain names with accents and umlauts

Domain names under .ch and .li with accents and umlauts (also known as IDNs) were introduced in March 2004. The abbreviation IDN stands for Internationalized Domain Names: domain names that may contain diacritics such as accents and umlauts as well as letters from non-Roman alphabets.

The additional characters

The additional characters are taken from the Latin-1 Supplement Unicode table, part of which has been released for registration. The characters involved are the following:

in addition to the 37 characters from the ASCII character set permitted until then:

In December 2005, the following character from the Latin Extended-A Unicode table was introduced:

We are keeping a constant eye on the situation to see whether there is any need for a further extension of the available characters. At the moment, our top priority is to ensure the correct functioning of the registration system and the DNS (Domain Name System). This is why any additional characters, apart from those set out above, will only be introduced on a gradual basis.

Length can be up to 63 characters

Since March 2004, a domain name can be a maximum of 63 characters long (before that it was 24 characters). When calculating the length of a domain name, it is the characters between www. and .ch or .li that count. The actual number of characters, however, is determined by the so-called ACE string for the domain name (see below). In the case of names without umlauts and special characters this is the length of the actual name, but for names with umlauts and special characters, it is longer, due to the algorithm employed. To give a number of examples:

www.switch.ch is 6 characters long,
www.ethz.ch is 4 characters long,
www.bücher.ch, however, is not 6 but 13 characters long, since its corresponding ACE string is xn--bcher-kva.ch.
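
The lengths quoted above can be checked with a short sketch. It assumes that Python's built-in "idna" codec (which follows the older IDNA 2003 rules) is an acceptable stand-in for the conversion used by the registry; the helper name `ace_length` is made up for this illustration.

```python
def ace_length(label: str) -> int:
    """Length of the ACE form of one label (the part between www. and .ch)."""
    # The "idna" codec leaves pure-ASCII labels unchanged and converts
    # labels containing non-ASCII characters to their xn-- form.
    return len(label.encode("idna"))


for label in ("switch", "ethz", "bücher"):
    print(label, ace_length(label))
# switch 6
# ethz 4
# bücher 13
```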

ACE string

There were essentially two options for introducing internationalized domain names (IDN). The first was to modify the domain name system (DNS) so that Unicode characters could be used directly. This was felt to be too drastic a measure, and hence the second option was chosen: defining an algorithm that specifies how a Unicode string is converted into a permitted ASCII domain name. This ACE string (ACE stands for ASCII Compatible Encoding) is then entered into the DNS. The introduction of IDN means that, for the very first time, the entry in the DNS is no longer identical with the domain name.

Name Preparation, Punycode

A number of requirements have to be fulfilled before a Unicode string can be converted into an ACE string. This is ensured by the so-called "Nameprep" procedure, which makes sure that no inadmissible characters are included. Umlauts that are entered as two characters (a base letter followed by a combining mark) have to be replaced by a single character, e.g. a + ¨ = ä. This process is referred to as "normalization". In addition, all uppercase Latin letters are converted into lowercase letters. This is known as "case mapping" or "case folding".
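
The two steps described above, normalization and case mapping, can be approximated with Python's standard `unicodedata` module. This is only a simplified sketch: the full Nameprep procedure also checks for inadmissible characters, which is skipped here, and the function name `prepare_label` is an assumption made for this example.

```python
import unicodedata


def prepare_label(label: str) -> str:
    """Rough approximation of name preparation for a single label."""
    # Normalization: a base letter plus a combining mark becomes one character.
    normalized = unicodedata.normalize("NFKC", label)
    # Case mapping: uppercase letters become lowercase letters.
    return normalized.lower()


decomposed = "BU\u0308CHER"  # "U" followed by U+0308 COMBINING DIAERESIS
prepared = prepare_label(decomposed)
print(len(decomposed), len(prepared))  # 7 6
print(prepared == "b\u00fccher")       # True
```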

If non-ASCII characters are contained in the string after the "name preparation" has been run through, the system places the prefix xn-- in front of this string. Punycode takes the non-ASCII characters out of the actual domain name, notes their position, and adds them on at the end of the name again, in coded form, separated by means of a further hyphen.

An example
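
The Punycode step can be sketched with Python's built-in "punycode" codec. The sketch assumes that the label has already been through name preparation, and the helper name `to_ace` is hypothetical.

```python
def to_ace(label: str) -> str:
    """Convert one prepared label into its ACE form."""
    if label.isascii():
        # Nothing to encode: the label is entered in the DNS unchanged.
        return label
    # Punycode keeps the ASCII characters, appends a hyphen, and then adds
    # the non-ASCII characters and their positions in coded form.
    return "xn--" + label.encode("punycode").decode("ascii")


print(to_ace("bücher"))  # xn--bcher-kva
print(to_ace("switch"))  # switch
```

In the result, "bcher" is what remains of the name once the ü has been taken out, and "kva" encodes the ü together with its position.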

Consequences

The domain name and the entry in the DNS are two different things with IDN.

bücher.ch is the domain name,
xn--bcher-kva.ch is the ACE string, and it is this string that is entered in the DNS.
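
The relationship between the two forms can be illustrated as a round trip. The snippet below is a sketch that assumes Python's built-in "idna" codec (IDNA 2003 rules), which handles a complete domain name label by label.

```python
domain = "bücher.ch"

# Domain name -> entry in the DNS
dns_entry = domain.encode("idna").decode("ascii")
print(dns_entry)  # xn--bcher-kva.ch

# Entry in the DNS -> domain name as shown to the user
displayed = dns_entry.encode("ascii").decode("idna")
print(displayed == domain)  # True
```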

For technical reasons, the character string that has been processed by the algorithm is several characters longer than the domain name itself. The domain name "www.bücher.ch" is six characters long. The corresponding ACE string, however, is 13 characters long.

bücher.ch = domain name = must be at least three characters long,
xn--bcher-kva.ch = DNS entry = may be a maximum of 63 characters long.
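
The two limits apply to different strings, which a small check makes explicit. This is a sketch under stated assumptions: the constants and function names are invented for illustration, the label is assumed to be already prepared, and the check covers only the two length rules mentioned here, not any other registration rule.

```python
MIN_NAME_LENGTH = 3  # lower limit, applied to the domain name itself
MAX_ACE_LENGTH = 63  # upper limit, applied to the ACE string


def to_ace(label: str) -> str:
    """Convert one prepared label into its ACE form."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def has_valid_length(label: str) -> bool:
    """Check the minimum on the name and the maximum on the ACE string."""
    return len(label) >= MIN_NAME_LENGTH and len(to_ace(label)) <= MAX_ACE_LENGTH


print(has_valid_length("bücher"))  # True
print(has_valid_length("ab"))      # False: fewer than three characters
print(has_valid_length("a" * 64))  # False: more than 63 characters
```

A name consisting mostly of umlauts can exceed the upper limit well before it reaches 63 visible characters, because each non-ASCII character adds at least one coded character on top of the xn-- prefix.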

IETF Standards
  • RFC 3492 Encoding Scheme (Punycode)
  • RFC 5890 IDNA (Internationalized Domain Names for Applications): Framework
  • RFC 5891 IDNA: Protocol
  • RFC 5892 IDNA: Unicode Code Points
  • RFC 5893 IDNA: Right-to-Left Scripts
  • RFC 5894 IDNA: Background, Explanations, and Rationale

Support for IDNs is now common in current browsers and e-mail programs. However, it is still recommended not to rely solely on an IDN for important applications.

SWITCH does not guarantee that domain names with umlauts and accents as per Annex 2 of the GTC are suitable for use in conjunction with programs such as browsers and e-mail programs and does not accept any liability in this respect.

Not all the characters are available on the keyboard. In Switzerland alone, there are four standards that apply, each with different keyboard layouts. It can, however, be assumed that a domain name that contains special Swedish characters will have been registered by someone with the corresponding keyboard. The same holds true for Spanish or German characters, for example. The following table provides explanations and input aids for typing in all the new characters.

On a PC, individual characters can be entered by their numeric character code via the numeric pad. To do this, activate "Num Lock", then keep the "Alt" key pressed and type in "133", for example. When you let go of the "Alt" key, the character à will appear. On a notebook, you should first activate "Num Lock" with the "Fn" key and then similarly keep the "Alt" key pressed while entering "133" via the numeric pad.
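
Which character a given number produces depends on the code page in use. The sketch below assumes code page 850, which is common on Western European systems but is an assumption here; the function name `alt_code_to_char` is likewise made up for this example.

```python
def alt_code_to_char(code: int, code_page: str = "cp850") -> str:
    """Look up the character that a numeric-pad code stands for."""
    # Assumption: the code is interpreted in the given legacy code page.
    return bytes([code]).decode(code_page)


print(alt_code_to_char(133))  # à
```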

Macintosh users have the "Character Viewer" available to them for entering characters from different character sets.