Comparing the Usage of Non-Latin Character Sets in Top-Level Domains
Preprint posted on 23.03.2021, 13:48 by Hampton Moore
The domain name system (DNS), which translates domain names to IP addresses, only allows alphanumeric characters and hyphens, meaning that languages not based on the Latin alphabet cannot have domains written in their own character set. Punycode is a technology that encodes any Unicode character into the DNS character set, and it is seeing adoption mainly with CJK (Chinese, Japanese, and Korean) characters. The CJK character set makes up 57 percent of Punycode-based top-level domains (TLDs) and accounts for the seven most-used Punycode-based TLDs, and 96 percent of CJK second-level domains (SLDs) also use a CJK TLD.
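As a rough illustration of the encoding step described above, the following Python sketch converts a single domain label to and from its Punycode form using the standard library's `punycode` codec. The function names `to_ascii_label` and `from_ascii_label` are hypothetical and are not taken from the preprint. The sketch also skips the normalization and validation that full IDNA processing performs, so it should be read as a simplified example rather than a complete implementation.

```python
# Simplified sketch: Punycode-encode one domain label.
# Assumption: real IDNA processing also normalizes and validates the
# label before encoding; that step is deliberately omitted here.

ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded label


def to_ascii_label(label: str) -> str:
    """Return the ASCII form of a single label (hypothetical helper)."""
    if label.isascii():
        # Plain ASCII labels need no encoding; DNS is case-insensitive.
        return label.lower()
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ascii_label(label: str) -> str:
    """Return the Unicode form of a single label (hypothetical helper)."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


if __name__ == "__main__":
    encoded = to_ascii_label("中国")
    print(encoded)                    # ASCII-only form starting with "xn--"
    print(from_ascii_label(encoded))  # round-trips back to the original label
```

Python also ships an `idna` codec (`"中国".encode("idna")`) that applies the prefix and the normalization rules of the older IDNA 2003 standard, which may be the more convenient choice when whole domain names rather than single labels are involved.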