Comparing the Usage of Non-Latin Character Sets in Top-Level Domains

Preprint posted on 2021-03-23, 13:48, authored by Hampton Moore
The Domain Name System (DNS), which translates domain names to IP addresses, allows only ASCII letters, digits, and hyphens in domain names. As a result, languages written in non-Latin scripts cannot register domains directly in their own character sets. Punycode is an encoding that represents Unicode characters using only the characters permitted in domain names, and it is seeing adoption mainly among CJK (Chinese, Japanese, and Korean) character sets. CJK characters account for 57 percent of Punycode-based top-level domains (TLDs) and for all of the seven most-used Punycode-based TLDs, and 96 percent of CJK second-level domains (SLDs) also use a CJK TLD.
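To illustrate how a Unicode label maps onto the restricted domain character set, here is a minimal sketch using the `idna` codec from Python's standard library. It is not taken from the preprint; the helper names `to_ascii`, `to_unicode`, and `is_punycode_label` are hypothetical, and the built-in codec follows the older IDNA 2003 rules, so results for some edge-case labels may differ from what current registries accept.

```python
# Sketch only: helper names are illustrative and do not come from the preprint.
# Python's built-in "idna" codec implements IDNA 2003 (RFC 3490).

def to_ascii(label: str) -> str:
    """Encode a Unicode domain label into its ASCII-compatible form."""
    return label.encode("idna").decode("ascii")


def to_unicode(label: str) -> str:
    """Decode an ASCII-compatible label back into Unicode."""
    return label.encode("ascii").decode("idna")


def is_punycode_label(label: str) -> bool:
    """Punycode-encoded labels carry the 'xn--' prefix."""
    return label.lower().startswith("xn--")


if __name__ == "__main__":
    encoded = to_ascii("中国")          # a CJK TLD written in Unicode
    print(encoded, is_punycode_label(encoded))
    print(to_unicode(encoded))          # round-trips to the original label
    print(to_ascii("example"))          # plain ASCII labels pass through unchanged
```

A plain ASCII label is returned as-is, while a CJK label gains the `xn--` prefix followed by its Punycode form, which is how a zone-file survey can tell the two kinds of TLD apart.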
