Unishox_Article_2.pdf (159.68 kB)

Unishox: A hybrid encoder for Short Unicode Strings

Version 2 2021-11-21, 21:44
Version 1 2021-11-20, 20:56
preprint
posted on 2021-11-21, 21:44, authored by Arundale Ramanathan

Unishox is a hybrid encoding technique that compresses short Unicode strings using context-aware pre-mapped codes and delta coding, resulting in surprisingly good compression ratios.


This article discusses a hybrid encoding method for compressing short Unicode strings of arbitrary length, including Latin/English text and printable special characters. Lossless entropy encoding methods have not sufficiently addressed this case so far.


Although it may appear inconsequential, the space occupied by such strings becomes significant in memory-constrained environments such as the Arduino Uno and ESP8266. Text exchange in chat applications is another area where such compression could yield cost savings. It is also possible to save bandwidth and storage costs when storing and retrieving independent strings in cloud databases.
