Metrics from Compressing the Human Genome with Six Programs
This dataset contains metrics from running six compression programs (brotli, bzip2, gzip, 7z, xz and zstd) on the human genome from the Human Genome Project, stored in 2bit binary format. For each program, the dataset records:
- the time it takes to compress the data, in seconds,
- the resulting compressed size, in bytes,
- the time it takes to decompress the data, in seconds, and
- peak memory usage, in kilobytes.
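The recorded fields combine into derived figures such as compression ratio and throughput. Below is a minimal sketch that assumes one record per program with the four fields listed above. The class name `CompressionRecord` and its field names are hypothetical and are not the dataset's actual column names. The uncompressed size of the 2bit file is not one of the recorded fields, so the caller must supply it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionRecord:
    """One row of metrics for a single compression program.

    Hypothetical layout: field names are illustrative, not the dataset's columns.
    """
    program: str            # e.g. "zstd"
    compress_s: float       # time to compress, in seconds
    compressed_bytes: int   # resulting compressed size, in bytes
    decompress_s: float     # time to decompress, in seconds
    peak_memory_kb: int     # peak memory usage, in kilobytes

    def ratio(self, original_bytes: int) -> float:
        """Compression ratio: uncompressed size divided by compressed size."""
        return original_bytes / self.compressed_bytes

    def compress_mb_per_s(self, original_bytes: int) -> float:
        """Compression throughput in megabytes (10**6 bytes) per second."""
        return original_bytes / 1e6 / self.compress_s

    def decompress_mb_per_s(self, original_bytes: int) -> float:
        """Decompression throughput in megabytes (10**6 bytes) per second."""
        return original_bytes / 1e6 / self.decompress_s
```

For example, a toy record (not taken from the dataset) that compresses 1000 bytes to 250 bytes has `ratio(1000) == 4.0`. Throughput is computed from the uncompressed size in both directions so that programs can be compared on the same input.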