Home
This project focuses on building a comprehensive benchmark for comparing the time and space efficiency of open source compression codecs on the JVM platform. To be included, a codec must be accessible from Java (and thereby from any JVM language) via either a pure Java interface or JNI, and must support either basic block mode (byte array in, byte array out) or streaming mode (InputStream in, OutputStream out).
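The two API shapes described above can be sketched as a pair of Java interfaces. This is a minimal sketch: the names `BlockCodec`, `StreamCodec`, `DeflateBlockCodec` and `GzipStreamCodec` are hypothetical and not taken from the benchmark's source, and the JDK's `java.util.zip` classes stand in for a real benchmarked codec so the example is self-contained (it assumes Java 9+ for `InputStream.transferTo`).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// Hypothetical block-mode API: byte array in, byte array out.
interface BlockCodec {
    byte[] compress(byte[] input);
    byte[] decompress(byte[] compressed);
}

// Hypothetical streaming API: InputStream in, OutputStream out.
interface StreamCodec {
    void compress(InputStream in, OutputStream out);
    void decompress(InputStream in, OutputStream out);
}

// Block mode backed by the JDK's Deflater/Inflater (a stand-in, not one of the benchmarked codecs).
final class DeflateBlockCodec implements BlockCodec {
    @Override
    public byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    @Override
    public byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and no way to make progress: input is truncated or needs a dictionary.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end();
        }
    }
}

// Streaming mode backed by the JDK's GZIP streams.
final class GzipStreamCodec implements StreamCodec {
    @Override
    public void compress(InputStream in, OutputStream out) {
        // Closing the GZIPOutputStream writes the gzip trailer and also closes 'out'.
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            in.transferTo(gz);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void decompress(InputStream in, OutputStream out) {
        try (GZIPInputStream gz = new GZIPInputStream(in)) {
            gz.transferTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, a codec that supports both modes would simply implement both interfaces.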
The benchmark suite is based on the Japex framework.
In addition to the benchmark itself, we also provide access to a set of benchmark results, which give an overview of general performance patterns on standard test suites. It is recommended, however, to run the tests yourself, since results vary depending on platform. And to get a more accurate understanding of how the results apply to your use case(s), the best thing to do is to collect a set of test data that reflects your usage and run the tests over it.
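As a starting point for working with your own data, the following sketch walks a directory of sample files and reports each file's compressed size as a percentage of its original size. It is a hypothetical helper (`CorpusSizeReport`), not part of the benchmark suite; it uses the JDK's GZIP implementation only as an example codec and measures size only, not speed, which is what the Japex-based suite itself is for.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compressed size (as % of original) for each regular file in a directory.
final class CorpusSizeReport {
    private CorpusSizeReport() {}

    // Example codec only; swap in the codec you actually want to evaluate.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Maps file name -> compressed size as a percentage of the original size, in name order.
    static Map<String, Double> sizePercentages(Path dir) {
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Path file : files) {
            try {
                byte[] original = Files.readAllBytes(file);
                if (original.length == 0) {
                    continue; // skip empty files to avoid dividing by zero
                }
                double pct = 100.0 * gzip(original).length / original.length;
                result.put(file.getFileName().toString(), pct);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        sizePercentages(dir).forEach((name, pct) ->
                System.out.printf("%-30s %6.1f%%%n", name, pct));
    }
}
```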
Currently the following codecs are included in the distribution:
- LZF (block and streaming modes)
- QuickLZ (block mode)
- Gzip: JDK, JCraft (streaming mode)
- Bzip2 from Apache Commons Compress (streaming mode)
- Snappy (Java JNI wrapper over the native Snappy library; block mode, streaming will be added soon)
- LZMA by 7zip (block mode)
  - Note: due to an API mismatch, the data is fully buffered, so the implementation is a bit sub-optimal. However, since LZMA is a relatively slow codec, the effect should not be drastic.
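The "full buffering" workaround mentioned for LZMA can be illustrated generically: a stream-oriented codec is exposed through a byte-array API by wrapping the input in a `ByteArrayInputStream` and collecting all of the output in a `ByteArrayOutputStream`. This is a minimal sketch, not the benchmark's actual LZMA adapter; the JDK's GZIP streams stand in for the LZMA codec, and `BufferedBlockAdapter` and `StreamStep` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical adapter: runs a stream-to-stream codec over whole byte arrays held in memory.
final class BufferedBlockAdapter {
    // Hypothetical functional interface for one stream-oriented codec operation.
    @FunctionalInterface
    interface StreamStep {
        void apply(InputStream in, OutputStream out) throws IOException;
    }

    private BufferedBlockAdapter() {}

    // Both the full input and the full output live in memory at the same time.
    static byte[] runBuffered(byte[] input, StreamStep step) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(32, input.length / 2));
        try {
            step.apply(new ByteArrayInputStream(input), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // one more full copy of the result
    }

    // GZIP used as a stand-in for a stream-only codec.
    static byte[] compress(byte[] input) {
        return runBuffered(input, (in, out) -> {
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                in.transferTo(gz);
            }
        });
    }

    static byte[] decompress(byte[] compressed) {
        return runBuffered(compressed, (in, out) -> {
            try (GZIPInputStream gz = new GZIPInputStream(in)) {
                gz.transferTo(out);
            }
        });
    }
}
```

The cost shows up directly in the sketch: the entire input and output are held in memory at once, and `toByteArray()` copies the result again. For a slow codec, where compression time dominates, that overhead is comparatively small.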
Since there are two basic compression modes (block and streaming), each codec has one or two tests, depending on which modes it supports.
Work is in progress to possibly include:
- LZMA-Java: a derivative of 7zip's codec, cleaner from a Java perspective, and maintained
- LZO-Java: a pure Java version of LZO
Beyond the included codecs, we are aware of other JVM codecs that we cannot yet support (due to API or licensing issues), as well as codecs for which a JVM-accessible version may be forthcoming. These include:
- FastLZ: no Java version
- LZO by Oberhumer (http://www.oberhumer.com/opensource/lzo/): only a Java decompressor exists, no compressor (the test suite needs both, since it must generate compressed data for the decompression tests)
To access the source, clone the project: https://github.com/ning/jvm-compressor-benchmark
To participate in discussions of the benchmark suite, the results, and other topics related to compression performance, please join our discussion group.
We have tried to use existing de facto standard test suites, including:
- Calgary corpus: 18 test files from
- Canterbury corpus: 11 test files
- Maximum Compression: 10 test files
- QuickLZ: 5 test files ("NotTheMusic.mp4" was left out as a pathological case that skews decompression speed; its kind of data is also already covered by samples in other data sets)
Here are some example results we have collected, to give an idea of what kind of performance to expect. Tests were run single-threaded on a 2.5 GHz Mac mini.
NOTE: although the measurement labels include "TPS", the actual unit for the second bar is "MB/sec"; this is an annoying Japex issue. The "Size %" label is correct: that measurement is the size of the compressed result relative to the original file size.
- Calgary corpus:
- Canterbury corpus:
- Maximum Compression:
- QuickLZ:
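The two quantities behind these charts, throughput in MB/sec and "Size %", can be computed as in the following sketch. It assumes throughput is measured against the uncompressed input size and that "MB" means 10^6 bytes; neither detail is stated above, so adjust them to match how your results are reported. `CodecMetrics` is a hypothetical helper, and its single-pass `main` ignores JIT warm-up, so treat its output as illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper for the two reported measurements.
final class CodecMetrics {
    private CodecMetrics() {}

    // Assumption: "MB" = 1,000,000 bytes; use 1024.0 * 1024.0 if you report MiB instead.
    static final double BYTES_PER_MB = 1_000_000.0;

    // Throughput over the (assumed) uncompressed byte count.
    static double mbPerSec(long bytesProcessed, long elapsedNanos) {
        if (elapsedNanos <= 0) throw new IllegalArgumentException("elapsed time must be positive");
        double seconds = elapsedNanos / 1e9;
        return bytesProcessed / BYTES_PER_MB / seconds;
    }

    // Compressed size relative to the original size, in percent.
    static double sizePercent(long originalSize, long compressedSize) {
        if (originalSize <= 0) throw new IllegalArgumentException("original size must be positive");
        return 100.0 * compressedSize / originalSize;
    }

    public static void main(String[] args) {
        byte[] input = "The quick brown fox jumps over the lazy dog. ".repeat(20_000)
                .getBytes(StandardCharsets.US_ASCII);
        byte[] buf = new byte[input.length + 1024];
        Deflater deflater = new Deflater();
        long start = System.nanoTime();
        deflater.setInput(input);
        deflater.finish();
        int compressedSize = 0;
        while (!deflater.finished()) {
            compressedSize += deflater.deflate(buf); // only the byte count matters here
        }
        long elapsed = System.nanoTime() - start;
        deflater.end();
        System.out.printf("%.1f MB/sec, size %.1f%%%n",
                mbPerSec(input.length, elapsed), sizePercent(input.length, compressedSize));
    }
}
```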