Different benchmark result with large file #22

Open
yongkun opened this issue Jun 12, 2014 · 0 comments

yongkun commented Jun 12, 2014

The benchmark result in the README shows that the compressed sizes are close to each other, but LZ4 is much faster, taking about half the time of Snappy.

I feel that the test file used there is a little small (24 KB).

So I tried a large file, about 1.5 GB, taken from our real data.

I am using Python 2.7.6 on a MacBook Air (1.7 GHz i5).

Python 2.7.6 |Anaconda 1.6.1 (x86_64)| (default, Jan 10 2014, 11:23:15) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
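
For reference, a minimal sketch of the kind of whole-file timing I ran (this is not the repo's bench.py; the lz4.block and snappy module APIs and the data.bin file name are my assumptions here):

import time
import lz4.block   # newer python-lz4; older releases exposed lz4.compress directly
import snappy      # python-snappy

# 'data.bin' is a hypothetical stand-in for the 1.5 GB input file.
with open('data.bin', 'rb') as f:
    data = f.read()

for name, compress, decompress in [
        ('LZ4', lz4.block.compress, lz4.block.decompress),
        ('Snappy', snappy.compress, snappy.decompress)]:
    t0 = time.time()
    compressed = compress(data)
    t1 = time.time()
    decompress(compressed)
    t2 = time.time()
    print('%s: %d bytes compressed, %.2fs compress, %.2fs decompress'
          % (name, len(compressed), t1 - t0, t2 - t1))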

My test results:

$ python bench.py 
Data Size:
  Input: 1597401203
  LZ4: 184286011 (0.12)
  Snappy: 238020116 (0.15)
  LZ4 / Snappy: 0.774246
Benchmark: 2 calls
  LZ4 Compression: 33.591156s
  Snappy Compression: 7.052323s
  LZ4 Decompression: 51.998798s
  Snappy Decompression : 29.753453s

My results show that LZ4 achieves a much better compression ratio here: the LZ4 output is 184 MB versus 238 MB for Snappy, so LZ4's output is about 23% smaller.
But LZ4 takes roughly 4.8x the time to compress and 1.7x the time to decompress.
So LZ4 seems quite a bit slower than Snappy on a large data set.

Is this expected, or do I need to tune the block size?
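
For instance, a chunked variant along these lines is what I had in mind by block-size tuning (the 64 KB block size and the lz4.block API are assumptions on my part, not something bench.py does):

import lz4.block

BLOCK_SIZE = 64 * 1024  # hypothetical tuning knob

def compress_chunked(data, block_size=BLOCK_SIZE):
    # Compress block by block instead of in one giant call; each block
    # is independent, so the compressed pieces must be kept separately
    # and fed back to decompress one at a time.
    return [lz4.block.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]

def decompress_chunked(blocks):
    return b''.join(lz4.block.decompress(b) for b in blocks)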
