Estimating Frequencies and finding Heavy Hitters

In this project...

Main Papers

List of papers, authors etc.

How to use

The

Dependencies

All code in this library requires a Linux distribution. It has been run and tested on the newest versions of Arch Linux and Ubuntu.

To run the unit tests for all implementations, the Criterion unit testing library (https://github.com/Snaipe/Criterion) must be downloaded and installed.

Furthermore, to run some of the bash scripts, the terminal calculator bc must be installed, because bash has no built-in floating-point arithmetic.

Modules

To run the benchmark scripts, the libmeasure module must be installed and compiled. This can be done as follows:

git submodule init
git submodule update --remote
cd modules/libmeasure
make clean && make
cd ../..

Alternatively, simply check that the modules/libmeasure folder contains the libmeasure source code and the shared object file libmeasure.so.

Note that libmeasure depends on the PAPI library (http://icl.cs.utk.edu/papi/), which is used to measure low-level metrics such as L1/L2 cache misses and accesses, branch mispredictions, cycles, instructions, running times, and TLB misses and accesses, among others.

External Contribution

To quickly generate data following a Zipfian distribution, the Alias Method is used, more or less as implemented in the R library. Likewise, R's unif_rand is used throughout the project to generate uniform random numbers in [0, 1).
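As an illustration of the idea, the following is a minimal sketch of the Alias Method in Vose's variant. It is not the R code and not this project's code: the names alias_table, alias_build, alias_sample, alias_free and uniform01 are hypothetical, and rand() is only a stand-in for unif_rand. For a Zipfian distribution with exponent s, the weight of the i-th item would be proportional to 1/i^s.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical alias table over n outcomes (not the project's own type). */
typedef struct {
    size_t  n;
    double *prob;   /* prob[i]: probability of keeping column i */
    size_t *alias;  /* alias[i]: outcome returned otherwise     */
} alias_table;

/* Stand-in for R's unif_rand(): a uniform double in [0, 1). */
static double uniform01(void) {
    return rand() / ((double)RAND_MAX + 1.0);
}

/* Build the table in O(n) from n non-negative weights (Vose's variant). */
alias_table *alias_build(const double *weights, size_t n) {
    assert(n > 0);
    alias_table *t = malloc(sizeof *t);
    double *scaled = malloc(n * sizeof *scaled);
    size_t *under  = malloc(n * sizeof *under);  /* columns below average */
    size_t *over   = malloc(n * sizeof *over);   /* columns at or above   */
    assert(t && scaled && under && over);
    t->n     = n;
    t->prob  = malloc(n * sizeof *t->prob);
    t->alias = malloc(n * sizeof *t->alias);
    assert(t->prob && t->alias);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += weights[i];
    assert(sum > 0.0);

    /* Scale so that the average column height is exactly 1. */
    size_t nu = 0, no = 0;
    for (size_t i = 0; i < n; i++) {
        scaled[i] = weights[i] * (double)n / sum;
        if (scaled[i] < 1.0) under[nu++] = i; else over[no++] = i;
    }
    /* Top up each short column with mass taken from a tall one. */
    while (nu > 0 && no > 0) {
        size_t s = under[--nu], l = over[--no];
        t->prob[s]  = scaled[s];
        t->alias[s] = l;
        scaled[l]   = (scaled[l] + scaled[s]) - 1.0;
        if (scaled[l] < 1.0) under[nu++] = l; else over[no++] = l;
    }
    /* Leftovers are full columns (up to rounding error). */
    while (no > 0) { size_t l = over[--no];  t->prob[l] = 1.0; t->alias[l] = l; }
    while (nu > 0) { size_t s = under[--nu]; t->prob[s] = 1.0; t->alias[s] = s; }

    free(scaled); free(under); free(over);
    return t;
}

/* Draw one outcome in O(1): pick a column, then keep it or take its alias. */
size_t alias_sample(const alias_table *t) {
    double u = uniform01() * (double)t->n;
    size_t i = (size_t)u;
    if (i >= t->n) i = t->n - 1;  /* guard against rounding */
    return (u - (double)i) < t->prob[i] ? i : t->alias[i];
}

void alias_free(alias_table *t) {
    free(t->prob); free(t->alias); free(t);
}
```

The table costs O(n) to build, after which every sample takes constant time, which is what makes the method attractive for generating long streams.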

To allow comparison with existing implementations, some of them were adapted to fit our heavy hitter abstraction, which made it possible to test them on the same terms as our own implementations. The expectations and results of the authors of those implementations can be found in the paper: Finding Frequent Items in Data Streams.

To compute the median in the Count-Median Sketch, the two implementations that were found to be the fastest among the many considered are used.
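The README does not name the two median implementations. As a hedged illustration of what a fast median for a small array of row estimates can look like, here is a sketch based on Wirth's k-th smallest selection. This is an assumed choice, not necessarily one of the two used in the project, and the names kth_smallest and sketch_median are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return the k-th smallest element (0-based) of a[0..n-1].
 * Reorders a in place; expected linear time (Wirth's selection). */
long kth_smallest(long *a, size_t n, size_t k) {
    assert(n > 0 && k < n);
    /* Signed indices: j may step below zero during partitioning. */
    ptrdiff_t lo = 0, hi = (ptrdiff_t)n - 1, kk = (ptrdiff_t)k;
    while (lo < hi) {
        long pivot = a[kk];
        ptrdiff_t i = lo, j = hi;
        do {
            while (a[i] < pivot) i++;
            while (pivot < a[j]) j--;
            if (i <= j) {
                long tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++; j--;
            }
        } while (i <= j);
        /* Narrow the window to the side that still contains index k. */
        if (j < kk) lo = i;
        if (kk < i) hi = j;
    }
    return a[kk];
}

/* Median of the d row estimates; the lower median when d is even. */
long sketch_median(long *estimates, size_t d) {
    return kth_smallest(estimates, d, (d - 1) / 2);
}
```

Selection avoids fully sorting the estimates, which matters because the median is computed once per point query.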

Finally, to compute the absolute errors of the sketches when the worst δ percent of the estimates are removed, a quicksort is performed to determine the cut-off point.
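The cut-off step can be sketched as follows, under two assumptions: δ is passed as a fraction in [0, 1] rather than as a percentage, and the C standard library's qsort stands in for the project's own quicksort. The names cmp_double and trim_worst are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Ascending comparison of two doubles for qsort. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort the absolute errors ascending and return how many remain once the
 * worst delta fraction is dropped. Afterwards abs_err[0..kept-1] holds the
 * retained errors, and abs_err[kept-1] is the largest of them. */
size_t trim_worst(double *abs_err, size_t n, double delta) {
    assert(delta >= 0.0 && delta <= 1.0);
    qsort(abs_err, n, sizeof *abs_err, cmp_double);
    size_t dropped = (size_t)(delta * (double)n);  /* rounds down */
    return n - dropped;
}
```

With eight errors and delta = 0.25, the two largest errors are dropped and the six smallest are kept.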

Unit Tests

To run the unit tests simply type:

make clean && make && make test

This will run all unit tests in this project. To run a specific set of unit tests, type:

make clean && make && make run_test_{TESTNAME}

where {TESTNAME} is the name of a specific test set.
