Skip to content

Latest commit

 

History

History
123 lines (101 loc) · 3.26 KB

README.md

File metadata and controls

123 lines (101 loc) · 3.26 KB

Any Sketch

Overview

This repo contains the implementation of the AnySketch data structure in Java. AnySketch is a generic data structure that can be configured to act as

Details

AnySketch stores key-value pairs in a sparse format. AnySketch config is defined by:

  1. A List of indexes, each with a Distribution,
  2. A List of value columns, each with a distribution,
  3. Aggregation functions for each value.

When a key is added to a sketch it is mapped to a vector of indexes and vector of values. For each vector of indexes the vectors of values are aggregated with the aggregation functions defined by the config.

Usage

The libraries are available as Maven artifacts in the GitHub Packages Maven registry. Each artifact corresponds to a Java package under org.wfanet.

Note that native library dependencies are not included in the Maven artifacts. These include the following:

  • any-sketch
    • libsketch_encrypter_adapter.so
  • estimation
    • libestimators.so
  • frequency-count
    • libsecret_share_generator_adapter.so

These instead must be built separately and included in the JVM library path. Pre-built libraries may be available as release assets.

Examples

Let's say we want to create a counting Bloom Filter. Here is an example sketch config:

indexes {
  name: "Index"
  distribution {
    uniform {
      num_values: 10000000  # 10M
    }
  }
}
values {
  name: "Frequency"
  aggregator: SUM
  distribution {
    oracle {
      key: "frequency"
    }
  }
}

Create the sketch and insert elements to it:

AnySketch anySketch = SketchProtos.toAnySketch(sketchConfig);
anySketch.insert(123456, ImmutableMap.of("frequency", 1L));
anySketch.insert(999999, ImmutableMap.of("frequency", 1L));

In this example, we did not store any values like demographics in the registers, thus the empty HashMap. AnySketch hashed the keys (123456 and 999999), distributed the hash uniformly to 10M values and put the value 1 in those two registers. In the off chance they collided to the same register, because we are using the UNIQUE aggregator, it marked that register with -1.

Here is an example where we are storing the demographics of each item:

indexes {
  name: "Index"
  distribution {
    uniform {
      num_values: 10000000  # 10M
    }
  }
}
values {
  name: "SamplingIndicator"
  aggregator: UNIQUE
  distribution {
    uniform {
      num_values: 10000000  # 10M
    }
  }
}
values {
  name: "Demographics"
  aggregator: UNIQUE
  distribution {
    oracle {
      key: "demographics"
    }
  }
}

Create the sketch and insert elements to it:

AnySketch anySketch = SketchProtos.toAnySketch(sketchConfig);
anySketch.insert(123456, ImmutableMap.of("demographics", 1L));
anySketch.insert(999999, ImmutableMap.of("demographics", 1L));