Add interface to define a Row based serializer (#492)
Summary:
Pull Request resolved: #492

# Background:

Currently, to use UDP successfully, you must hand-write code that takes all the rows of metadata for one side and packs them into a collection of bytes. The caller then gets back a `SecString`, which is a bit representation of all the bytes passed in, minus the filtered-out rows, and must extract the corresponding bits for each column into separate MPC types. This process is cumbersome and error-prone: the packing and extraction steps must match exactly, and a change to one without the other can introduce a bug.
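To make the pain point concrete, here is a minimal, self-contained sketch of the kind of hand-rolled packing described above. It is not code from this repository: the two-column row layout (a `uint32_t` followed by a one-byte flag), the little-endian byte order, and the names `kSketchRowSizeBytes`, `packRowsSketch`, and `unpackIdSketch` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical row layout for this sketch: 4 bytes of uint32 id, then 1 flag byte.
constexpr std::size_t kSketchRowSizeBytes = sizeof(uint32_t) + 1;

// Packs two columns into one byte vector per row (little-endian id, then flag).
inline std::vector<std::vector<unsigned char>> packRowsSketch(
    const std::vector<uint32_t>& ids,
    const std::vector<bool>& flags) {
  assert(ids.size() == flags.size());
  std::vector<std::vector<unsigned char>> rows;
  rows.reserve(ids.size());
  for (std::size_t i = 0; i < ids.size(); ++i) {
    std::vector<unsigned char> row;
    row.reserve(kSketchRowSizeBytes);
    for (int b = 0; b < 4; ++b) {
      row.push_back(static_cast<unsigned char>((ids[i] >> (8 * b)) & 0xFF));
    }
    row.push_back(flags[i] ? 1 : 0);
    rows.push_back(std::move(row));
  }
  return rows;
}

// The matching extraction step. The offsets here must mirror packRowsSketch
// exactly, which is the coupling the commit message calls error-prone.
inline uint32_t unpackIdSketch(const std::vector<unsigned char>& row) {
  uint32_t value = 0;
  for (int b = 0; b < 4; ++b) {
    value |= static_cast<uint32_t>(row[b]) << (8 * b);
  }
  return value;
}
```

If a column were added or reordered in `packRowsSketch` without the same change in `unpackIdSketch`, the code would still compile but would read the wrong bytes.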

# This Diff

This diff defines the interface that the caller will use to pass in all their data for serialization / deserialization after UDP.

Step 1. Put all the data into an unordered map of column name to column data. Note that the variant alternative holding the data must match the type the column expects (e.g. a uint32 column expects `std::vector<uint32_t>`, and an int64 vec column expects `std::vector<std::vector<int64_t>>`). Call `serializeDataAsBytesForUDP` to get a vector of byte vectors ready for UDP consumption.
Step 2. Pass this data into the data processor portion of the UDP protocol. Get back a `SecString` with the filtered-out rows removed and the same structure.
Step 3. Call `deserializeUDPOutputIntoMPCTypes`. This returns an unordered map keyed by the same column names, whose values are the private MPC values deserialized from the `SecString`. The caller is responsible for unboxing the variants to the expected types.
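The caller-side data shape in Step 1 and the variant unboxing mentioned in Step 3 can be sketched without any fbpcf dependency. In the block below, the `InputColumnDataType` alias is copied from the header added in this diff; everything else (`SketchColType`, `holdsExpectedType`, `buildExampleInput`, `unboxUInt32`, and the column names) is hypothetical and exists only for illustration. The real Step 3 output holds MPC types rather than plain vectors, so the `std::get` call here is a stand-in for that unboxing pattern, not the library's actual output type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Same alternatives as InputColumnDataType in the header added by this diff.
using InputColumnDataType = std::variant<
    std::vector<bool>,
    std::vector<uint32_t>,
    std::vector<int32_t>,
    std::vector<int64_t>,
    std::vector<std::vector<bool>>,
    std::vector<std::vector<uint32_t>>,
    std::vector<std::vector<int32_t>>,
    std::vector<std::vector<int64_t>>>;

// Hypothetical column tags, used only in this sketch.
enum class SketchColType { UInt32, Int64Vec };

// Hypothetical check that a variant holds the alternative a column expects.
inline bool holdsExpectedType(
    SketchColType type,
    const InputColumnDataType& data) {
  switch (type) {
    case SketchColType::UInt32:
      return std::holds_alternative<std::vector<uint32_t>>(data);
    case SketchColType::Int64Vec:
      return std::holds_alternative<std::vector<std::vector<int64_t>>>(data);
  }
  return false;
}

// Builds a column-name -> column-data map shaped like the `data` argument of
// serializeDataAsBytesForUDP. Column names are made up for the example.
inline std::unordered_map<std::string, InputColumnDataType>
buildExampleInput() {
  std::unordered_map<std::string, InputColumnDataType> data;
  data.emplace("adId", std::vector<uint32_t>{7, 8, 9});
  data.emplace(
      "timestamps",
      std::vector<std::vector<int64_t>>{{1, 2}, {3, 4}, {5, 6}});
  return data;
}

// Unboxes a variant to the type the caller expects. std::get throws
// std::bad_variant_access if the variant holds a different alternative.
inline const std::vector<uint32_t>& unboxUInt32(
    const InputColumnDataType& data) {
  assert(std::holds_alternative<std::vector<uint32_t>>(data));
  return std::get<std::vector<uint32_t>>(data);
}
```

Checking the alternative before calling `std::get` turns a type mismatch into an explicit branch rather than an exception at the point of use.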

Reviewed By: haochenuw

Differential Revision: D43366172

fbshipit-source-id: 93ac9751c77883e6ddddbecf950b36f7bf60c97d
Tal Davidi authored and facebook-github-bot committed Feb 23, 2023
1 parent 0835694 commit fba5836
Showing 2 changed files with 57 additions and 3 deletions.
@@ -0,0 +1,53 @@
/*
* Copyright (c) Meta Platforms, Inc. and affiliates.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/

#pragma once

#include <cstddef>
#include <unordered_map>
#include <vector>
#include "IColumnDefinition.h"
#include "fbpcf/frontend/BitString.h"

namespace fbpcf::mpc_std_lib::unified_data_process::serialization {

template <int schedulerId>
class IRowStructureDefinition {
 public:
  using SecString = frontend::BitString<true, schedulerId, true>;

  using InputColumnDataType = std::variant<
      std::vector<bool>,
      std::vector<uint32_t>,
      std::vector<int32_t>,
      std::vector<int64_t>,
      std::vector<std::vector<bool>>,
      std::vector<std::vector<uint32_t>>,
      std::vector<std::vector<int32_t>>,
      std::vector<std::vector<int64_t>>>;

  virtual ~IRowStructureDefinition() = default;

  /* Returns the number of bytes to serialize a single row */
  virtual size_t getRowSizeBytes() const = 0;

  // Serialize each column's worth of data according to the structure
  // definition. Each key must match the name of a column in the definition and
  // the value contains the data for that column
  virtual std::vector<std::vector<unsigned char>> serializeDataAsBytesForUDP(
      const std::unordered_map<std::string, InputColumnDataType>& data,
      int numRows) const = 0;

  // Following a run of the UDP protocol, deserialize the batched BitString
  // containing encrypted columns into private MPC types.
  virtual std::unordered_map<
      std::string,
      typename IColumnDefinition<schedulerId>::DeserializeType>
  deserializeUDPOutputIntoMPCTypes(const SecString& secretSharedData) const = 0;
};

} // namespace fbpcf::mpc_std_lib::unified_data_process::serialization
@@ -103,7 +103,7 @@ static std::vector<std::vector<bool>> deserializeAndRevealPackedBits(
return rst;
}

-TEST(SerializationTest, IntegerColumnTest) {
+TEST(ColumnSerializationTest, IntegerColumnTest) {
auto factories = fbpcf::engine::communication::getInMemoryAgentFactory(2);

auto schedulerFactory0 =
@@ -160,7 +160,7 @@ TEST(SerializationTest, IntegerColumnTest) {
testVectorEq(vals, rst);
}

-TEST(SerializationTest, ArrayColumnTest) {
+TEST(ColumnSerializationTest, ArrayColumnTest) {
auto factories = fbpcf::engine::communication::getInMemoryAgentFactory(2);

auto schedulerFactory0 =
@@ -235,7 +235,7 @@ TEST(SerializationTest, ArrayColumnTest) {
}
}

-TEST(SerializationTest, PackedBitFieldColumnTest) {
+TEST(ColumnSerializationTest, PackedBitFieldColumnTest) {
auto factories = fbpcf::engine::communication::getInMemoryAgentFactory(2);

auto schedulerFactory0 =
@@ -334,4 +334,5 @@ TEST(erializationTest, ColumnTypeTest) {
"col4", std::make_unique<IntegerColumn<0, false, 32>>("test"), 4);
EXPECT_EQ(col6->getColumnType(), ColType::UInt32Vec);
}

} // namespace fbpcf::mpc_std_lib::unified_data_process::serialization
