Base64 encode compaction input file list #5229

Closed
wants to merge 2 commits into from
@@ -26,11 +26,13 @@
 import java.util.Objects;
 import java.util.Set;

+import org.apache.accumulo.core.data.Range;
 import org.apache.accumulo.core.fate.FateId;
 import org.apache.accumulo.core.metadata.ReferencedTabletFile;
 import org.apache.accumulo.core.metadata.StoredTabletFile;
 import org.apache.accumulo.core.spi.compaction.CompactionKind;
 import org.apache.accumulo.core.spi.compaction.CompactorGroupId;
+import org.apache.hadoop.fs.Path;

 public class CompactionMetadata {

@@ -93,10 +95,31 @@ public FateId getFateId() {
     return fateId;
   }

+  private static class InputFile {
+    String path;
+    String startRow;
+    String endRow;
+
+    private static InputFile from(StoredTabletFile stf) {
+      InputFile i = new InputFile();
+      i.path = stf.getPath().toString();
+      Range r = stf.getRange();
+      i.startRow = r.getStartKey() == null ? "null" : r.getStartKey().toString();
Contributor

We may not want to pass the data through toString, as that could corrupt binary data. It would be nice to use the internal class and machinery in StoredTabletFile here, but as @cshannon mentioned, that would break encapsulation. Encapsulation is valuable, and I would not want to see StoredTabletFile internals in my IDE's code completion. We could break encapsulation in a limited way, possibly by doing the following:

  • Move StoredTabletFile and CompactionMetadata into the same package. They are almost in the same package already: one is in o.a.a.c.metadata and the other is in o.a.a.c.metadata.schema.
  • Change StoredTabletFile to make TabletFileCqMetadataGson package-private, and add some package-private static methods to convert between TabletFileCqMetadataGson and StoredTabletFile.
  • In CompactionMetadata.GSonData, use TabletFileCqMetadataGson.

This way most of the Accumulo code cannot see StoredTabletFile.TabletFileCqMetadataGson, and we may still get a nice human-readable encoding.
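The concern about toString corrupting binary data can be illustrated without any Accumulo classes. The sketch below (the class and method names are hypothetical; JDK only) round-trips an arbitrary byte array two ways: through a UTF-8 String, where malformed sequences are replaced with U+FFFD, and through java.util.Base64, which maps any byte sequence to ASCII text and back without loss. It is a generic illustration of a lossy String conversion, not a reproduction of what Key.toString does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; not part of Accumulo.
class RowBytesRoundTrip {

  // Decode bytes as UTF-8 and re-encode them. Malformed input (e.g. a lone 0xFF)
  // is replaced with U+FFFD, so the original bytes are not recovered.
  static byte[] viaUtf8String(byte[] row) {
    String s = new String(row, StandardCharsets.UTF_8);
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Base64 encodes every byte sequence as ASCII text and decodes it back exactly.
  static byte[] viaBase64(byte[] row) {
    String s = Base64.getEncoder().encodeToString(row);
    return Base64.getDecoder().decode(s);
  }

  public static void main(String[] args) {
    byte[] row = {(byte) 0xFF, 0x00, (byte) 0x80, 'a'};
    // prints "UTF-8 string preserved bytes: false"
    System.out.println("UTF-8 string preserved bytes: " + Arrays.equals(row, viaUtf8String(row)));
    // prints "Base64 preserved bytes: true"
    System.out.println("Base64 preserved bytes: " + Arrays.equals(row, viaBase64(row)));
  }
}
```

Base64 output is still plain text, so it can sit inside a JSON field, although it is no longer human-readable the way a printable row string is.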

Contributor Author

path and range are defined in AbstractTabletFile. Can we just move the serialization and deserialization methods and TabletFileCqMetadataGson there, and make them public?

Contributor

> path and range are defined in AbstractTabletFile. Can we just move the serialization and deserialization methods and TabletFileCqMetadataGson there, and make them public?

IMO it would be nice to avoid making them public. I was thinking that package-private methods like the following could be added to StoredTabletFile, in addition to making TabletFileCqMetadataGson package-private. The code in StoredTabletFile could be refactored to accommodate and/or use these methods.

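  // Package-private: callable only from classes in the same package.
  static StoredTabletFile deserialize(TabletFileCqMetadataGson serialized) {
    // ... (body not given in this comment)
  }

  static TabletFileCqMetadataGson serialize(StoredTabletFile storedTabletFile) {
    // ... (body not given in this comment)
  }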
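A minimal sketch of the package-private arrangement described above, using hypothetical stand-ins: FileEntry plays the role of StoredTabletFile and CqMetadata the role of TabletFileCqMetadataGson. The real classes have different fields and are serialized with Gson; here, as an assumption, rows are held as nullable byte arrays and carried in the DTO as Base64 strings, with null meaning an unbounded end. The nested class and the two static methods have no access modifier, so only classes in the same package can use them.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.Objects;

// Hypothetical stand-in for StoredTabletFile: a file path plus an optional row range.
final class FileEntry {
  final String path;
  final byte[] startRow; // null = unbounded start
  final byte[] endRow; // null = unbounded end

  FileEntry(String path, byte[] startRow, byte[] endRow) {
    this.path = Objects.requireNonNull(path);
    this.startRow = startRow == null ? null : startRow.clone();
    this.endRow = endRow == null ? null : endRow.clone();
  }

  // Package-private DTO: the shape a JSON library such as Gson would write.
  // String fields keep the persisted form text-only.
  static final class CqMetadata {
    String path;
    String startRow; // Base64, or null when unbounded
    String endRow; // Base64, or null when unbounded
  }

  // Package-private: visible to same-package classes (including tests), not elsewhere.
  static CqMetadata serialize(FileEntry entry) {
    CqMetadata m = new CqMetadata();
    m.path = entry.path;
    m.startRow = encode(entry.startRow);
    m.endRow = encode(entry.endRow);
    return m;
  }

  static FileEntry deserialize(CqMetadata m) {
    return new FileEntry(m.path, decode(m.startRow), decode(m.endRow));
  }

  private static String encode(byte[] row) {
    return row == null ? null : Base64.getEncoder().encodeToString(row);
  }

  private static byte[] decode(String s) {
    return s == null ? null : Base64.getDecoder().decode(s);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof FileEntry)) {
      return false;
    }
    FileEntry other = (FileEntry) o;
    return path.equals(other.path) && Arrays.equals(startRow, other.startRow)
        && Arrays.equals(endRow, other.endRow);
  }

  @Override
  public int hashCode() {
    return Objects.hash(path, Arrays.hashCode(startRow), Arrays.hashCode(endRow));
  }
}
```

A test class in the same package could call FileEntry.serialize and FileEntry.deserialize directly to check the round trip, for example with a start row of {(byte) 0xFF, 0x00} and a null end row, without any of this becoming visible to the rest of the code base.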

Contributor

@cshannon, Jan 8, 2025

When I first looked into this issue and mentioned that we would break encapsulation, I should have explained that I meant TabletFileCqMetadataGson is private and StoredTabletFile keeps all of that internal, so I didn't want to expose it.

I agree with @keith-turner that making the serialization code package-private rather than public would be a good compromise. TabletFileCqMetadataGson is certainly the thing to reuse here, because it already handles encoding and decoding the binary ranges correctly, so there is no need to do that again.

Another benefit of making TabletFileCqMetadataGson and its serialization code package-private instead of private is that it would be easier to write unit tests if we wanted to.

+      i.endRow = r.getEndKey() == null ? "null" : r.getEndKey().toString();
+      return i;
+    }
+
+    private StoredTabletFile to() {
+      Range r = new Range(startRow.equals("null") ? null : startRow,
+          endRow.equals("null") ? null : endRow);
+      return StoredTabletFile.of(new Path(path), r);
+    }
+  }
+
   // This class is used to serialize and deserialize this class using GSon. Any changes to this
   // class must consider persisted data.
   private static class GSonData {
-    List<String> inputs;
+    List<InputFile> inputs;
     String tmp;
     String compactor;
     String kind;
@@ -107,9 +130,8 @@ private static class GSonData {
   }

   public String toJson() {
-    GSonData jData = new GSonData();
-
-    jData.inputs = jobFiles.stream().map(StoredTabletFile::getMetadata).collect(toList());
+    final GSonData jData = new GSonData();
+    jData.inputs = jobFiles.stream().map(InputFile::from).collect(toList());
     jData.tmp = compactTmpName.insert().getMetadata();
     jData.compactor = compactorId;
     jData.kind = kind.name();
@@ -122,8 +144,7 @@ public String toJson() {

   public static CompactionMetadata fromJson(String json) {
     GSonData jData = GSON.get().fromJson(json, GSonData.class);
-
-    return new CompactionMetadata(jData.inputs.stream().map(StoredTabletFile::new).collect(toSet()),
+    return new CompactionMetadata(jData.inputs.stream().map(InputFile::to).collect(toSet()),
         StoredTabletFile.of(jData.tmp).getTabletFile(), jData.compactor,
         CompactionKind.valueOf(jData.kind), jData.priority, CompactorGroupId.of(jData.groupId),
         jData.propDels, jData.fateId == null ? null : FateId.from(jData.fateId));