Update the java code to properly deal with lists being returned as strings (#16536)

Recently some JSON parsing was updated so that lists could be returned as strings. This updates the Java code so that, when it cleans up the results to match the desired schema, it properly handles the corner cases associated with lists and structs.

Tests are covered in the Spark plugin, but I am happy to add some here if we really want to validate that part of this.

Authors:
  - Robert (Bobby) Evans (https://github.com/revans2)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)

URL: #16536
revans2 authored Aug 12, 2024
1 parent cce00c0 commit e5f8dd3
Showing 1 changed file with 25 additions and 4 deletions.
29 changes: 25 additions & 4 deletions java/src/main/java/ai/rapids/cudf/Table.java
@@ -1084,7 +1084,12 @@ private static DidViewChange gatherJSONColumns(Schema schema, TableWithMeta.Nest
       // The types don't match so just return the input unchanged...
       return DidViewChange.no();
     } else {
-      String[] foundNames = children.getNames();
+      String[] foundNames;
+      if (children == null) {
+        foundNames = new String[0];
+      } else {
+        foundNames = children.getNames();
+      }
       HashMap<String, Integer> indices = new HashMap<>();
       for (int i = 0; i < foundNames.length; i++) {
         indices.put(foundNames[i], i);
@@ -1101,8 +1106,9 @@ private static DidViewChange gatherJSONColumns(Schema schema, TableWithMeta.Nest
       for (int i = 0; i < columns.length; i++) {
         String neededColumnName = neededNames[i];
         Integer index = indices.get(neededColumnName);
+        Schema childSchema = schema.getChild(i);
         if (index != null) {
-          if (schema.getChild(i).isStructOrHasStructDescendant()) {
+          if (childSchema.isStructOrHasStructDescendant()) {
             ColumnView child = cv.getChildColumnView(index);
             boolean shouldCloseChild = true;
             try {
@@ -1131,8 +1137,23 @@ private static DidViewChange gatherJSONColumns(Schema schema, TableWithMeta.Nest
           }
         } else {
           somethingChanged = true;
-          try (Scalar s = Scalar.fromNull(types[i])) {
-            columns[i] = ColumnVector.fromScalar(s, (int) cv.getRowCount());
+          if (types[i] == DType.LIST) {
+            try (Scalar s = Scalar.listFromNull(childSchema.getChild(0).asHostDataType())) {
+              columns[i] = ColumnVector.fromScalar(s, (int) cv.getRowCount());
+            }
+          } else if (types[i] == DType.STRUCT) {
+            int numStructChildren = childSchema.getNumChildren();
+            HostColumnVector.DataType[] structChildren = new HostColumnVector.DataType[numStructChildren];
+            for (int structChildIndex = 0; structChildIndex < numStructChildren; structChildIndex++) {
+              structChildren[structChildIndex] = childSchema.getChild(structChildIndex).asHostDataType();
+            }
+            try (Scalar s = Scalar.structFromNull(structChildren)) {
+              columns[i] = ColumnVector.fromScalar(s, (int) cv.getRowCount());
+            }
+          } else {
+            try (Scalar s = Scalar.fromNull(types[i])) {
+              columns[i] = ColumnVector.fromScalar(s, (int) cv.getRowCount());
+            }
           }
         }
       }
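
The control flow of the change can be illustrated without a GPU or the cudf library. The sketch below is a simplified model, not the code in Table.java: `NullFillSketch`, `SimpleSchema`, `Kind`, `indexByName`, `describeNull`, and `planNullFills` are all hypothetical names introduced here for illustration. It assumes two things taken from the diff: a missing set of child names is treated as an empty one, and a null placeholder for a LIST or STRUCT column has to carry the child types from the requested schema, whereas a flat type needs only its own type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NullFillSketch {
  // Hypothetical stand-in for cudf's DType; only the cases the diff distinguishes.
  enum Kind { INT32, STRING, LIST, STRUCT }

  // Hypothetical, simplified stand-in for the requested schema.
  static final class SimpleSchema {
    final String name;
    final Kind kind;
    final List<SimpleSchema> children;

    SimpleSchema(String name, Kind kind, SimpleSchema... children) {
      this.name = name;
      this.kind = kind;
      this.children = Arrays.asList(children);
    }
  }

  // Models the first hunk: absent names behave like "no children found".
  static Map<String, Integer> indexByName(String[] foundNames) {
    String[] names = (foundNames == null) ? new String[0] : foundNames;
    Map<String, Integer> indices = new HashMap<>();
    for (int i = 0; i < names.length; i++) {
      indices.put(names[i], i);
    }
    return indices;
  }

  // Models the last hunk: nested types need their child types spelled out,
  // flat types are described by their own kind alone.
  static String describeNull(SimpleSchema s) {
    switch (s.kind) {
      case LIST:
        return "LIST<" + describeNull(s.children.get(0)) + ">";
      case STRUCT: {
        List<String> parts = new ArrayList<>();
        for (SimpleSchema c : s.children) {
          parts.add(c.name + ": " + describeNull(c));
        }
        return "STRUCT<" + String.join(", ", parts) + ">";
      }
      default:
        return s.kind.name();
    }
  }

  // Lists the requested children that were not found and the typed null each needs.
  static List<String> planNullFills(SimpleSchema wanted, String[] foundNames) {
    Map<String, Integer> indices = indexByName(foundNames);
    List<String> plan = new ArrayList<>();
    for (SimpleSchema child : wanted.children) {
      if (!indices.containsKey(child.name)) {
        plan.add(child.name + " -> null " + describeNull(child));
      }
    }
    return plan;
  }

  public static void main(String[] args) {
    SimpleSchema wanted = new SimpleSchema("root", Kind.STRUCT,
        new SimpleSchema("id", Kind.INT32),
        new SimpleSchema("tags", Kind.LIST, new SimpleSchema("element", Kind.STRING)),
        new SimpleSchema("point", Kind.STRUCT,
            new SimpleSchema("x", Kind.INT32),
            new SimpleSchema("y", Kind.INT32)));
    // Only "id" was found, so "tags" and "point" need typed null columns.
    for (String line : planNullFills(wanted, new String[] {"id"})) {
      System.out.println(line);
    }
  }
}
```

In the actual change the typed nulls are built with `Scalar.listFromNull` and `Scalar.structFromNull` and expanded with `ColumnVector.fromScalar`; the model above only records which child types those calls would need.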