Skip to content

Commit

Permalink
[SPARK-36919][SQL] Make BadRecordException fields transient
Browse files Browse the repository at this point in the history
### What changes were proposed in this pull request?
Migrating a Spark application from 2.4.x to 3.1.x and finding a difference in the exception chaining behavior. In a case of parsing a malformed CSV, where the root cause exception should be Caused by: java.lang.RuntimeException: Malformed CSV record, only the top level exception is kept, and all lower level exceptions and root cause are lost. Thus, when we call ExceptionUtils.getRootCause on the exception, we still get itself.
The reason for the difference is that RuntimeException is wrapped in BadRecordException, which has unserializable fields. When we try to serialize the exception from tasks and deserialize from scheduler, the exception is lost.
This PR makes unserializable fields of BadRecordException transient, so the rest of the exception could be serialized and deserialized properly.

### Why are the changes needed?
Make BadRecordException serializable

### Does this PR introduce _any_ user-facing change?
User could get root cause of BadRecordException

### How was this patch tested?
Unit testing

Closes apache#34167 from tianhanhu/master.

Authored-by: tianhanhu <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit aed977c)
Signed-off-by: Hyukjin Kwon <[email protected]>
  • Loading branch information
tianhanhu authored and HyukjinKwon committed Oct 6, 2021
1 parent 88f4809 commit 9760c8a
Show file tree
Hide file tree
Showing 2 changed files with 4 additions and 2 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,6 @@ case class PartialResultException(
* @param cause the actual exception about why the record is bad and can't be parsed.
*/
case class BadRecordException(
record: () => UTF8String,
partialResult: () => Option[InternalRow],
@transient record: () => UTF8String,
@transient partialResult: () => Option[InternalRow],
cause: Throwable) extends Exception(cause)
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ import scala.collection.JavaConverters._
import scala.util.Properties

import com.univocity.parsers.common.TextParsingException
import org.apache.commons.lang3.exception.ExceptionUtils
import org.apache.commons.lang3.time.FastDateFormat
import org.apache.hadoop.io.SequenceFile.CompressionType
import org.apache.hadoop.io.compress.GzipCodec
Expand Down Expand Up @@ -365,6 +366,7 @@ abstract class CSVSuite
}

assert(exception.getMessage.contains("Malformed CSV record"))
assert(ExceptionUtils.getRootCause(exception).isInstanceOf[RuntimeException])
}
}

Expand Down

0 comments on commit 9760c8a

Please sign in to comment.