Pass batch size to JSON reader using environment variable (#16502)
The JSON reader set the batch size to `INT_MAX` bytes because the batched JSON reader was implemented to parse source files whose total size exceeds `INT_MAX` (#16138, #16162). However, a much smaller batch size is enough to verify the reader's correctness, and it speeds up the tests significantly.
This PR reduces the runtime of the batched reader test by making the batch size used by the reader configurable through an environment variable.
The runtime of `JsonLargeReaderTest.MultiBatch` in the `LARGE_STRINGS_TEST` gtest drops from ~52s to ~3s.

Authors:
  - Shruti Shivakumar (https://github.com/shrshi)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - David Wendt (https://github.com/davidwendt)
  - Bradley Dice (https://github.com/bdice)

URL: #16502
shrshi authored Aug 12, 2024
1 parent 091cb72 commit cce00c0
Showing 5 changed files with 204 additions and 173 deletions.
1 change: 0 additions & 1 deletion cpp/CMakeLists.txt
@@ -392,7 +392,6 @@ add_library(
src/io/csv/reader_impl.cu
src/io/csv/writer_impl.cu
src/io/functions.cpp
-src/io/json/byte_range_info.cu
src/io/json/json_column.cu
src/io/json/json_normalization.cu
src/io/json/json_tree.cu
37 changes: 0 additions & 37 deletions cpp/src/io/json/byte_range_info.cu

This file was deleted.

