Skip to content

Latest commit

 

History

History
12 lines (8 loc) · 1.08 KB

README.md

File metadata and controls

12 lines (8 loc) · 1.08 KB

ChunkingBatchLoader

A batch loader that gathers rows into chunks, both by grouping into records, and by grouping records into chunks for faster insert

Loader reads a .csv file sorted by some column groups records based on that column having the same values E.g column 35 is a PersonID index, so all records for that person are grouped (due to sorting) and the loader will group them to one record.

Most configuration is hard-coded in the Loader class.

Two strategies are implemented:

  • XccMLServiceCallChunkWriter.java - sends a larger JSON structure with a batch of records to be processed and inserted as one transaction
  • XccMulitItemInsertChunkWriter.java - uses XCC to send an array of separate JSON structures (sent as Strings)

Basic approach is a single file-reader that reads and chunks up lines from a CSV file, feeding them to a queue, then a set of writer threads that consume data from that queue and write it to MarkLogic. So the File reader is a "producer" to the queue, and the various writers are "consumers" from the queue, and the code implements a classic producer/consumer pattern.