-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory/performance #65
Comments
Hi McClain, Yes that does seem strange. It has been quite a while since I've been involved with X12 and I won't have time to investigate, but one thing that comes to mind is perhaps you have more than one SC832 in your file. If that is the case, I think there is a way to avoid building the entire tree at once. I think there is some sample code (examples/split.rb iirc) that might help with that. Otherwise if all 9MB are for a single transaction set, we'd probably need to look at a profiler to see what's going on. It's been long enough that I've forgotten what input sizes were "normal" in my usage of the library. -- Kyle
|
It is indeed a document with about 20 832s containing about 5000 line items for a large manufacturer (all skus for an entire line), but is actual data I have to deal with some day. Turning off GC brings the runtime down to a minute or so, but the heap usage clearly pays the price, so I'd hazard to guess that it's heap thrash generally. Having dealt with other EDI parsing libs for a minute, it seems that parsing large docs into huge object graphs seems to be a common technique, but doesn't scale particularly well, even with objectively tiny inputs (9mb?!) Thanks for the pointer to split.rb, i'll take a look! |
Hi, I'm seeing something I think is at least similar to what McClain is seeing. Parsing a nearly 5 MB 837 file eventually consumes over 800 MB of RAM. What's more concerning is what I see when I parse 90 or so 837s, varying in size from a few KB up to 4 or 5 MB, in a go. The ram consumed grows fairly steadily to nearly 2 GB by the time all is said and done. I took a look at notes/split.rb but don't think the approach is going to be helpful as you must parse a file before you can split it and parsing seems to be where all of the RAM is going. Thanks. John |
Yeah, now that I look at I'm hoping there is an obvious memory leak that I can find with a profiler. One possible cause is creating new versions of sections of the parse tree as they are being updated; perhaps these are not being garbage collected. Otherwise, I think we might need a different API to enable incremental consumption, rather than build the entire parse tree; building that that could take a while. In the meantime, one chintzy hack you could try is to split large files up using regular expressions. You'll need the output to begin and end with the original header ( |
Thanks for looking at this, Kyle. I'm not able to provide any files; all I On Fri, Jun 24, 2016 at 10:06 AM, Kvle Putnam [email protected]
|
Ah, I understand. You could replace identifying information with randomized text, but that might be a lot of work for a large file, and it might even take a bit of work to review which elements should be randomized. I'll leave the issue open; if you or anyone has some test files at a later date, we can work on it! |
Hi, has anyone come up with a solution for this issue? The runtime for our process has ballooned to 30+ hours and we suspect this is the culprit since we are processing a very large dataset. Is the only solution to break it up via regex as suggested by @kputnam ? |
While this issue still bugs me, personally the payoff is pretty low compared to the amount of work involved, so I'm not likely to work on this. However, if someone can provide diagnostic information it might pique my (or others') interest enough to get the ball rolling. I've added a tool in |
Well, i've played a bit with this one. I'm not 100% sure but seems IO is a bottleneck in this case. Consider the following script (nothing special, just generates a 832 sample): base = <<-X12
ISA*00* *00* *14*004321519IBMP *ZZ*7777777777 *071011*0239*U*00400*000409939* *P*>~
GS*SC*004321519*7777777777*20071011*0239*409939*X*004010~
ST*832*0001~
BCT*PC*FISHER 10/10/07****CATALOG 0000001~
DTM*009*20071010*1632~
N1*ST*SPECIAL LABS*91*999999001~
N3*1000 PENNSYLVANIA AVE~
N4*PITTSBURGH*PA*15222~
X12
prod1 = <<-X12
LIN*000001*VN*1234321~
PID*F*08***LENS PAPER BOOK 12BK/PK~
PO4*12**EA~
CTP**CON*14.80*1*PK~
X12
prod2 = <<-X12
LIN*000002*VN*2345432~
PID*F*08***LENS PAPER BOOK 24BK/PK~
PO4*12**EA~
CTP**CON*20.80*1*PK~
X12
footer = <<-X12
CTT*2~
SE*16*0001~
GE*1*409939~
IEA*1*000409939~
X12
buffer = StringIO.new
buffer.write(base)
1000.times do
buffer.write(prod1)
end
1000.times do
buffer.write(prod2)
end
buffer.write(footer)
File.open('832.x12', 'w') do |f|
f.write(buffer.string)
end Here is a sample output from the profiler:
If we take a look at what is actually # (see AbstractInput#at)
# @return [String]
def at(n)
raise ArgumentError, "n must be positive" unless n >= 0
@io.seek(@offset + n)
@io.read(1)
end It seeks the by offset and n, and then reads a byte. Please note I'm using SSD so random IO is much faster that on an HDD, yet seek is still an expensive operation. Those are just basic thoughts, but might be a good starting point to follow. Not sure where the memory consumption comes from tho. |
I've also compared FileIO vs StringIO(had to do some tweaks to the input class) which obviously gives a boost as of file vs memory (which i believe on HDD vs Memory will have a significant difference, unfortunately i do not have an HDD based computer by hand to prove that):
Code profile differs as well
Versus
The output above is from https://github.com/tmm1/stackprof |
@RomanKapitonov Thank you! very good info @kputnam What do you think? |
@RomanKapitonov thanks for that! I knew reading from an IO would be slower but I haven't compared in a long time, and didn't realize it was that big of a difference. Also, thanks for the tip about stackprof... looks like that will come in handy. Sometime in the future I would like to review the tokenizer ( The bigger concern for me is the memory usage, though. I've heard reports that stupidedi (possibly in combination with other in-process things) can take multiple gigabytes of memory to read fairly small files (<< 1GB). I know of a couple places that use excess memory, but without a memory profile I would be poking in the dark trying to fix it. |
@kputnam, @irobayna for the case of 5000 copies of prod1 and prod2 from the script above and i get the following results (40012 segments). The amount of used memory is pretty huge. Used tool: https://github.com/SamSaffron/memory_profiler GC is disabled, here are some observations (sizes are in bytes):
Or when grouped by file location:
The top one consumer seems to be: https://github.com/irobayna/stupidedi/blob/f3ca1a27b6cc5e25038086c18374076a40a4fbdd/lib/ruby/object.rb#L17-L19 Also a huge amount of strings are retained. I believe this is because they are allocated during the parse phase. A complete profile is available through this gist https://gist.github.com/RomanKapitonov/3c0399fc7989867bac184084ca9292e4 The results are obtained by running
With the following section being wrapped into profiler: report = MemoryProfiler.report do
parser, _result = parser.read(Stupidedi::Reader.build(io))
end
report.pretty_print |
@RomanKapitonov Thanks Roman, a mountain of very valuable data to look at. Definitely room for improvement. |
@irobayna @kputnam I've had a little time today to take a closer look at this. I've eventually evaluated the following result. Based on the observation from memory profiler, i've tried switching from "immutability first" concept to mutating the objects in one case. I've switched to the following definition for def cons(array = [])
array.unshift(self)
end
def snoc(array = [])
array.push(self)
end Consider the following output from two test runs. Original
Mutating
The distance between nodes breaks for some reason, but i've checked whether the generated tree is the same and so it seems to be. So disregard the twice segments amount in the second case, it is still 40012 segments. An interesting part in this is that memory usage for T_ARRAY is 4 times reduced. According to the doc
|
Interesting findings, in particular, memory usage. Could be a case where speed might become irrelevant for the sake of keeping memory low. BTW, I thought |
Hey @RomanKapitonov, thanks so much. This is really helpful! I will start poking at this next week, once I've finished some other things I have started (#164). Don't spend too much time working out how to optimize the tokenizer (FileInput), because i have some plans to rework that stuff soon. |
@RomanKapitonov, can you paste the code you used to generate this output formatting?:
|
Looks like we'll be changing the implementation of But in the meantime, here is some benchmark data comparing 4 implementations of The Thanks @RomanKapitonov for pointing me in that direction! |
Also, note the bottom line of each file shows a huge increase in memory consumption on ruby 2.4... I'm not sure if that's a general issue, or something specific to my benchmark? I'm guessing it's changes internal to the ruby VM.
|
@kputnam Do you still need code sample for getting my output? I assume not since you've crafted a nice benchmark covering more cases. But let me know just in case. Mine is basically one of the cases covered by yours. |
Hello, I've ran into a performance issue with stupidedi. I'm not sure if it relates to what has been previously discussed here. For now we I'll likely pursue splitting the file prior to parsing, but I am confused about what is causing the slowness. I've got a file with 841 PO850s (purchase orders). The file has around 90,000 lines, and I find that if I remove the last ~10,000, then stupidedi is able to parse the remaining 80,000 lines very quickly (~1-2 seconds). If I parse just the last ~10,000 lines, parsing is slow, but reasonable (~5-10 seconds). All 90,000 lines taken together however take a very long time (Maybe 1 hour? I haven't measured). I used Here is the schema I've defined. I'm not sure if there is some ambiguity that may be slowing down the parser. d::TransactionSetDef.build('PO', '850',
'Purchase Order',
d::TableDef.header('Table 1 - Header',
s:: ST.use(10, r::Mandatory, d::RepeatCount.bounded(1)),
s::BEG.use(20, r::Mandatory, d::RepeatCount.bounded(1)),
s::REF.use(50, r::Optional, d::RepeatCount.unbounded),
s::PER.use(60, r::Optional, d::RepeatCount.bounded(3)),
s::FOB.use(80, r::Optional, d::RepeatCount.unbounded),
s::ITD.use(130, r::Optional, d::RepeatCount.unbounded),
s::DTM.use(150, r::Optional, d::RepeatCount.bounded(10)),
s::TD5.use(240, r::Optional, d::RepeatCount.bounded(12)),
s::PCT.use(276, r::Optional, d::RepeatCount.unbounded),
d::LoopDef.build('N9', d::RepeatCount.bounded(1000),
s::N9.use(295, r::Optional, d::RepeatCount.bounded(1)),
s::MSG.use(300, r::Optional, d::RepeatCount.bounded(1000))),
d::LoopDef.build('N1', d::RepeatCount.bounded(200),
s::N1.use(310, r::Optional, d::RepeatCount.bounded(1)),
s::N4.use(340, r::Optional, d::RepeatCount.unbounded))),
d::TableDef.detail('Table 2 - Detail',
d::LoopDef.build('PO1', d::RepeatCount.bounded(100_000),
s::PO1.use(10, r::Mandatory, d::RepeatCount.bounded(1)),
d::LoopDef.build('PID', d::RepeatCount.bounded(1000),
s::PID.use(50, r::Optional, d::RepeatCount.bounded(1))),
d::LoopDef.build('PO4', d::RepeatCount.bounded(1),
s::PO4.use(90, r::Optional, d::RepeatCount.unbounded)),
d::LoopDef.build('SAC', d::RepeatCount.bounded(25),
s::SAC.use(130, r::Optional, d::RepeatCount.bounded(1))),
d::LoopDef.build('SDQ', d::RepeatCount.bounded(1),
s::SDQ.use(190, r::Optional, d::RepeatCount.bounded(500))),
d::LoopDef.build('TC2', d::RepeatCount.bounded(1),
s::TC2.use(235, r::Optional, d::RepeatCount.unbounded)),
d::LoopDef.build('SLN', d::RepeatCount.bounded(1000),
s::SLN.use(470, r::Optional, d::RepeatCount.bounded(1)),
s::PID.use(490, r::Optional, d::RepeatCount.bounded(1000)),
s::CTP.use(516, r::Optional, d::RepeatCount.bounded(25))))),
d::TableDef.summary('Table 3 - Summary',
d::LoopDef.build('CTT', d::RepeatCount.bounded(1),
s::CTT.use(10, r::Optional, d::RepeatCount.bounded(1)),
s:: SE.use(30, r::Mandatory, d::RepeatCount.bounded(1))))) Would appreciate any insight, thanks! |
@dobrynin thanks for reporting! I suspect it's a huge garbage collection taking place, and I'm betting most performance issues with stupidedi are caused by excessive memory consumption that causes long garbage collections. You could get a better sense if this is happening by monitoring the memory usage with some kind of process monitor, and watching the consumption increase until a long pause with 100% CPU, then memory drops suddenly. That's what I saw happening on synthetically generated large files. Would you be willing to run your file through I'm (slowly) working on a benchmark suite to keep track of performance over time, and to make sure once this is fixed that it doesn't regress. But having real-world examples like yours will really help make sure any performance measurements are are tied to reality. |
@kputnam thanks for the quick response. I can get you those things but I've got a couple other things on my plate today, I'll try to get to it soon. One quick comment about the file size. I am somewhat skeptical that the file size itself is responsible as we've had successful runs with files that are substantially larger than the one that is causing problems (I think the problematic one is ~1 MB while the biggest we've done is ~7 MB, will have to check). So that leads me to believe that there is something about the content/structure of this particular PO850 message that is causing the slowness. |
Hi there,
I've been doing some benchmarking and am puzzled by the memory footprint i'm seeing. Do you expect MRI to use up 2.5+gb of RAM to parse (not pp) a 9mb SC832 doc, with run times well over 4min?
Seems a little... excessive. Is there a trick I'm missing?
The text was updated successfully, but these errors were encountered: