Bulk import support for SaveOperation #952
Conversation
Oh wow! Well, this is definitely something we want to add, and I've had ideas sitting in the back of my head for a while, but no time to put them down. This is an interesting approach, though I'm a bit lost on the interface... How does this look? From what I can tell, it might look like:

```crystal
class SaveUser < User::SaveOperation
end

operations = params.many_nested(:users).map { |data| SaveUser.new(data) }
SaveUser.import(operations)
```

I guess some key things to think about before getting too deep into this are:
The only other thing that comes to mind is naming. The name of your operation is intended to be somewhat descriptive of what it does. In this case we're saying "I save a user (SaveUser), but really I save a lot of them too...". A pedantic point, but what if I wanted my operation to be called something else (e.g. ImportUsers)? Thanks for working on this! I know many have requested it, and it's a tricky task ❤️
Thanks for the quick feedback! Yes, your example is pretty much spot on. Using my example from above, it's now:

```crystal
operations = [] of SaveThing

data.each do |datum|
  operations << SaveThing.new(...)
end

AppDatabase.transaction do
  SaveThing.import(operations)
end
```

In my use-case I am not using `params`.
In this WIP it will just generate a single insert statement for the set of operations you give it, so if you were to provide it with a very large set of operations it would generate a very large insert statement. I feel it is a useful characteristic for this to remain transactional as an interface and to delegate all pagination/splitting to the developer. The consumer can then decide what to do on failure: for example, roll back all 100k records, skip the page that failed, or stop inserting but keep anything previously inserted, all without having to build this into the API.
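For illustration, a caller could page a large import and pick its own failure strategy; this is a hedged sketch where the page size, the rescued exception, and the logging are all assumptions layered on top of the `import` interface proposed here:

```crystal
# Hypothetical sketch: split a huge import into pages of 1_000 and decide
# per page what happens on failure. `SaveThing.import` is the API proposed
# in this PR; everything else is illustrative.
operations.each_slice(1_000) do |page|
  begin
    AppDatabase.transaction do
      SaveThing.import(page)
    end
  rescue e
    # Skip the page that failed and carry on, or re-raise to stop
    # the whole import while keeping previously committed pages.
    puts "Skipping failed page: #{e.message}"
  end
end
```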
I'm not too familiar with `SaveUser.import(params.many_nested(:users))`; I think to support this the return type would have to change to a set of operations or an exception, rather than the current true/false, as you'd likely want the operations after the fact to get the records.
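A hedged sketch of what that changed return type could enable; `record` and `valid?` are the usual accessors on a save operation, and the rest is illustrative:

```crystal
# Hypothetical: import returns the array of operations instead of a Bool,
# so the caller can pull out records and failures afterwards.
operations = SaveUser.import(params.many_nested(:users))
records = operations.compact_map(&.record)
failed  = operations.reject(&.valid?)
```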
Postgres will treat the whole insert transactionally, so if one row were to fail the whole set would roll back. There are options, which I have not included in this PR (but could be added), for telling Postgres what to do on conflict. As for errors, since it's an all-or-nothing affair in this PR, I am triggering the same errors a single save operation would.
One option is to make a separate bulk operation class:

```crystal
SaveUser::BulkOperation.insert_all([SaveUser.new])
```

I don't believe this changes the implementation too much, as it would still be working on sets of operations (or parameters turned into operations).
I think there are 2 options for this:
I just pushed to another branch an example of generating a separate bulk class for each operation: https://github.com/luckyframework/avram/compare/main...garymardell:avram:bulk-operation?expand=1

```crystal
operations = [] of SaveThing

data.each do |datum|
  operations << SaveThing.new(...)
end

AppDatabase.transaction do
  SaveThing::Bulk.insert_all(operations)
end
```

and then in theory you can also do:

```crystal
class ImportUsers < Avram::BulkOperation(SaveUser, Span)
end

ImportUsers.insert_all(users)
```

if you want specific naming. Alternatively, this might be a rare enough use-case that we don't generate a bulk class for each operation and this is the only way to use it.
I think what you have here is a great start. I dig the change of being able to create a separate class designated for handling bulk imports. We will eventually need support for ...

Can I get an example of what the error handling would look like? I see this doesn't take a block like the other operations do. This is probably fine since returning the entire array of what you passed in might be quite expensive. Are you thinking maybe the interface looks like this?

```crystal
op = ImportUsers.insert_all(users)

if op.saved?
  # good
else
  # not good
  op.errors
end
```
@garymardell If you have a moment, I think there's just 2 bits left here. Then I think we can just merge it in and get something going so we can improve on it more later. Thanks!
Sorry for the delay, this got sidelined by other work going on. I actually have another branch locally that I think is more promising for providing a ...
Oh sweet! If that's the case, then I'm down to hold off. Should we close out this issue for now, then you can open a new one with your updates?
Yes, going to close this one for now.
Hi, I am currently using my fork with your bulk-insertion PR for the time being. I did a small hack into `Avram::Insert.statement` to have a separate `statement_for_bulk` that adds `ON CONFLICT DO NOTHING`. I hope you guys come up with a more permanent solution soon; data import is quite common. Thanks.
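As a rough illustration of the kind of hack described (not the actual patch), the bulk variant could append the conflict clause to the regular insert SQL; the method below is hypothetical and assumes it sits alongside `statement` inside `Avram::Insert`:

```crystal
# Hypothetical sketch, living next to `statement` in Avram::Insert:
# reuse the generated insert SQL and append a conflict clause so that
# conflicting rows are silently skipped during bulk import.
def statement_for_bulk : String
  "#{statement} ON CONFLICT DO NOTHING"
end
```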
Looking for feedback on the approach, and on whether this is something that would likely be merged into Avram, before I finish up writing tests. I could also add it as a separate shard.
I have the need for bulk inserting in the application I am building and noticed that issue #662 is still open. I also came across the implementation of bulk upsert in #789, however it looks like there hasn't been much activity in some time.
This PR piggy-backs on the behaviour of a `SaveOperation`, as I wanted to avoid duplicated logic, validations, callbacks etc. It essentially follows the same pattern as a regular insert but for multiple operations at the same time. Initially in my app I had something that looked like this: ...

This PR:

- Adds an `import` method that accepts an array of save operations of a specific class (see the sketch below)
- Runs the `before_save`, `after_save` and `after_commit` callbacks
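A minimal usage sketch of the proposed `import` method; `SaveUser` and the attribute names passed to `new` are illustrative assumptions:

```crystal
# Build one save operation per row, then insert them all in one statement.
operations = [
  SaveUser.new(name: "Alice"),
  SaveUser.new(name: "Bob"),
]
SaveUser.import(operations)
```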
Changes to Avram::Insert

Initially in my branch I had created a separate `MultiInsert` class that accepted an array of parameters, however there was a lot of duplication. It also felt like the `Insert` class, as a representation of an insert statement, should be able to support this case. I refactored the class to accept an array of parameters; the regular save operation is now an array of params with a single item.
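To make the array-of-parameters idea concrete, here is an illustrative helper (not Avram's actual code) showing how multi-row placeholders can be numbered so that a single-row insert is just the one-element case:

```crystal
# Illustrative only: number placeholders across rows so that a one-row
# insert ("($1, $2)") is the degenerate case of the multi-row form.
def placeholders(row_count : Int32, column_count : Int32) : String
  (0...row_count).map do |row|
    inner = (1..column_count).map { |col| "$#{row * column_count + col}" }
    "(#{inner.join(", ")})"
  end.join(", ")
end

placeholders(1, 2) # => "($1, $2)"
placeholders(3, 2) # => "($1, $2), ($3, $4), ($5, $6)"
```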
Changes to SaveOperation

I introduced a new `values` method to the public API which returns the attributes as a hash. This uses the same underlying `attributes_to_hash` that `insert_sql` was using. The difference between `values` and `insert_sql` is that `values` is uncompacted (it preserves nil values). This is important for bulk inserting, as the number of values (`$1, $2, $3...`) needs to remain consistent across rows.

In my use-case I am inserting tree-like data that has a field for `parent_id`, which for the root node is nil. Without this change the first operation (for the root) has one fewer parameter than all subsequent nodes and the insert would fail.
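A hedged illustration of why `values` must preserve nils; `SaveCategory`, its columns, and the exact hash shape are hypothetical:

```crystal
# Both rows expose the same keys, so every row binds the same number of
# parameters in the multi-row VALUES list, even when parent_id is nil.
root  = SaveCategory.new(name: "root", parent_id: nil)
child = SaveCategory.new(name: "child", parent_id: 1)

root.values  # => {"name" => "root", "parent_id" => nil}
child.values # => {"name" => "child", "parent_id" => 1}

# Conceptually, the generated statement stays rectangular:
# INSERT INTO categories (name, parent_id) VALUES ($1, $2), ($3, $4)
```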