How to handle large data sets #129
Comments
This discussion might also be useful: https://www.r-bloggers.com/persistent-config-and-data-for-r-packages/ |
This will be relevant for |
Record of GSoC weekly video call on 2020-05-18:
|
I will try to move |
I haven't delved deep enough to understand the essence of the problem, but I will state my point of view and ask for explanations. While I was translating some of the vignettes, it was not clear to me why are the datasets like ?usethis::use_data_raw So, could you summarize why this process of building the example datasets again and again is needed? Is it for unit testing? |
@GegznaV you are basically correct, the storage and (re)generation of the data is complicated and a bit opaque. This summer we have a student @eoduniyi working on streamlining the whole package, thanks to Google Summer of Code. Data issues are getting a close look but it will take a while to address the wide range of issues. |
The reason for "no
|
Found this package and interesting discussion of options while looking for something else. Should look over this before going down any path. |
Another post that might suggest some options https://blog.r-hub.io/2020/05/29/distribute-data/ |
A recent change on |
This topic is touched on in several threads, so I thought I'd start a dedicated thread. A while back I had a large data set to deal with, and ever since I've been watching, for example, R-pkg-devel for this topic. Going through my saved e-mails I found this discussion, which seems quite relevant. In particular, the last two messages mention R.cache and drat. drat has several vignettes. These look like promising ways to package the data and access it as/when needed. The issue of where to put it remains, of course.
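To make the R.cache option a bit more concrete, here is a minimal sketch of the "build once, then load as/when needed" pattern. It is only an illustration, not how this package does or should handle its data: `build_large_dataset()` and `get_dataset()` are hypothetical names, the cache key is made up, and the cache root is pointed at a temporary directory purely so the example is safe to run. It assumes R.cache is installed.

```r
library(R.cache)

# Assumption for this sketch: keep the demo cache in a temporary directory.
# A real package would have to decide on a persistent location.
setCacheRootPath(file.path(tempdir(), "rcache-demo"))

# Hypothetical stand-in for an expensive step that builds a large data set.
build_large_dataset <- function(n) {
  data.frame(id = seq_len(n), value = sqrt(seq_len(n)))
}

# Return the cached data set if it exists; otherwise build it and cache it.
get_dataset <- function(n) {
  key <- list("large-dataset", n)   # made-up key identifying this data set
  dat <- loadCache(key = key)       # NULL when nothing is cached for the key
  if (is.null(dat)) {
    dat <- build_large_dataset(n)
    saveCache(dat, key = key)
  }
  dat
}

first  <- get_dataset(1000)  # built and stored in the cache
second <- get_dataset(1000)  # loaded from the cache
stopifnot(isTRUE(all.equal(first, second)))
```

This only covers caching on the user's machine. Distributing the data itself (for example via a drat repository) and deciding where it should live are separate questions that the sketch does not address.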