Large Quantity Data Set Handling #607
Comments
Here is a really rough pass. It errors on some modules, but it processes enough to do some testing. @Johann-PLW, can you try this against your larger data set to see if it helps?
I am less concerned about the size of the HTML file itself; this is more about the browser being able to load and work with the large amount of data. We could do some other things to reduce the file size, but something like gzip compression might actually add overhead on the browser side and make things worse. If this branch code doesn't manage to load all of your large data set, we may need to explore other approaches of breaking that data up into segments.
> If this branch code doesn't work to load all of your large data set, we may need to explore other approaches of breaking that data up into segments.

That's something I've actually been thinking about: perhaps grouping a large amount of data by year, month, day, or hour (see the sketch below). I'll do some tests with my dataset against the code of your `dynamicreport-dataarray` branch and let you know.
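A rough sketch of what that segmentation could look like on the Python side; the file layout and the assumption that each row's first field is an ISO-8601 timestamp are illustrative, not anything already in the code base:

```python
# Hypothetical sketch: split report rows into one JSON file per month so the
# browser can load segments on demand instead of the whole data set at once.
import json
from collections import defaultdict

def write_monthly_segments(rows, out_dir):
    """Group rows by YYYY-MM, assuming row[0] is an ISO-8601 timestamp string."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[0][:7]].append(row)  # "2023-08-01T10:00:00" -> "2023-08"
    for month, month_rows in sorted(segments.items()):
        with open(f"{out_dir}/data_{month}.json", "w") as f:
            json.dump(month_rows, f)
```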
@JamesHabben The heart rate query matches 1,028,115 records. The steps query matches 493,272 records. Tests were conducted on a MacBook Pro 2019 (2.4 GHz Intel Core i9, 8 cores, 32 GB RAM) with macOS 13.5.1.
Oof. Not sure why heart rate didn't reduce more, and I'm frustrated at the steps increase. I can reduce some of that by using less text in the structure, but I don't think that will make much difference in the browser loading this data set. What do you think about sampling the data on the Python side? 1 million records is a lot of data and will be hard to incorporate into a broader framework like this. I wonder if we can find a framework that can do some time-based sampling, averaging, and anomaly highlighting, and pass a reduced set of data to the browser.
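As a hedged illustration of the kind of time-based reduction that could happen on the Python side (pandas is not currently part of this project, so treat this as a sketch under that assumption, not the actual implementation):

```python
# Collapse raw (timestamp, value) samples into per-period min/max/mean rows.
import pandas as pd

def summarize(records, period="15min"):
    """Downsample raw samples to one summary row per time period."""
    df = pd.DataFrame(records, columns=["timestamp", "value"])
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    return (
        df.set_index("timestamp")
          .resample(period)["value"]
          .agg(["min", "max", "mean"])
          .dropna()          # drop periods with no samples
          .reset_index()
    )
```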
@Johann-PLW What's the time range and frequency of your heart rate data? If we did some summary of the data, say every 15 minutes, how many records would that reduce it to? We might have to adjust based on the frequency. We can provide typical summary numbers like minimum, maximum, average, etc., and if the user wants to investigate in more detail, then TSV output is available. While typing this, though, I wanted to do some math. I think hourly summary periods really might need to be the choice. Here are my calcs:
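(The original calculations weren't captured here; the following only illustrates the underlying arithmetic, with an assumed one-year collection span:)

```python
# Hourly summaries cap the output at one row per hour, regardless of how many
# raw records fall inside each hour. The span is an assumption for illustration.
raw_records = 1_028_115        # heart rate record count reported above
span_days = 365                # assumed one-year collection span
hourly_rows = span_days * 24   # 8,760 summary rows
print(f"{raw_records:,} raw records -> at most {hourly_rows:,} hourly rows")
```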
@JamesHabben
I think we could also remove some columns, like Device and Manufacturer. As the device and/or software used to collect the data, and the timezone, are very repetitive, could we use an array to store the information once and display it in all records?
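One possible shape for that idea, sketched in Python; the function name and the in-place rewrite are illustrative, not existing code:

```python
# Replace a repetitive column (device, manufacturer, timezone) with small
# integer indexes into a lookup list that is stored only once.
def deduplicate_column(rows, col):
    """Rewrite rows[i][col] as an index into a shared lookup list."""
    lookup, index_of = [], {}
    for row in rows:
        value = row[col]
        if value not in index_of:
            index_of[value] = len(lookup)
            lookup.append(value)
        row[col] = index_of[value]
    return lookup  # ship this once; the browser resolves indexes on render
```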
I think we have this solved with the upcoming LAVA release. @Johann-PLW, do you agree?
Problem
Overall, there are modules that parse out, or have the potential to parse out, a large number of data records. Writing those records into HTML tags creates a large overhead in both file storage and processing, leading to bloated reports and potential load failures on less powerful computers.
Data
I noticed that the `health - heart rate` output from Josh's public image creates around 23.5k records and a 10 MB HTML file. It is also timing out on some of my computers due to memory. The `health - steps` output is 15.5k records and a 4 MB HTML file.
Solution
I think we can address this with a relatively low-impact change by loading data from either a separate JSON file or possibly a SQLite DB file. JSON would be a lower impact to the code base since it is native to JavaScript. I am exploring a structure where a module has the option to write its data to a JSON file and have the HTML file load it when rendered. It won't be 'true' JSON, since that typically repeats the field names in front of every value for every record. Instead, an array of data rows, each of which is itself just an array of data fields, will drastically reduce the size of the data set. The two shapes are sketched below.
true json
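(Illustrative records only; these field names and values are made up, not the module's actual output:)

```json
[
  {"timestamp": "2023-08-01T10:00:00", "heart_rate": 72, "device": "Apple Watch"},
  {"timestamp": "2023-08-01T10:00:05", "heart_rate": 74, "device": "Apple Watch"}
]
```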
array of array
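(The same illustrative records with the field names stored once, so each additional record costs only its values:)

```json
{
  "fields": ["timestamp", "heart_rate", "device"],
  "rows": [
    ["2023-08-01T10:00:00", 72, "Apple Watch"],
    ["2023-08-01T10:00:05", 74, "Apple Watch"]
  ]
}
```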
Tag: @Johann-PLW, @abrignoni