[BUG] Currently, when uploading in H5AD format, the original data is left under uploads/files. We can either handle this case differently or just remove the original data.
[FEATURE] We need to determine how best to allow using existing unstructured metadata, layers, or observation/variable-level matrices (e.g. UMAP embeddings); see the inspection sketch below.
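For orientation, a minimal sketch of what "existing" content an uploaded H5AD file can carry, assuming it is read with anndata (the file path is illustrative):

```python
import anndata as ad

# Load the uploaded H5AD file (path is illustrative).
adata = ad.read_h5ad("uploads/files/dataset.h5ad")

# Cell-level metadata columns (these are what the curator view can display).
print(adata.obs.columns.tolist())

# Stored analyses as observation/variable-level matrices, e.g. 'X_pca', 'X_tsne', 'X_umap'.
print(list(adata.obsm.keys()), list(adata.varm.keys()))

# Alternative expression matrices (e.g. raw counts, normalized values).
print(list(adata.layers.keys()))

# Unstructured metadata (colors, parameters of previous analyses, etc.).
print(list(adata.uns.keys()))
```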
Data upload (large data) - [ENHANCEMENT]
For relatively large datasets (e.g. a 10 GB H5AD file), the current upload mechanism is not suitable: it either takes forever or gets interrupted.
Meanwhile, I added a new apache2 config unlimited_uploads.conf with LimitRequestBody 0 and further raised the PHP limits (sketched below), but there may be other timeout settings that can still interrupt PHP execution. We need to think of a longer-term solution. See #14; I think this will work, at least for now. If we keep this solution, we should clean up the PHP upload script and add proper logging.
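For reference, a minimal sketch of the settings involved. The config file name and LimitRequestBody 0 are from this issue; the PHP directive names are standard, but the exact values and file locations are assumptions and should be checked against the actual server setup:

```apache
# /etc/apache2/conf-available/unlimited_uploads.conf (enable with a2enconf)
# 0 disables Apache's request body size limit.
LimitRequestBody 0
```

```ini
; Assumed php.ini overrides; values are illustrative, not what is deployed.
upload_max_filesize = 16G
post_max_size = 16G
max_execution_time = 0   ; no PHP script time limit
max_input_time = -1      ; unlimited input parsing time
memory_limit = 2G
```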
Data upload (general)
[QUESTION] It looks like the original metadata is left under uploads/files. This might raise security issues. Should we remove it?
[DOCUMENTATION] We need to update the documentation, in particular for H5AD (and prioritize this format, at least for scRNA-seq).
[REMARK] File names must match exactly, otherwise the upload fails without any meaningful error message, e.g. when using gene.tab instead of genes.tab. The documentation should either be clear about this, or we should allow some fuzziness in file names during upload, or we should make sure an appropriate error message is displayed.
[REMARK] For failed uploads, some files may remain under /tmp or files/uploads.
Commit 1981d80 addresses H5AD data and metadata upload.
As for using existing unstructured metadata, layers, or observation/variable-level matrices, I need more time to figure out how exactly this is handled. Available display types are taken either from the columns (adata.obs), if primary, and/or from obsm if there are stored analyses. However, if obsm entries such as 'X_pca', 'X_tsne', 'X_umap' are present but not mirrored in the columns, they are shown as display parameters (e.g. X, Y) while the data is actually not accessible, i.e. we get ERROR: Value of 'x' is not the name of a column in 'data_frame'. So observation matrices need to be in the columns to be usable in the curator view, and the remaining unstructured metadata, layers, etc. are unused in primary analyses. A possible workaround is sketched below.
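A minimal sketch of that workaround, assuming the dataset is handled with anndata; the destination column names (e.g. X_umap_1, X_umap_2) are illustrative and would need to match whatever the curator view expects:

```python
import anndata as ad

adata = ad.read_h5ad("uploads/files/dataset.h5ad")  # path is illustrative

# Copy stored embeddings (e.g. 'X_pca', 'X_tsne', 'X_umap') into adata.obs
# so they become regular columns and are accessible in the curator view.
for key in ("X_pca", "X_tsne", "X_umap"):
    if key in adata.obsm:
        coords = adata.obsm[key]
        # Only the first two dimensions are needed for X/Y display parameters.
        for i in range(min(2, coords.shape[1])):
            adata.obs[f"{key}_{i + 1}"] = coords[:, i]

adata.write_h5ad("uploads/files/dataset_with_embeddings.h5ad")
```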