Detailed Description
We want the ability to download clinical data for all donors in a File Repository query (including a query with no filters). To accomplish this, we want to add an endpoint to the gateway where a SQON filter for the file repository can be provided and the gateway will return the TSV download for all donors included in that filter.
Possible Implementation
This endpoint can be a GET request that takes the SQON as a query parameter. If no SQON is provided, we will use a default case of "all files" (no filter).
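As a rough sketch of the parameter handling described above, the handler could parse the SQON like this. The `Sqon` type is simplified, and the empty `and` clause used for "all files" is an assumption for illustration, not the gateway's confirmed representation.

```typescript
// Simplified, hypothetical SQON shape; the real Arranger type is richer.
type Sqon = { op: string; content: unknown };

// Assumption: "all files" is expressed as an "and" with no clauses.
const ALL_FILES_SQON: Sqon = { op: "and", content: [] };

// Turn the raw query-parameter string into a SQON.
// A missing or blank parameter falls back to the "all files" default.
function parseSqonParam(raw: string | undefined): Sqon {
  if (raw === undefined || raw.trim() === "") {
    return ALL_FILES_SQON;
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("SQON parameter must be valid JSON");
  }

  // Minimal structural check: an object with a string `op` and a `content` key.
  const candidate = parsed as Record<string, unknown> | null;
  if (
    candidate === null ||
    typeof candidate !== "object" ||
    typeof candidate.op !== "string" ||
    !("content" in candidate)
  ) {
    throw new Error("SQON parameter must be an object with op and content");
  }
  return { op: candidate.op, content: candidate.content };
}
```

Rejecting malformed input up front lets the endpoint return a 400 before any ES query is issued.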
The handler should use this query to get the list of unique donor IDs from the ES file centric index. This query must apply the same serverSide filters we apply to all arranger requests, which restrict the results based on the user's permissions and the file embargo stage metadata. With the list of donors retrieved, the donor data can then be fetched from the clinical service.
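The flow above could be sketched as follows. Every dependency name here (`getAccessFilter`, `fetchDonorIds`, `fetchClinicalTsv`) is hypothetical and injected so the sketch stays self-contained; the real gateway would wire in its own permission logic, ES client, and clinical service client.

```typescript
// Simplified, hypothetical SQON shape.
type Sqon = { op: string; content: unknown };

// Hypothetical dependencies, injected so the flow can be tested in isolation.
interface DownloadDeps {
  // Builds the server-side filter (permissions + embargo stage) for a user.
  getAccessFilter: (userToken: string) => Promise<Sqon>;
  // Queries the file centric index and returns matching donor IDs.
  fetchDonorIds: (filter: Sqon) => Promise<string[]>;
  // Asks the clinical service for a TSV covering the given donors.
  fetchClinicalTsv: (donorIds: string[]) => Promise<string>;
}

// The user's filter can only narrow results: it is AND-ed with the access filter.
function combineFilters(userFilter: Sqon, accessFilter: Sqon): Sqon {
  return { op: "and", content: [userFilter, accessFilter] };
}

async function downloadClinicalData(
  userFilter: Sqon,
  userToken: string,
  deps: DownloadDeps
): Promise<string> {
  const accessFilter = await deps.getAccessFilter(userToken);
  const donorIds = await deps.fetchDonorIds(
    combineFilters(userFilter, accessFilter)
  );
  // De-duplicate defensively in case the lookup returns repeated IDs.
  const uniqueDonorIds = Array.from(new Set(donorIds));
  return deps.fetchClinicalTsv(uniqueDonorIds);
}
```

Combining the filters with `and` means a user-supplied SQON can never widen access beyond what the server-side filter allows.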
Considerations for large queries
Since the number of files will likely be in the tens or hundreds of thousands, we should not fetch every matching file document to collect donor IDs. Instead, we should retrieve the unique donor ID values through an aggregation. This will work well until the number of unique donors approaches the ES max buckets limit (around 65k). A composite aggregation should allow paging through all unique donor IDs for the filter, regardless of that limit. A cap on the maximum number of donors per request may still be needed as the total number of ARGO donors increases.
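A minimal sketch of paging through a composite aggregation is shown below. The field name `donors.donor_id`, the aggregation name `donors`, and the injected `search` function are assumptions for illustration; if the donor field is mapped as nested in the file centric index, the composite would also need to be wrapped in a nested aggregation.

```typescript
// Assumed field holding the donor ID in the file centric index.
const DONOR_ID_FIELD = "donors.donor_id";

type CompositeKey = { donor_id: string };
interface CompositePage {
  buckets: { key: CompositeKey }[];
  after_key?: CompositeKey;
}
// Hypothetical wrapper around the ES client's search call.
type SearchFn = (
  body: object
) => Promise<{ aggregations: { donors: CompositePage } }>;

// Build one page request; `after` resumes from the previous page's last key.
function buildDonorIdRequest(
  query: object,
  pageSize: number,
  after?: CompositeKey
): object {
  return {
    size: 0, // no file hits needed, only aggregation buckets
    query,
    aggs: {
      donors: {
        composite: {
          size: pageSize,
          sources: [{ donor_id: { terms: { field: DONOR_ID_FIELD } } }],
          ...(after ? { after } : {}),
        },
      },
    },
  };
}

// Page until ES returns no more buckets, enforcing an optional donor cap.
async function collectDonorIds(
  search: SearchFn,
  query: object,
  pageSize = 1000,
  maxDonors = Infinity
): Promise<string[]> {
  const donorIds: string[] = [];
  let after: CompositeKey | undefined;
  do {
    const response = await search(buildDonorIdRequest(query, pageSize, after));
    const page = response.aggregations.donors;
    for (const bucket of page.buckets) {
      donorIds.push(bucket.key.donor_id);
    }
    if (donorIds.length > maxDonors) {
      throw new Error(`Query matches more than ${maxDonors} donors`);
    }
    // An empty page means the previous page was the last one.
    after = page.buckets.length > 0 ? page.after_key : undefined;
  } while (after !== undefined);
  return donorIds;
}
```

Because each request returns at most `pageSize` buckets, no single response comes near the max buckets limit, and the `maxDonors` cap gives a place to reject oversized downloads early.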