Not all attributes are being exported #210
Comments
Thank you for the suggestion! While it will take me a while to get to this, we always encourage PRs, especially since you already know what is going on in the XML! Let me know if you need any help.
Hello
Thanks so much! It is a great tool, given NCBI's limitations on retrieving SRA metadata.
I was wondering whether there is a way to select certain fields. In your docs you demonstrate how to filter using grep, but is there a way to select a specific column of the metadata? What if I just wanted "run_accession" and "total_size"?
Thanks
RF
On Wed, Feb 14, 2024 at 7:03 AM Saket Choudhary wrote:
Thank you for the suggestion! While it will take me a while to get to
this, we always encourage PRs especially since you already know what is
going on in the xml! Let me know if you need any help.
if you don't want to use grep, maybe you can create a python script to select the fields you like, for example:
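A minimal sketch of such a script, assuming the metadata is available as tab-separated text with a header row. The helper name `select_columns` and the sample data are hypothetical and are not part of pysradb:

```python
import csv
import io


def select_columns(tsv_text, wanted):
    """Return one dict per row, keeping only the columns named in `wanted`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in wanted if name not in header]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return [{name: row[name] for name in wanted} for row in reader]


# Made-up sample standing in for real metadata output (assumption).
sample = (
    "study_accession\trun_accession\ttotal_size\n"
    "SRP000001\tSRR000001\t12345\n"
)

rows = select_columns(sample, ["run_accession", "total_size"])
for row in rows:
    print(row["run_accession"], row["total_size"], sep="\t")
```

To use it on real data, read the saved metadata file into a string and pass it in place of `sample`.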
Hi @returnOfTheYeti, I would go with @arcones' recommendation here.
Hello
Thank you for your response. My initial problem was that multiple fields are missing from the output compared to what is actually listed on SRA.
For example, one important field, "gisaid_accession", is listed at the link below:
https://www.ncbi.nlm.nih.gov/Traces/study/?query_key=2&WebEnv=MCID_65df8dedd6fce424fe3cff83&o=acc_s%3Aa
But the headers in pysradb, obtained using the command srrs = list(raw_pysradb_data_frame), are:
['study_accession', 'run_accession', 'study_title', 'experiment_accession',
'experiment_title', 'experiment_desc', 'organism_taxid', 'organism_name',
'library_name', 'library_strategy', 'library_source', 'library_selection',
'library_layout', 'sample_accession', 'sample_title', 'biosample',
'bioproject', 'instrument', 'instrument_model', 'instrument_model_desc',
'total_spots', 'total_size', 'run_total_spots', 'run_total_bases']
This is one field that is missing, but there are multiple fields that are
missing from other samples.
My question remains: How do I go about retrieving the "gisaid_accession"
from a sample? Is it not possible?
Thanks again
On Wed, Feb 21, 2024 at 7:35 PM Saket Choudhary wrote:
hi @returnOfTheYeti <https://github.com/returnOfTheYeti>, I would go with
@arcones <https://github.com/arcones>' recommendation here.
These are not standard fields defined for every project, and hence they are currently not supported.
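As a possible workaround outside pysradb, the extra attributes can be read from the sample XML directly. This is only a sketch: it assumes the attributes appear as TAG/VALUE pairs inside SAMPLE_ATTRIBUTE elements, and the XML fragment, its values, and the helper `sample_attributes` are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of an SRA sample record.
# The accession and attribute values are made up (assumption).
SAMPLE_XML = """
<SAMPLE accession="SRS0000000">
  <SAMPLE_ATTRIBUTES>
    <SAMPLE_ATTRIBUTE>
      <TAG>GISAID_Accession</TAG>
      <VALUE>EPI_ISL_0000000</VALUE>
    </SAMPLE_ATTRIBUTE>
    <SAMPLE_ATTRIBUTE>
      <TAG>SARS-CoV-2_diagnostic_pcr_Ct_value_1</TAG>
      <VALUE>21.5</VALUE>
    </SAMPLE_ATTRIBUTE>
  </SAMPLE_ATTRIBUTES>
</SAMPLE>
"""


def sample_attributes(xml_text):
    """Collect every TAG/VALUE pair found under SAMPLE_ATTRIBUTE elements."""
    root = ET.fromstring(xml_text.strip())
    attrs = {}
    for node in root.iter("SAMPLE_ATTRIBUTE"):
        tag = node.findtext("TAG")
        if tag is not None:
            attrs[tag] = node.findtext("VALUE")
    return attrs


attrs = sample_attributes(SAMPLE_XML)
print(attrs.get("GISAID_Accession"))
```

Because the attribute names differ between projects, the helper returns whatever tags it finds instead of expecting a fixed set of columns.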
In the SRA db, in the run info, as well as in the XML, one can see variables such as "GISAID_Accession" and "SARS-CoV-2_diagnostic_pcr_Ct_value_1" for certain samples (below).
https://www.ncbi.nlm.nih.gov/sra/?term=SRR15168846
But when I extract the detailed data for this sample using:
pysradb metadata SRR15168847 --detailed | head
the attributes mentioned above are missing from the pysradb output. Is there any way to retrieve ALL of the metadata? Or at least specific attributes that are not included in the "detailed" setting?
I downloaded pysradb on Feb 12, 2024 via conda.