Improve enrichment analysis chapter #144
Amazing, thank you very much! We'll get back to this ASAP |
View / edit / reply to this conversation on ReviewNB Zethson commented on 2023-01-26T10:31:04Z usualy
-> usually PauBadiaM commented on 2023-01-26T13:32:46Z Solved! |
View / edit / reply to this conversation on ReviewNB Zethson commented on 2023-01-26T10:31:05Z Note that these values are completely arbitrary and should be adjusted to the question at hand.
Is it possible to provide more specific guidance here? PauBadiaM commented on 2023-01-26T13:33:09Z Added some guidance. |
View / edit / reply to this conversation on ReviewNB Zethson commented on 2023-01-26T10:31:06Z Not possible to use scanpy's violinplots here? PauBadiaM commented on 2023-01-26T13:36:47Z Unfortunately it is not I believe. |
View / edit / reply to this conversation on ReviewNB Zethson commented on 2023-01-26T10:31:08Z Maybe this only renders weird in ReviewNB but I can see some "- -" at the top of this section? PauBadiaM commented on 2023-01-26T13:42:50Z Weird, I checked and there is no "-" character before the list starts in the raw file. I believe this might be a ReviewNB issue, since it looks good in local and in the version rendered in GitHub. |
View / edit / reply to this conversation on ReviewNB Zethson commented on 2023-01-26T10:31:09Z The order of this should also be changed. Soroor first, Pau next, Isaac added, Anastasia removed and me to the reviewer section. PauBadiaM commented on 2023-01-26T13:44:17Z Added these changes. Here, like in the Key Takeaways section, ReviewNB might be rendering this wrongly but looks good on the GitHub version. |
soroorh commented on 2023-01-27T17:26:09Z Please keep the scanpy t-test section and "Cluster-level gene set enrichment analysis with decoupler" as per the original work. We are aware that we are testing one cell type against all the others. For this step we have been following what is reported in the methods sections of published papers by distinguished groups, which also run GSEA on t-test results for one cluster/cell type vs everything else. If you really wish, you can add a statement to the original work emphasising that an alternative solution could be to run the t-test per cell type with scanpy. You can keep the section where you demonstrate
PauBadiaM commented on 2023-02-17T09:08:30Z It is true that many groups have performed differential tests at the single-cell level in the past; however, a recent benchmark showed how poorly these tests actually perform (see https://doi.org/10.1038/s41467-021-25960-2). That work showed that tests applied to pseudo-bulk profiles provide much more reliable results, indicating that this is the current best practice for single-cell data analysis. In addition, the current design of the differential analysis is suboptimal, as the conditions (unstimulated and stimulated) and the other cell types are all mixed together in the reference of the contrast (stimulated monocytes vs all). These choices will negatively affect any downstream gene set analysis, since it relies heavily on the information carried by the contrast statistics used as input. See another comment below for more context.
soroorh commented on 2023-04-10T08:26:53Z There is still an open debate on whether DEG should be assessed by pseudo-bulk or single cell. The aim of the chapter is to present both views. I think in the header or text we explicitly mention that the test is one cluster vs all others. This very much depends on the biological question, and we are just using it for demonstration here. |
soroorh commented on 2023-01-27T17:26:10Z I generally don't think we should encourage readers to use violin plots for this purpose: the score is not necessarily interpretable, particularly if the biological effect is not discernible, and there is no notion of significance. I suggest they appear only once in the notebook. Alternatively, you may consider dot plots: https://davemcg.github.io/post/lets-plot-scrna-dotplots/
Color of the dot would be the direction, and size would be proportional to significance.
PauBadiaM commented on 2023-02-17T09:08:44Z The score itself is interpretable, as it represents the gradient of enrichment across the populations of cells. Violin plots provide a visual indicator beyond simply plotting colors on the UMAP, as was done before in the chapter. A dot plot is a nice idea, but it requires a differential test, which would be done at the single-cell level, and its significance would be inflated by false positives due to the extremely high number of replicates, as seen in the last reference I shared.
soroorh commented on 2023-04-10T08:36:28Z I think we need to keep both pseudo-bulk and cell views as in the original work, and warn users by referencing works such as the one you suggested, which is already covered in the DEG chapter, I believe. I still insist that the scores are not interpretable in the context of enrichment: you are showing the distribution of scores between stim and ctrl to show a difference, i.e. enrichment between groups, and for this DE tests and a dot plot or even a barcode plot are more appropriate. Both UMAP and violin are valid representations, but you can't fit 100 violin plots for 100 gene sets in MSigDB, nor 100 UMAPs. My request is to keep this to one example only, and add more informative plots directed at the problem being tested. |
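The dot plot encoding suggested in this thread (color for direction, size for significance) can be sketched as follows. This is only an illustration: the helper name `dot_encoding`, the color names, and the choice of -log10(p) as the size scale are assumptions, not taken from the chapter or from any plotting library, and the input values are made up.

```python
import math

def dot_encoding(stat, pvalue, floor=1e-300):
    """Map one gene set's test result to dot-plot aesthetics (hypothetical helper)."""
    # The sign of the statistic (direction of the effect) decides the color.
    if stat > 0:
        color = "red"
    elif stat < 0:
        color = "blue"
    else:
        color = "grey"
    # Size grows with significance; the floor avoids taking log10(0).
    size = -math.log10(max(pvalue, floor))
    return color, size

# Made-up (statistic, p-value) pairs, for illustration only.
results = {"set_A": (2.5, 0.001), "set_B": (-1.2, 0.2)}
dots = {name: dot_encoding(s, p) for name, (s, p) in results.items()}
```

The resulting (color, size) pairs could be handed to any scatter-style plotting call. The concern raised above about inflated single-cell p-values applies to whichever test produces `pvalue`.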
View / edit / reply to this conversation on ReviewNB soroorh commented on 2023-01-27T17:26:11Z I think here you are suggesting doing DE with t-test on pseudo-bulks. I think i need to clarify a few points:
Your demonstration of using decoupler to use the t-test for differential expression in pseudo-bulks is neither relevant here (this can also be done with scanpy), nor an acceptable workflow for GSEA in general when cells are pseudo-bulked. I have specified below which chunks should be removed. PauBadiaM commented on 2023-02-17T09:08:55Z I agree that gene differential testing should be performed using GLM, as shown in the “Differential gene expression analysis” chapter. However, for this chapter, the use of single-cell instead of pseudobulks, and the use of a noisy reference (instead of doing treated vs untreated per cell type) are bigger concerns than the actual test applied. As a compromise, I used the t-test on the log-transformed normalized pseudobulk profiles, which takes care of library size differences.
Regarding the potential low number of data points for each condition, this is already taken into account by the significance of the test statistic. In this simple example, it actually appears that this number of replicas is far enough to yield statistically significant outputs, as shown in the volcano plot. If there were not enough samples, there would be no significant hits.
Regarding the use of t-test results for GSEA, enrichment methods have been classically run on any contrast statistics, including t-test, as shown in the seminal work of piano (https://doi.org/10.1093/nar/gkt111) and others, I fail to see why it would not be valid.
Regarding the limma-fry workflow, it is not limited to pseudo-bulk: it could also use data at the single-cell level, but that is not recommended based on the reference I shared in a previous comment. Also, as you may know, limma-fry actually performs differential analysis under the hood to generate moderated t-statistics, which are then used to compute an enrichment score; this is the reason why it accepts design and contrast matrices. Therefore, the same workflow can be achieved with any other enrichment method if the DEG is done by GLMs. While it is nice that fry works with design and contrast matrices, it is rigid in the sense that the user cannot choose their own differential GLM strategy before enrichment.
Regarding the test implemented in decoupler, it actually uses scanpy under the hood. Its advantage is that it performs the test between conditions and per cell type with a simple one liner, which would take several lines using scanpy. I could change it if needed.
Since differential analysis and enrichment analysis are closely linked, we can do three things:
For continuity, I think option 2 should be the best one because this way users can run the notebook with their data. In the case we do option 2, we would make the comment here that they accommodate complex experimental designs.
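To make the disputed workflow concrete, the compromise described in this comment (a t-test on log-transformed, library-size-normalized pseudobulk profiles) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the decoupler or scanpy implementation and not a recommendation: the function names, the scaling target of 1e4, the use of Welch's t statistic, and the toy counts are all hypothetical, and all cells are assumed to belong to a single cell type.

```python
import math
from statistics import mean, variance

def pseudobulk(cells):
    """Sum raw counts per (sample, condition); cells = [(sample, condition, counts), ...]."""
    sums = {}
    for sample, condition, counts in cells:
        key = (sample, condition)
        previous = sums.get(key, [0] * len(counts))
        sums[key] = [a + b for a, b in zip(previous, counts)]
    return sums

def log_normalize(counts, target=1e4):
    """Scale a profile to a common library size, then log1p-transform it."""
    total = sum(counts)
    return [math.log1p(c / total * target) for c in counts]

def welch_t(x, y):
    """Welch's t statistic; needs at least two samples per group."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    diff = mean(x) - mean(y)
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def contrast(cells, treated, control):
    """Per-gene t statistic, treated vs control, on normalized pseudobulk profiles."""
    profiles = {k: log_normalize(v) for k, v in pseudobulk(cells).items()}
    n_genes = len(next(iter(profiles.values())))
    stats = []
    for g in range(n_genes):
        x = [p[g] for (_, cond), p in profiles.items() if cond == treated]
        y = [p[g] for (_, cond), p in profiles.items() if cond == control]
        stats.append(welch_t(x, y))
    return stats

# Toy counts for three genes (made up): two samples per condition, two cells each.
cells = [
    ("s1", "stim", [10, 1, 5]), ("s1", "stim", [12, 0, 4]),
    ("s2", "stim", [9, 2, 6]), ("s2", "stim", [11, 1, 5]),
    ("s3", "ctrl", [1, 10, 5]), ("s3", "ctrl", [2, 9, 6]),
    ("s4", "ctrl", [0, 12, 4]), ("s4", "ctrl", [1, 11, 5]),
]
stats = contrast(cells, "stim", "ctrl")
```

The per-gene statistics in `stats` are the kind of contrast vector that an enrichment method would take as input. A GLM-based test, as discussed in this thread, would replace `welch_t` and could model additional covariates.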
soroorh commented on 2023-04-10T08:47:29Z As I mentioned, in the original version of this work we cover both pseudobulk and single-cell views, each with its own appropriate contrast explaining what is being tested. In your version, this was completely distorted.
Quoting your reply: "Regarding the potential low number of data points for each condition, this is already taken into account by the significance of the test statistic. In this simple example, it actually appears that this number of replicas is far enough to yield statistically significant outputs, as shown in the volcano plot. If there were not enough samples, there would be no significant hits."
I don't think this is a valid argument based on my academic training.
What I have been trying to say is: 1- we need both pseudo-bulk and single-cell views, each with its own appropriate contrast and statement of limitations; 2- pb + limma-fry for complicated experimental designs. |
View / edit / reply to this conversation on ReviewNB soroorh commented on 2023-01-27T17:26:12Z Please consider showing %variance explained. This was not part of original plot, but it helps. PauBadiaM commented on 2023-02-17T09:09:03Z Agreed, this is a nice addition, added some text about it too. |
View / edit / reply to this conversation on ReviewNB soroorh commented on 2023-01-27T17:26:13Z We are not interested in DE analysis here - please remove PauBadiaM commented on 2023-02-17T09:09:10Z See previous reply. soroorh commented on 2023-04-10T08:48:44Z pls see prev. reply. |
View / edit / reply to this conversation on ReviewNB soroorh commented on 2023-01-27T17:26:14Z We are not interested in DE analysis here - please remove PauBadiaM commented on 2023-02-17T09:09:17Z See previous reply. It is still good to show a volcano plot as a quality control check before running enrichment. soroorh commented on 2023-04-10T08:51:55Z so, if i get a non-symmetric volcano, i should not proceed with the DE? this is an important QC based on a personal practice. The best practice is to check dominant axes of variation by PCA of pbs, and run the appropriate test/contrast. |
View / edit / reply to this conversation on ReviewNB soroorh commented on 2023-01-27T17:26:15Z We are not interested in DE analysis here - please remove PauBadiaM commented on 2023-02-17T09:09:24Z See previous reply.
soroorh commented on 2023-04-10T08:52:09Z pls see pre reply |
View / edit / reply to this conversation on ReviewNB soroorh commented on 2023-01-27T17:26:16Z GSEA in general should not be applied to t-test results from pseudo-bulks. The results here are concordant simply because the dataset is a well controlled dataset with biological effect being large enough to be captured with any method and procedure. I refrain from encouraging people to do GSEA on t-test results from pseudo-bulks. In the original version of this work, we run fgsea on t-test results from single cells, not pseudo-bulks. Please remove this section PauBadiaM commented on 2023-02-17T09:09:42Z See previous reply.
soroorh commented on 2023-04-10T08:52:24Z as above |
View / edit / reply to this conversation on ReviewNB soroorh commented on 2023-01-27T17:26:17Z Please remove the section. In addition to my concerns above, this is already a repetition of what was done with single cells. No advantage w.r.t single cell-level analysis is illustrated. PauBadiaM commented on 2023-02-17T09:09:52Z Regarding the concerns see previous reply. Here we showcase that if the user is interested in the actual contrast between conditions, footprints can also be used at the contrast level. I could remove it though.
soroorh commented on 2023-04-10T08:52:45Z please remove |
Agreed, I rewrote this part. I would still keep {cite}szalai2020pwmethods, since there they explain the assumption of correlation between gene expression and protein activity, and why gene sets still work: most of the time they belong to the same regulatory process. The main concept of a footprint is then that it is a gene set that is closer to transcriptomics. Additionally, footprints are usually weighted, but gene sets could also be weighted. Let me know what you think. View entire conversation on ReviewNB |
Changed that bit in the first paragraph. Regarding the second, I added some of your remarks but I would keep the latent variable concept. Latent variables are just variables that are inferred from observable variables using any model. Therefore, PCs, canonical factors and gene set scores are indeed latent variables. This concept has even been applied by many groups, including yours in your recent work ExpiMap. View entire conversation on ReviewNB |
Regarding your distinction between enrichment and activity scoring, I do not believe that this is the case, as I pointed out in other comments above. Enrichment scoring refers to the overrepresentation of genes, as stated in the original GSEA manuscript: “We calculate an enrichment score (ES) that reflects the degree to which a set S is overrepresented at the extremes (top or bottom) of the entire ranked list L.” This list L can be anything: a vector of contrast statistics, or the normalized gene expression values of a sample/cell. After calculating an enrichment score, a test might be applied to determine its significance, but not all methods do so, for example GSVA or AUCell. Following the GSEA definition, AUCell and VISION are indeed enrichment methods. I would personally remove scDECAF unless it can be tied to a publication (can be a preprint); otherwise it feels weird to include it, no? What do you think? Regarding limma’s testing, as mentioned above, if in the end we use GLMs in the DEG part we can explain that there. Regarding the self-contained sentence, I made a mistake. fry and roast can actually account for gene weights with their
View entire conversation on ReviewNB |
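The enrichment score definition quoted above can be illustrated with a short running-sum sketch. This is a simplified illustration under stated assumptions, not the reference GSEA code nor any library's API: the function name `enrichment_score`, the weighting exponent `p`, and the toy gene names are hypothetical, and no significance test is included.

```python
def enrichment_score(ranked_genes, gene_set, stats=None, p=1.0):
    """Running-sum enrichment score of a gene set S over a ranked list L.

    ranked_genes: gene names sorted by the ranking metric (the list L).
    stats: optional ranking metric per gene, used to weight hits by |stat|**p;
           when omitted, every hit gets the same weight.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    weights = [1.0] * len(ranked_genes) if stats is None else [abs(s) ** p for s in stats]
    hit_total = sum(w for w, hit in zip(weights, in_set) if hit)
    if hit_total == 0:
        raise ValueError("all genes in the set have zero weight")
    running, best = 0.0, 0.0
    for w, hit in zip(weights, in_set):
        # Step up at genes in the set, step down otherwise.
        running += w / hit_total if hit else -1.0 / n_miss
        # Keep the largest deviation from zero, with its sign.
        if abs(running) > abs(best):
            best = running
    return best

# Toy ranked list (made up). L could hold contrast statistics or one cell's expression.
ranked = ["a", "b", "c", "d", "e", "f"]
es_top = enrichment_score(ranked, {"a", "b"})
es_bottom = enrichment_score(ranked, {"e", "f"})
```

A set concentrated at the top of the list gives a positive score and one concentrated at the bottom gives a negative score. Whether a score computed on a single cell's expression values should be called enrichment or activity is exactly the point under discussion in this thread.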
This is because DoRothEA and PROGENy are not methods, they are prior knowledge resources. Originally in their respective publications we introduced them as methods because indeed we distributed them with one, VIPER for DoRothEA and a normalized weighted sum (WSUM) for PROGENy so that they could be used for enrichment scoring. Recently, we have “decoupled” them into separate entities in decoupler, the enrichment methods from one side, and the gene sets/footprints on another. Here we could mention the actual methods, VIPER and WSUM, but they do not contribute to the main message, which is that bulk methods can be applied to single-cell. View entire conversation on ReviewNB |
See previous comment.
View entire conversation on ReviewNB |
See previous comment. View entire conversation on ReviewNB |
Hi @soroorh, sorry for the late reply! It's been a busy month. I modified the text and replied to your comments. If needed, I'm happy to have a chat via Zoom or another platform. |
@PauBadiaM thank you! we'll get back to you sooooonish. Please be patient. |
so, if i get a non-symmetric volcano, i should not proceed with the DE? this is an important QC based on a personal practice. The best practice is to check dominant axes of variation by PCA of pbs, construct a design matrix or choose a DE testing procedure where these variations are modelled properly and run the appropriate test/contrast. View entire conversation on ReviewNB |
please consider the suggestion View entire conversation on ReviewNB |
an activity score is a measure of absolute expression, enrichment is a measure of relative expression - this is where it makes a difference in interpretation. View entire conversation on ReviewNB |
Hmm, I only saw that used in this context in a preprint by your group. The latent variables in ExpiMap are linearly decoded, so they are actually disentangled representations of gene sets (the paper discusses how directions, for example, don't always match). Still, to me a latent variable carries a notion of non-linearity, or of non-linear factorisation of the data. I am not confident about using the term for gene set scores. You argued for simplicity earlier, so why not go with simple terms?
View entire conversation on ReviewNB |
I talked about relative vs absolute quantification, and I believe this is explained in the limma manual too. AUCell explicitly describes itself as "Analysis of 'gene set' activity in single-cell RNA-seq data". For this chapter, I think it is important to make the distinction between enrichment (== overrepresentation) relative to a condition and absolute quantification, which is condition-agnostic. I had explained this somewhere in the original work. Please add it to your version too, or copy it across.
View entire conversation on ReviewNB |
ah yes, and you can drop scDECAF. With me doing a PhD here and my collaborators moving out, i can have no good estimate of having a preprint out. View entire conversation on ReviewNB |
please do as suggested View entire conversation on ReviewNB |
Please do as suggested View entire conversation on ReviewNB |
fine! View entire conversation on ReviewNB |
@PauBadiaM thank you for your efforts and help with improving the chapter and apologies for the late reply.
I believe both of these views should be presented to the reader. Additionally, the distinction between enrichment, to assess over-representation or differential activity between conditions, and absolute scoring needs to be clearly stated. The most relevant visualisations for this chapter are, in practice, barplots or dot plots and barcode plots, as commonly seen in publications. The use of all other visualisations has to be limited. |
Thank you for your efforts and patience. Please consider the suggested amendments.
@PauBadiaM what's the plan with this PR? Will you find some time anytime soon to complete it? Thank you! |
Hi @soroorh, @ivirshup, @Zethson and @AnnaChristina,
As discussed in #119, I've updated and reorganized the gene set enrichment chapter.
In short, we start with an introduction to gene sets and commonly used databases, distinguish between gene sets and footprints, describe how functional enrichment can be applied to single cell data, list several enrichment methods and their different modeling strategies and discuss best practices. Then we conclude the chapter by showing the practical examples.
Some important changes:
Let me know what you think about it; I'm happy to go over it again to address comments.