Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering: Possible bug and a question #37

Open
therealgenna opened this issue Jun 7, 2021 · 0 comments
Open

Filtering: Possible bug and a question #37

therealgenna opened this issue Jun 7, 2021 · 0 comments

Comments

@therealgenna
Copy link

Hi,

(1) I think the paper Methods and the actual code disagree in calculating the Ms score. In the paper the text reads

... we calculate a score, Ms, for each putative microexon as Ms =1 − (1 − PsPU2)/n, where PU2 is the probability that the observed U2 score came from the Gaussian with the higher mean and n is the number of matches for a given intron.

The corresponding calculation in the code seems to be (from https://github.com/hemberg-lab/MicroExonator/blob/master/src/final_filters3.R, lines 85-95)

fit_U2_score <- normalmixEM(ME_matches_filter$U2_score, maxit = 10000, epsilon = 1e-05)
#ggplot_mix_comps(fit_U2_score, "Mixture model Micro-exon >=3 after coverge filter")
post.df <- as.data.frame(cbind(x = fit_U2_score$x, fit_U2_score$posterior))

#ME_final <- ME_centric_raw[ME %in% uniq_seq_filter & len_micro_exon_seq_found>=3, ]
if(fit_U2_score$mu[1]<=fit_U2_score$mu[2]){
  ME_final$ME_P_value <-  1 - (1 - approx(post.df$x, post.df$comp.1, ME_final$U2_scores)$y * ME_final$P_MEs) / ME_final$total_number_of_micro_exons_matches
} else {
  ME_final$ME_P_value <-  1 - (1 - approx(post.df$x, post.df$comp.2, ME_final$U2_scores)$y * ME_final$P_MEs)/ ME_final$total_number_of_micro_exons_matches
}

So if ME_final$ME_P_value should be identified with Ms the code calculates 1 − (1 − Ps(1-PU2))/n and not 1 − (1 − PsPU2)/n defined in the paper (because it takes the probability of the LOWER mean component, and not HIGHER mean.)

Or should Ms actually be 1 - ME_final$ME_P_value? In that case it will not match the paper either, I think ...

(2) In general, I am not sure I can understand the logic behind the expression for the Ms score in the paper. High scores are likely to indicate true microexons, and by definition Ms = 1 − (1 − PsPu2)/n = 1 - 1/n + PsPu2/n. Wouldn't you want Ms decrease with increasing Ps? The formula is the opposite. Also, if I understood correctly, n is the actual number of microexon+splice sites exact matches in the given intron sequence. I would expect Ms to grow for lower n (n>0), i.e., decrease with n increasing, But the formula is the opposite again. Shouldn't it be something like (1-Ps)Pu2/n? Or if you take it as 1 - ME_final$ME_P_value from the code it will be [1-(1-Pu2)Ps]/n, which also would make sense to me. Am I missing the point completely?

Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant