pearson.dist with flat spectra #344
Comments
By saying "flat spectrum", do you mean a spectrum in which all intensities are the same, i.e., constant?
Yes.
Here (https://rdrr.io/cran/rdist/man/rdist.html) there is also a slightly different formula:
I feel that I do not know enough about this distance yet.
@GegznaV I sent you a document on Slack that I have found very helpful.
Some thoughts: If we think that a value should be returned instead of We could have an argument for options:
In the case of the And what value should be issued for two flat spectra? Is 0.5 reasonable as well in this scenario? Or should it be 1, as the shape of the two spectra is identical and this kind of measure measures similarity of shape?
I think it is unwise to do anything other than what the functions naturally return (
I agree that the original algorithm shouldn't be touched if it is widely used in that form, and that it is the user's responsibility to fix their data: we cannot anticipate all boundary conditions, and in some situations our "shortcut solution" could be even more unexpected. Still, we could add a section to the documentation on how to deal with situations like these and illustrate it with an example.
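One possible shape for such a documentation example, as a minimal sketch: the user detects flat spectra before computing correlations and decides what to do with them. `is_flat()` is a hypothetical helper written for illustration and is not part of hyperSpec; the plain matrix stands in for real spectra.

```r
# Hypothetical helper (not a hyperSpec function): flags rows whose
# intensities are all equal (or within `tol` of each other).
is_flat <- function(x, tol = 0) {
  x <- as.matrix(x)
  apply(x, 1, function(row) diff(range(row)) <= tol)
}

# Toy data: one spectrum per row
spectra <- rbind(
  rising  = c(1, 2, 3, 4),
  flat    = c(2, 2, 2, 2),
  falling = c(4, 3, 2, 1)
)

flat_rows <- is_flat(spectra)
print(flat_rows)

# Correlate only the non-flat spectra, so no zero-variance row is involved
r <- cor(t(spectra[!flat_rows, , drop = FALSE]))
print(r)
```

Whether flat spectra should be dropped, reported, or assigned a fixed distance is left to the user here, in line with the view that the algorithm itself stays untouched.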
@cbeleites, what should we do with this issue? Is it OK to leave NA when the algorithm returns NA?
Yes, but I've been looking into the paper you linked and realized that "Pearson distance" is quite ambiguous. I'd like to sort this out for the release. I can open a new branch for this, though.
When calculating `pearson.dist()` (which is basically a scaled correlation between rows/spectra) with a perfectly flat spectrum, the result is `NaN`. This is caused by the standardization of the data matrix: the variance within the flat spectrum is 0, so a division by 0 occurs.
`cor(x, y)` which returns `NA` in this case.
Besides making it possible to work smoothly with flat spectra and `pearson.dist()`, this would allow users to distinguish the Pearson distance to a flat spectrum from situations where e.g. `NA`s in the spectra cause the distance to be `NA`.
Opinions?
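The behaviour described above can be reproduced with a minimal R sketch. It assumes the distance is computed as (1 - r) / 2 from row-wise Pearson correlations; `pearson_dist_sketch()` is a hypothetical helper written for illustration, not the actual `pearson.dist()` implementation.

```r
# Hypothetical sketch, NOT hyperSpec's pearson.dist().
# Assumption: distance = (1 - r) / 2, with r the Pearson correlation
# between rows (spectra).
pearson_dist_sketch <- function(x) {
  x <- as.matrix(x)
  centered <- x - rowMeans(x)              # center each row
  norms <- sqrt(rowSums(centered^2))       # 0 for a flat row
  r <- tcrossprod(centered) / outer(norms, norms)  # 0 / 0 -> NaN
  as.dist((1 - r) / 2)
}

# Toy data: one spectrum per row
spectra <- rbind(
  rising  = c(1, 2, 3, 4),
  falling = c(4, 3, 2, 1),
  flat    = c(2, 2, 2, 2)
)

d <- pearson_dist_sketch(spectra)
print(d)  # every distance involving "flat" is NaN

# Base R's cor() reports NA (with a warning) for the same flat spectrum
base_r <- suppressWarnings(cor(spectra["flat", ], spectra["rising", ]))
print(base_r)
```

In this sketch the `NaN` comes directly from the 0 / 0 in the normalization step, which is why a flat spectrum affects every distance it takes part in.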