Add an option to do permutations that are assignments of n observations to k groups #5

Open
gavinsimpson opened this issue May 13, 2015 · 4 comments

@gavinsimpson
Owner

This is a different problem from the one permute currently tackles. It pertains to the Golden Jackals example: even though there are only 184,756 useful permutations, numPerms() reports a far higher value because we are just randomly shuffling the data with respect to the grouping variable. With sufficient data this shouldn't be a problem, and duplicate permutations are unlikely to crop up often. However, it would be useful to include this as a choice.
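
For scale, a rough sketch of the gap between the two counts. It assumes the example has 20 observations split into two groups of 10; that split is an assumption here, although it is consistent with the 184,756 figure. The variable names are illustrative only.

```r
# Assumed layout: 20 observations, two groups of 10 each
n <- 20
k <- 10

# Distinct ways to assign k of the n observations to the first group
n_assign <- choose(n, k)

# Number of free shuffles of n observations (ignores the grouping)
n_shuffle <- factorial(n)

# Each distinct assignment is reproduced by this many shuffles
ratio <- n_shuffle / n_assign

n_assign
ratio
```

Under these assumptions every distinct assignment corresponds to factorial(10)^2 different shuffles, which is why counting shuffles overstates the number of useful permutations so heavily.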

@jarioksa

This is related to vegan issue vegandevs/vegan#132: with class variables, several unique permutations replicate the original classes (that is, observations were permuted within the same classes). A consequence is that the minimum possible P-value is higher than 1/(nperm+1), because some permutations necessarily replicate the original allocation. However, with unequal class sizes, the probability of shuffling within one class level varies among levels. Random permutation that disregards any classification takes care of unequal classification probabilities and also correctly shows the effect on the minimum possible P-value.
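
As a small illustration of that point, here is a sketch that counts how many free shuffles leave the class labels exactly as they were. The five-observation factor is made up for the example.

```r
# Hypothetical class variable with unequal class sizes (2 and 3)
g <- factor(c("a", "a", "b", "b", "b"))

# Shuffles that only move observations within their own class
n_same <- prod(factorial(table(g)))

# Share of all free shuffles that reproduce the original allocation
p_same <- n_same / factorial(length(g))

n_same
p_same
```

With these sizes, 12 of the 120 shuffles reproduce the original allocation, so a purely random shuffle returns the observed labelling about one time in ten.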

@gavinsimpson
Owner Author

Seems like a suitable/correct algorithm is described on CrossValidated which we can implement, and combn() probably gives what we need for allPerms().

Now just to implement it and think about how to expose it given the current interface...
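
For the two-group case, a minimal sketch of how combn() could enumerate every assignment. This is not how allPerms() works; the sizes and object names are assumptions made for illustration.

```r
# Assumed toy problem: 4 observations, 2 of them in group 1
n <- 4
k <- 2

# Each column of idx holds the indices of the observations in group 1
idx <- combn(n, k)

# Turn each column into a full group-label vector of length n
assignments <- apply(idx, 2, function(i) {
  g <- rep(2L, n)  # everything starts in group 2
  g[i] <- 1L       # the chosen indices move to group 1
  g
})

# One row per distinct assignment
t(assignments)
```

This covers only two groups; more than two groups would need nested calls or a different enumeration.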

@gavinsimpson self-assigned this Jan 9, 2019
@jarioksa

jarioksa commented Jan 10, 2019

I don't think this CrossValidated question answers the same problem. It tells you how to do sample(n,k) when k < n, but this does not guarantee unique groups. Moreover, R already has sample(n,k).
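
A quick sketch of why repeated sample(n, k) calls cannot guarantee unique groups. The seed and sizes are arbitrary choices for the illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n <- 4
k <- 2

# Draw 10 groups of size k; sort each so that order within a group is ignored
draws <- replicate(10, paste(sort(sample(n, k)), collapse = "-"))

# Only choose(4, 2) = 6 distinct groups exist, so 10 draws must repeat some
length(unique(draws))
any(duplicated(draws))
```

Whatever the seed, ten draws from six possible groups must contain at least one repeat.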

Assume we have six observations with factor values A,B,B,C,C,C. We have 6! = 720 permutations for six observations, but only 6!/2!/3! = 60 different combinations of these three values (A,B,C).

The distinct sequences are easily exhausted only in small data sets, but there they can be disturbing. Here is a function to estimate the number of distinct sequences of vector a (presumably a factor):

ndistseq <- function(a) exp(lfactorial(length(a)) - sum(lfactorial(table(a))))
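
As a check on the six-observation example, the sketch below compares ndistseq() with a brute-force enumeration. The helper distinct_seqs() is hypothetical and written only for this illustration; it is not part of permute or vegan, and it is practical only for very short vectors.

```r
# Function from the comment above, repeated so the block is self-contained
ndistseq <- function(a) exp(lfactorial(length(a)) - sum(lfactorial(table(a))))

# Hypothetical helper: enumerate the distinct orderings of a multiset
distinct_seqs <- function(a) {
  a <- as.character(a)
  if (length(a) <= 1) return(matrix(a, nrow = 1))
  out <- list()
  for (v in unique(a)) {
    rest <- a[-match(v, a)]  # drop one copy of v
    # put v first, followed by every distinct ordering of the rest
    out[[length(out) + 1]] <- cbind(v, distinct_seqs(rest))
  }
  unname(do.call(rbind, out))
}

a <- c("A", "B", "B", "C", "C", "C")
seqs <- distinct_seqs(a)

round(ndistseq(a))  # ndistseq() works on the log scale, hence round()
nrow(seqs)          # brute-force count of distinct sequences
```

Both counts should agree with the 6!/2!/3! figure given earlier. Because ndistseq() goes through logarithms, its result is a floating-point approximation rather than an exact integer.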

@gavinsimpson
Owner Author

Hmm, I need to revisit my thinking then; when I was playing with this for a two group example it was doing what we needed, but perhaps that was due to the simplicity of the example I was working with...?
