Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rev on DataFrame does not reverse column order, as it does on data.frame #68

Open
malcook opened this issue Jun 18, 2020 · 4 comments
Open

Comments

@malcook
Copy link

malcook commented Jun 18, 2020

Perhaps I should infer this from the documentation.

Or it is natural consequence from underlying implementation.

Or perhaps it is an unintentional behavior.

Can I get a witness?

Is there a contract? Might this change?

Thanks!

@hpages
Copy link
Contributor

hpages commented Jun 18, 2020

Yes, a consequence of underlying implementation. More precisely, it's a consequence of the rev() method for Vector derivatives doing extractROWS(x, rev(seq_len(NROW(x)))). If we wanted rev() to reverse column order, we'd need to override this with a method for DataFrame derivatives. I don't mind either way. What others think? @lawremi @LTLA

2 things though:

  • DataFrame does not aim at 100% consistent behavior with data.frame. Some operations purposely diverge from data.frame (e.g. ==, order, sort). Don't know if rev() is one of them though.
  • Changing rev() now has the potential to break a few things around.

@malcook
Copy link
Author

malcook commented Jun 18, 2020

For me, most important is having a correct mental model so I don't make invalid assumptions. Possibly improving documentation somehow would serve, but I'm unsure how best to teach/explain where the analogy to data.frame should not be expected to hold.

@LTLA
Copy link
Contributor

LTLA commented Jun 19, 2020

IMO the current system provides intuitive and useful behavior for sort() and match(), and I don't think rev() should be an exception. It makes a lot of sense to consider the rows of a DataFrame as the individual elements for vector-like semantics, we typically want to interrogate it row-wise rather than column-wise.

In terms of documentation, I wonder whether this row-based interpretation of a DataFrame should be the conceptual default, with list-like behaviors being the deviation. Off the top of my head, there's [[, mcols and length that exhibit list-like characteristics. The practical implication of this choice would be that all other methods would be expected to follow the row-based interpretation, and those that don't use extractROWS will probably not be compliant.

@malcook
Copy link
Author

malcook commented Jun 19, 2020

I generally agree with your intuitions and even recall in the past being surprised by some of the behavior of data.frame. But I got used to it...

The DataFrame documentation begins:

 On the whole, the DataFrame behaves very similarly to
 data.frame, in terms of construction, subsetting, splitting,
 combining, etc. The most notable exception is that the row names
 are optional.

Perhaps a section detailing other (perhaps less notable) exceptions might be in order, along the lines of:

Other exceptions that are a consequence of DataFrame extending S4Vector include:

  • rev - reverses the row order, not the column order.
  • ... (c.f. S4Vectors:Vector-class)

However, reviewing S4Vectors:Vector-class documentation now, I find aspects of to not fully clarify. For example:

‘rev(x)’: Return a new Vector object made of the original elements in
the reverse order.
‘length(x)’: Get the number of elements in ‘x’.

... but length(DF) tells me the number of columns in the DataFrame and rev(DF) reverses the rows of the DF.

The point is that referring to the documentation of the underlying implementation does not fully clarify, at least for me, what I might expect the behavior of rev to be.

Am I missing some clarifying principle?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants