Skip to content

For working with dimensions of arrays by name

License

Notifications You must be signed in to change notification settings

mcabbott/NamedDims.jl

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NamedDims

Build Status Build Status Codecov

NamedDimsArray is a zero-cost abstraction to add names to the dimensions of an array.

Core functionality:

For nda = NamedDimsArray{(:x,:y,:z)}(rand(10,20,30)).

  • Unwrapping: parent(nda): returns the underlying AbstractArray that is wrapped by the NamedDimsArray
  • Indexing: nda[y=2]: the same as nda[x=:, y=2, z=:] which is the same as nda[:,2,:]
  • Functions taking a dims arg: sum(nda; dims=:y) is the same as sum(nda; dims=2)
  • Renaming: rename(nda, new_names) returns a new NamedDimsArray with the new_names but still wrapping the same data.

Dimensionally Safe Operations

Any operation of multiple NamedDimArrays must have compatible dimension names. For example trying NamedDimsArray{(:time,)}(ones(5)) + NamedDimsArray{(:place,)}(ones(5)) will throw an error. If you perform an operation between another AbstractArray and a NamedDimsArray, then the result will take its names from the NamedDimsArray. You can use this to bypass the protection, e.g. NamedDimsArray{(:time,)}(ones(5)) + parent(NamedDimsArray{(:place,)}(ones(5))) is allowed.

Partially Named Dimensions (:_)

To allow for arrays where only some dimensions have names, the name :_ is treated as a wildcard. Dimensions named with :_ will not be protected against operating between dimensions of different names; in these cases the result will take the name from the non-wildcard name, if any of the operands had such a concrete name. For example: NamedDimsArray{(:time,:_)}(ones(5,2)) + NamedDimsArray{(:_, :place,)}(ones(5,2)) is allowed. and would have a result of: NamedDimsArray{(:time,:place)}(2*ones(5,2)) As such, unless you want this wildcard behaviour, you should not use :_ as a dimension name. (Also that is a terrible dimension name, and goes against the whole point of this package.)

When you perform matrix multiplication between a AbstractArray and a NamedDimsArray then the new dimensions name is given as the wildcard :_. Similarly, when you take the transpose of a AbstractVector, the new first dimension is named :_.

Currently, if you have more than one wildcard dimension name, functionality for referring to dimensions by name will not work. See issue #8.

Usage

Writing functions that accept NamedDimsArrays or AbstractArrays

It is a common desire to be able to write code that anyone can call, whether they are using NamedDimsArrays or not. While also being able to use NamedDimsArrays internally in its definition; and also getting the assertion when a NamedDimsArray is passed in, that it has the expected dimensions. The way to do this is to call the NamedDimsArray constructor, with the expected names within the function. As in the following example:

function total_variance(data::AbstractMatrix)
    n_data = NamedDimsArray(data, (:times,:locations))
    location_variance = var(n_data; dims=:times)  # calculate variance at each location
    return sum(location_variance; dims=:locations)  # total them
end

If this function is given (say) a Matrix, then it will apply the names to it in n_data. Thus the function will just work on unnamed types. If data is a NamedDimsArray, with incompatible names an error will be thrown. For example if it data was mistakenly transposed and so had the dimension names: (:locations, :times) instead of (:times, :locations). If data was partially named, e.g. (:_, :locations), then that name would be allowed to be combined with the named from the constructor; yielding n_data with the expected names: (:times, :locations). This pattern allows both assertions of correctness (for named inputs), and convenience and compatibility (for unnamed input). And since NamedDimsArray is a zero-cost abstraction, this will basically compile out of existence, most of the time.

Extending support for more functions

There are two common things to do to make a function support NamedDimsArrays. These are:

  • Adding support for referring to a dimension by name to an existing function
  • Make the operation return a NamedDimsArray rather than a Array. (Many operations fallback to dropping the names) Often they are done together.

They are illustrated by the following example:

function foo(nda::NamedDimsArray, args...; dims=:)
    numerical_dims = dim(nda, dims)  # convert any form of dims into numerical dims
    raw_result = foo(parent(nda), args...; dims=numerical_dims)  # call it on the backed data
    new_names = determine_foo_names(nda, args...)  # workout what the new names will be
    return NamedDimsArray{new_names)(raw_result)  # wrap the result up
end

You can do this to your own functions in your own packages, to add NamedDimsArray support. If you implement it for any functions in a standard library, a PR would be very appreciated.

About

For working with dimensions of arrays by name

https://github.com/JuliaCollections/AxisArraysFuture/issues/1#issuecomment-482077891

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Julia 100.0%