Deep __getitem__ in C++. (#8)
C++ NumpyArray::getitem is done, setting the pattern for other classes (external C functions).

The Numba and Identity extensions are not done, which would be necessary to fully set the pattern.

* Start on PR7, deep __getitem__, for real this time.

* Added Content::minmax_depth, which will be used by Ellipsis.

* [skip ci] writing.

* Slices and RawArray compile.

* [skip ci] Update the README.

* [skip ci] What about this link?

* [skip ci] What about this link?

* [skip ci] Finalized README.

* Add a C++ test of RawArray so we can test this without the full cycle.

* [skip ci] Have to re-think Slices.

* Defined slices.

* Poking around, I decided that I should learn how to do slicing on NumpyArray, not RawArray (because NumpyArray supports multiple dimensions).

* To do the testing in Python, we'll need to convert all of the slice types to C++.

* Don't actually raise exceptions while testing.

* [skip ci] working on it.

* All Slice types should be readable now.

* Make syntax of test okay for Python 2.7.

* Make syntax of test okay for Python 2.7 (one more instance).

* Figure out what Windows uses as integer format strings.

* Intentionally fail so that we can see the format string explicitly on Windows.

* Should be fixed for Windows.

* Another test for Windows.

* It's safest to let pybind11 decide if it needs to cast.

* Give NumpyArray a nice representation for common numerical types.

* Started on the true getitem.

* Working on chaining getitems.

* [skip ci] Interaction between SliceAt and SliceStartStop is starting to work.

* Most of getitem has been implemented for SliceAt and SliceStartStop.

* Consolidated and cleaned up.

* But don't intentionally raise an Exception.

* Fix compilation issues on 32-bit Windows.

* [skip ci] carry is composable.

* SliceAt and SliceStartStop for no-carry and yes-carry, but there's a memory error to deal with.

* Fixed memory error (running off end of array).

* Fix Windows 32-bit compilation issues.

* All cases are working except two or more slices.

* [skip ci] working on it.

* [skip ci] Got this one case, but I should think more about it.

* [skip ci] Working out the logic of carrying in Python; good for depth <= 2 so far.

* [skip ci] more cases working.

* [skip ci] I need to rethink what 'carry' means.

* [skip ci] Got it for all levels of slice2.

* [skip ci] Working for some int indexes.

* [skip ci] All int and slice2 cases are working.

* [skip ci] Integer array indexing works, apart from iterating 'as one'.

* [skip ci] Maybe the 'as one' thing is working...

* [skip ci] Walking 'as one' through a slice.

* [skip ci] Have to think harder about how walking 'as one' interacts with slices.

* [skip ci] Stride-based getitem(slice2) works, and it seems to be simpler than the functional version.

* [skip ci] Intarray composes with strided slice2.

* [skip ci] Very nearly have intslice composing with slice2.

* [skip ci] Trouble is, we reshape a strided array differently from a compacted array.

* Start a new technique for __getitem__ and also check that Azure still works.

* But don't run deliberately broken code.

* [skip ci] set up to do array-first getitem

* [skip ci] seems to be a good machine for array indexing

* [skip ci] works for 0, 1, 2 slices

* [skip ci] works for 0, 1, 2, 3 slices

* [skip ci] and arrays work again

* [skip ci] working for some non-trivial strides (haven't checked negative strides yet)

* [skip ci] can't do uneven or negative strides with a carry; have to compact in these cases

* [skip ci] working on compaction

* [skip ci] correctly ingesting arrays and slicing still works; need to check compaction again

* [skip ci] compaction works with the new ctypes-based __init__

* [skip ci] integers were very easy to add: good sign!

* [skip ci] slice3 works

* [skip ci] passing down advanced

* [skip ci] working on int vs array policies

* [skip ci] those policies are very complicated

* [skip ci] Numpy has a strange rule for split advanced indexing that presupposes rectilinear structure and I won't support it.

* [skip ci] working on making integers and arrays by broadcasting

* [skip ci] Our scope will not include basic indexes between advanced indexes (Numpy's implementation relies on rectilinear structure).

* [skip ci] simplify code before introducing multidimensional integer-array index

* [skip ci] multidimensional integer-array indexes work

* [skip ci] we get boolean-array indexing by preprocessing

* [skip ci] All of the getitem_next_array cases are working.

* [skip ci] Earlier work on selecting by strides converted over and it works (negative strides, non-trivial input strides, etc.).

* [skip ci] same for integer

* [skip ci] same for newaxis and Ellipsis

* The getitem study is done; now to implement it!

* Removed all getitem and SliceItem from C++.

* Fix visibility warnings in MacOS (Cling is getting the wrong -fvisibility default?).

* And choose the new CMP0063 policy for that.

* [skip ci] save work

* [skip ci] writing up theory notes

* [skip ci] writing up theory notes

* [skip ci] writing up theory notes

* [skip ci] writing up theory notes

* [skip ci] writing up theory notes

* [skip ci] writing up theory notes

* [skip ci] writing up theory notes

* [skip ci] theory notes are done

* [skip ci] theory notes are done

* Rewriting all the Slices; compiles again.

* Rewriting all the Slices; toslice_part has to modify the Slice in place (because it needs to flatten boolarray -> intarrays).

* toslice and toslice_part compile

* toslice and toslice_part are successfully linked and can be called

* Reinstated Slice tests (in a way that can be permanent).

* SliceArray64 uses '...' for large arrays (shows no more than 6 elements at each level).

* Broadcasting works.

* Includes test against the case we're giving up on.

* Numpy::getitem is beginning to work: SliceAt by strides is done.

* Fix 32-bit compilation.

* [skip ci] working on slice bystrides

* [skip ci] slices might be working

* The new getitem is tested for integer and slice.

* Short-circuited NumpyArray::get and NumpyArray::slice through NumpyArray::getitem, but IDs will need to be propagated through it.

* Fix 32-bit Windows.

* The experimental getitem became standard __getitem__.

* NumpyArray::getitem_bystrides.

* Fix Python 2.7.

* Starting onto NumpyArray::contiguous.

* [skip ci] working on contiguous

* Contiguous is working.

* Stub of NumpyArray::getitem_next compiles.

* getitem_next null in progress

* One level of getitem_next is working.

* Two levels of getitem_next are working.

* Add boolean mask tests and fix integer size/signed warnings.

* Empty index array is not a special case.

* Slices in NumpyArray::getitem_next work.

* All cases in NumpyArray::getitem are covered.

* Start moving functions to cpu-kernels and fix Python 2.7.

* Moving more functions into cpu-kernels.

* Fixed a bug in contiguous.

* Moving more functions into cpu-kernels.

* Finished moving functions into cpu-kernels.

* Minimize '#include' scopes.
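
The notes above say that boolean-array indexing is obtained "by preprocessing" and that `toslice_part` flattens boolarray -> intarrays. A minimal sketch of that reduction for a flat (one-dimensional) mask follows. The names `mask_to_indices` and `take` are hypothetical, chosen for illustration; they are not functions from this commit, and the real code operates on Slice objects rather than `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from this commit): convert a flat boolean mask
// into the integer index array that selects the same elements.
std::vector<int64_t> mask_to_indices(const std::vector<bool>& mask) {
  std::vector<int64_t> out;
  for (size_t i = 0; i < mask.size(); i++) {
    if (mask[i]) {
      out.push_back((int64_t)i);
    }
  }
  return out;
}

// Hypothetical integer-array indexing on a flat buffer. After the mask has
// been preprocessed, boolean indexing is just a call to this function.
std::vector<double> take(const std::vector<double>& data,
                         const std::vector<int64_t>& index) {
  std::vector<double> out;
  out.reserve(index.size());
  for (int64_t i : index) {
    assert(0 <= i && (size_t)i < data.size());
    out.push_back(data[(size_t)i]);
  }
  return out;
}
```

With this shape, only the integer-array path needs to be implemented in the indexing machinery; a mask is converted once, up front.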
jpivarski authored Sep 21, 2019
1 parent 73186a8 commit ecf83ca
Showing 30 changed files with 2,471 additions and 169 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -2,6 +2,12 @@

# ...

############################################################# LaTeX

*.aux
*.log
_minted-*

############################################################# Doxygen

awkward1/signatures/*.xsd
6 changes: 4 additions & 2 deletions CMakeLists.txt
@@ -12,6 +12,9 @@ else()
set(PYBIND11_CPP_STANDARD -std=c++11)
endif()

set(CMAKE_CXX_VISIBILITY_PRESET hidden)
cmake_policy(SET CMP0063 NEW)

file(READ "VERSION_INFO" VERSION_INFO)
string(STRIP ${VERSION_INFO} VERSION_INFO)
add_definitions(-DVERSION_INFO="${VERSION_INFO}")
@@ -36,15 +39,14 @@ add_library(awkward-cpu-kernels-objects OBJECT ${CPU_KERNEL_SOURCES})
set_property(TARGET awkward-cpu-kernels-objects PROPERTY POSITION_INDEPENDENT_CODE 1)
add_library(awkward-cpu-kernels-static STATIC $<TARGET_OBJECTS:awkward-cpu-kernels-objects>)
add_library(awkward-cpu-kernels SHARED $<TARGET_OBJECTS:awkward-cpu-kernels-objects>)
# addtest(test-dummy1 "tests/cpu-kernels/dummy1.cpp")

add_library(awkward-objects OBJECT ${LIBAWKWARD_SOURCES})
set_property(TARGET awkward-objects PROPERTY POSITION_INDEPENDENT_CODE 1)
add_library(awkward-static STATIC $<TARGET_OBJECTS:awkward-objects>)
add_library(awkward SHARED $<TARGET_OBJECTS:awkward-objects>)
target_link_libraries(awkward-static PRIVATE awkward-cpu-kernels-static)
target_link_libraries(awkward PRIVATE awkward-cpu-kernels-static)
# addtest(test-dummy2 "tests/libawkward/dummy2.cpp")
addtest(test-PR8-rawarray "tests/test_PR8_rawarray_and_slices.cpp")

pybind11_add_module(layout src/pyawkward.cpp)
target_link_libraries(layout PRIVATE awkward-static)
9 changes: 7 additions & 2 deletions README.md
@@ -48,8 +48,13 @@ The following features of awkward 0.x will be features of awkward 1.x.
## Status

* 2019-08-17: set up a build process for the four layers with continuous deployment to Linux, MacOS, and Windows wheels.
* 2019-08-22: created a basic ListOffsetArray in C++, exposed to Python with pybind11, and ensured correct memory management between Python's reference counts and C++'s `std::shared_ptr`.
* 2019-08-22 (PR [#2](../../pull/2)): created a basic `NumpyArray` and `ListOffsetArray` in C++, exposed to Python with pybind11, and ensured correct memory management between Python's reference counts and C++'s `std::shared_ptr`.
* 2019-08-26 (PR [#3](../../pull/3)): extended Numba so that `NumpyArray` and `ListOffsetArray` can be used in Numba-compiled functions, ensuring no memory leaks/double frees.
* 2019-08-27 (PR [#4](../../pull/4)): introduced `Identity`, an optional surrogate key whose use is illustrated in [PartiQL](https://github.com/jpivarski/PartiQL#readme).
* 2019-08-29 (PR [#5](../../pull/5)): extended Numba to use `Identity` as well, ensuring no memory leaks/double frees.
* 2019-08-30 (PR [#6](../../pull/6)): added iteration to both C++ and Numba, as well as the first "operation," `awkward1.tolist`, which turns an awkward array into Python lists (and eventually dicts, etc.).
* 2019-09-02 (PR [#7](../../pull/7)): refactored `Index`, `Identity`, and `ListOffsetArray` (and any other array types with `Index`, which is nearly all of them) to have a 32-bit and a 64-bit version. My original plan to only support 64-bit in "chunked arrays" with 32-bit everywhere else is hereby scrapped—both bit widths will be supported on all indexes. Non-native endian, non-trivial strides, and multidimensional `Index`/`Identity` are not supported, though all of these features are allowed for `NumpyArray` (which is _content_, not an _index_). The only limitation on `NumpyArray` is that data must be C-ordered, not Fortran-ordered. (TODO: enforce that!)

## Roadmap

**TODO.** Rough estimate: it will be in a testable state later this year, with the `awkward/awkward1` → `awkward0/awkward` transition early in 2020.
**TODO.** Rough estimate: it will be in a testable state later this year, possibly the beginning of October, with the `awkward/awkward1` → `awkward0/awkward` transition early in 2020.
7 changes: 6 additions & 1 deletion awkward1/operations/format.py
@@ -1,11 +1,16 @@
# BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

import numbers

import numpy

import awkward1.layout

def tolist(array):
if isinstance(array, numpy.ndarray):
if array is None or isinstance(array, (bool, str, bytes, numbers.Number)):
return array

elif isinstance(array, numpy.ndarray):
return array.tolist()

elif isinstance(array, awkward1.layout.NumpyArray):
Binary file added docs/theory/arrays-are-functions.pdf
Binary file not shown.
290 changes: 290 additions & 0 deletions docs/theory/arrays-are-functions.tex

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions include/awkward/Content.h
@@ -4,7 +4,6 @@
#define AWKWARD_CONTENT_H_

#include "awkward/cpu-kernels/util.h"
#include "awkward/util.h"
#include "awkward/Identity.h"

namespace awkward {
@@ -13,13 +12,14 @@ namespace awkward {
virtual const std::shared_ptr<Identity> id() const = 0;
virtual void setid() = 0;
virtual void setid(const std::shared_ptr<Identity> id) = 0;
virtual const std::string repr(const std::string indent, const std::string pre, const std::string post) const = 0;
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const = 0;
virtual int64_t length() const = 0;
virtual std::shared_ptr<Content> shallow_copy() const = 0;
virtual std::shared_ptr<Content> get(int64_t at) const = 0;
virtual std::shared_ptr<Content> slice(int64_t start, int64_t stop) const = 0;
virtual const std::shared_ptr<Content> shallow_copy() const = 0;
virtual const std::shared_ptr<Content> get(int64_t at) const = 0;
virtual const std::shared_ptr<Content> slice(int64_t start, int64_t stop) const = 0;
virtual const std::pair<int64_t, int64_t> minmax_depth() const = 0;

const std::string repr() const;
const std::string tostring() const;
};
}
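
The commit message says `Content::minmax_depth` "will be used by Ellipsis". One plausible reading, sketched below under stated assumptions, is that each node reports the shallowest and deepest nesting beneath it so that an `Ellipsis` can be expanded into the right number of dimensions. The types `Node`, `Leaf`, and `ListNode` are hypothetical stand-ins, not the classes in this diff, and the rule "a list adds one level to its content" is an assumption about how the real implementations behave.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical stand-in for Content: every node reports (min, max) depth.
struct Node {
  virtual ~Node() { }
  virtual std::pair<int64_t, int64_t> minmax_depth() const = 0;
};

// Hypothetical stand-in for a rectilinear array with a fixed number of
// dimensions: min and max depth are both ndim.
struct Leaf : Node {
  int64_t ndim;
  explicit Leaf(int64_t ndim) : ndim(ndim) { assert(ndim >= 0); }
  std::pair<int64_t, int64_t> minmax_depth() const override {
    return std::make_pair(ndim, ndim);
  }
};

// Hypothetical stand-in for a list type: one more level than its content
// (assumed rule, see the lead-in).
struct ListNode : Node {
  std::shared_ptr<Node> content;
  explicit ListNode(std::shared_ptr<Node> content) : content(content) {
    assert(content.get() != nullptr);
  }
  std::pair<int64_t, int64_t> minmax_depth() const override {
    std::pair<int64_t, int64_t> inner = content->minmax_depth();
    return std::make_pair(inner.first + 1, inner.second + 1);
  }
};
```

In this sketch min and max always agree; returning a pair leaves room for node types whose branches differ in depth, which are not part of this commit.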

16 changes: 5 additions & 11 deletions include/awkward/Identity.h
@@ -3,18 +3,11 @@
#ifndef AWKWARD_IDENTITY_H_
#define AWKWARD_IDENTITY_H_

#include <cassert>
#include <atomic>
#include <iomanip>
#include <utility>
#include <string>
#include <vector>
#include <memory>
#include <sstream>
#include <type_traits>

#include "awkward/cpu-kernels/util.h"
#include "awkward/util.h"

namespace awkward {
class Identity {
@@ -23,6 +16,7 @@ namespace awkward {
typedef std::vector<std::pair<int64_t, std::string>> FieldLoc;

static Ref newref();
static std::shared_ptr<Identity> none() { return std::shared_ptr<Identity>(nullptr); }

Identity(const Ref ref, const FieldLoc fieldloc, int64_t offset, int64_t width, int64_t length)
: ref_(ref)
@@ -37,7 +31,7 @@ namespace awkward {
const int64_t width() const { return width_; }
const int64_t length() const { return length_; }

virtual const std::string repr(const std::string indent, const std::string pre, const std::string post) const = 0;
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const = 0;
virtual const std::shared_ptr<Identity> slice(int64_t start, int64_t stop) const = 0;
virtual const std::shared_ptr<Identity> shallow_copy() const = 0;

@@ -54,18 +48,18 @@ namespace awkward {
public:
IdentityOf<T>(const Ref ref, const FieldLoc fieldloc, int64_t width, int64_t length)
: Identity(ref, fieldloc, 0, width, length)
, ptr_(std::shared_ptr<T>(new T[length*width])) { }
, ptr_(std::shared_ptr<T>(new T[(size_t)(length*width)])) { }
IdentityOf<T>(const Ref ref, const FieldLoc fieldloc, int64_t offset, int64_t width, int64_t length, const std::shared_ptr<T> ptr)
: Identity(ref, fieldloc, offset, width, length)
, ptr_(ptr) { }

const std::shared_ptr<T> ptr() const { return ptr_; }

virtual const std::string repr(const std::string indent, const std::string pre, const std::string post) const;
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
virtual const std::shared_ptr<Identity> slice(int64_t start, int64_t stop) const;
virtual const std::shared_ptr<Identity> shallow_copy() const;

const std::string repr() const;
const std::string tostring() const;
const std::vector<T> get(int64_t at) const;

private:
21 changes: 11 additions & 10 deletions include/awkward/Index.h
@@ -3,22 +3,22 @@
#ifndef AWKWARD_INDEX_H_
#define AWKWARD_INDEX_H_

#include <cassert>
#include <iomanip>
#include <string>
#include <sstream>
#include <memory>
#include <type_traits>

#include "awkward/cpu-kernels/util.h"
#include "awkward/util.h"

namespace awkward {
class Index {
virtual const std::shared_ptr<Index> shallow_copy() const = 0;
};

template <typename T>
class IndexOf {
class IndexOf: public Index {
public:
IndexOf<T>(T length)
: ptr_(std::shared_ptr<T>(new T[length], awkward::util::array_deleter<T>()))
IndexOf<T>(int64_t length)
: ptr_(std::shared_ptr<T>(new T[(size_t)length], awkward::util::array_deleter<T>()))
, offset_(0)
, length_(length) { }
IndexOf<T>(const std::shared_ptr<T> ptr, int64_t offset, int64_t length)
@@ -30,18 +30,19 @@ namespace awkward {
int64_t offset() const { return offset_; }
int64_t length() const { return length_; }

const std::string repr() const;
const std::string repr(const std::string indent, const std::string pre, const std::string post) const;
const std::string tostring() const;
const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
T get(int64_t at) const;
IndexOf<T> slice(int64_t start, int64_t stop) const;
virtual const std::shared_ptr<Index> shallow_copy() const;

private:
const std::shared_ptr<T> ptr_;
const int64_t offset_;
const int64_t length_;
};

typedef IndexOf<int8_t> Index8;
typedef IndexOf<uint8_t> Index8;
typedef IndexOf<int32_t> Index32;
typedef IndexOf<int64_t> Index64;
}
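
`IndexOf<T>` holds its buffer in a `std::shared_ptr<T>` constructed from `new T[...]` together with `awkward::util::array_deleter<T>()`. In C++11, a `std::shared_ptr<T>` built from an array allocation without a custom deleter would release it with `delete` rather than `delete[]`, which is why the deleter is passed. The sketch below shows the pattern, plus a slice that shares the buffer and only adjusts offset and length. `array_deleter` here is a hypothetical re-creation (the real one lives in `awkward/cpu-kernels/util.h` or `awkward/util.h` and may differ), and `make_shared_array` and `IndexView` are illustrative names not found in this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical re-creation of an array deleter: frees with delete[].
template <typename T>
struct array_deleter {
  void operator()(T const* p) const { delete[] p; }
};

// Hypothetical helper: allocate `length` items owned by a shared_ptr
// that will release them with delete[].
template <typename T>
std::shared_ptr<T> make_shared_array(int64_t length) {
  assert(length >= 0);
  return std::shared_ptr<T>(new T[(size_t)length], array_deleter<T>());
}

// Hypothetical view in the spirit of IndexOf<T>: shared buffer + offset + length.
template <typename T>
struct IndexView {
  std::shared_ptr<T> ptr;
  int64_t offset;
  int64_t length;

  T get(int64_t at) const {
    assert(0 <= at && at < length);
    return ptr.get()[offset + at];
  }

  // Slicing copies no data: the new view shares ownership of the buffer.
  IndexView<T> slice(int64_t start, int64_t stop) const {
    assert(0 <= start && start <= stop && stop <= length);
    return IndexView<T>{ptr, offset + start, stop - start};
  }
};
```

Because every view holds a reference to the same buffer, the memory is freed only when the last view goes away, whichever one that happens to be.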
5 changes: 2 additions & 3 deletions include/awkward/Iterator.h
@@ -4,7 +4,6 @@
#define AWKWARD_ITERATOR_H_

#include "awkward/cpu-kernels/util.h"
#include "awkward/util.h"
#include "awkward/Content.h"

namespace awkward {
@@ -20,8 +19,8 @@ namespace awkward {
const bool isdone() const { return where_ >= content_.get()->length(); }
const std::shared_ptr<Content> next() { return content_.get()->get(where_++); }

const std::string repr(const std::string indent, const std::string pre, const std::string post) const;
const std::string repr() const;
const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
const std::string tostring() const;

private:
const std::shared_ptr<Content> content_;
13 changes: 5 additions & 8 deletions include/awkward/ListOffsetArray.h
@@ -3,13 +3,9 @@
#ifndef AWKWARD_LISTOFFSETARRAYCONTENT_H_
#define AWKWARD_LISTOFFSETARRAYCONTENT_H_

#include <sstream>
#include <memory>
#include <type_traits>

#include "awkward/cpu-kernels/util.h"
#include "awkward/cpu-kernels/identity.h"
#include "awkward/util.h"
#include "awkward/Index.h"
#include "awkward/Identity.h"
#include "awkward/Content.h"
@@ -29,11 +25,12 @@ namespace awkward {
virtual const std::shared_ptr<Identity> id() const { return id_; }
virtual void setid();
virtual void setid(const std::shared_ptr<Identity> id);
virtual const std::string repr(const std::string indent, const std::string pre, const std::string post) const;
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
virtual int64_t length() const;
virtual std::shared_ptr<Content> shallow_copy() const;
virtual std::shared_ptr<Content> get(int64_t at) const;
virtual std::shared_ptr<Content> slice(int64_t start, int64_t stop) const;
virtual const std::shared_ptr<Content> shallow_copy() const;
virtual const std::shared_ptr<Content> get(int64_t at) const;
virtual const std::shared_ptr<Content> slice(int64_t start, int64_t stop) const;
virtual const std::pair<int64_t, int64_t> minmax_depth() const;

private:
std::shared_ptr<Identity> id_;
39 changes: 22 additions & 17 deletions include/awkward/NumpyArray.h
@@ -1,18 +1,15 @@
// BSD 3-Clause License; see https://github.com/jpivarski/awkward-1.0/blob/master/LICENSE

#ifndef AWKWARD_NUMPYARRAYINDEX_H_
#define AWKWARD_NUMPYARRAYINDEX_H_
#ifndef AWKWARD_NUMPYARRAY_H_
#define AWKWARD_NUMPYARRAY_H_

#include <cassert>
#include <vector>
#include <string>
#include <iomanip>
#include <sstream>
#include <memory>
#include <stdexcept>
#include <vector>

#include "awkward/cpu-kernels/util.h"
#include "awkward/util.h"
#include "awkward/Slice.h"
#include "awkward/Content.h"

namespace awkward {
@@ -39,29 +36,37 @@ namespace awkward {
ssize_t ndim() const;
bool isscalar() const;
bool isempty() const;
bool iscompact() const;
void* byteptr() const;
ssize_t bytelength() const;
uint8_t getbyte(ssize_t at) const;

virtual const std::shared_ptr<Identity> id() const { return id_; }
virtual void setid();
virtual void setid(const std::shared_ptr<Identity> id);
virtual const std::string repr(const std::string indent, const std::string pre, const std::string post) const;
virtual const std::string tostring_part(const std::string indent, const std::string pre, const std::string post) const;
virtual int64_t length() const;
virtual std::shared_ptr<Content> shallow_copy() const;
virtual std::shared_ptr<Content> get(int64_t at) const;
virtual std::shared_ptr<Content> slice(int64_t start, int64_t stop) const;
virtual const std::shared_ptr<Content> shallow_copy() const;
virtual const std::shared_ptr<Content> get(int64_t at) const;
virtual const std::shared_ptr<Content> slice(int64_t start, int64_t stop) const;
virtual const std::pair<int64_t, int64_t> minmax_depth() const;

bool iscontiguous() const;
void become_contiguous();
const NumpyArray contiguous() const;
const NumpyArray contiguous_next(Index64 bytepos) const;
const std::shared_ptr<Content> getitem(const Slice& slice) const;
const NumpyArray getitem_bystrides(const std::shared_ptr<SliceItem>& head, const Slice& tail, int64_t length) const;
const NumpyArray getitem_next(const std::shared_ptr<SliceItem> head, const Slice& tail, Index64& carry, Index64& advanced, int64_t length, int64_t stride) const;

private:
std::shared_ptr<Identity> id_;
const std::shared_ptr<void> ptr_;
const std::vector<ssize_t> shape_;
const std::vector<ssize_t> strides_;
const ssize_t byteoffset_;
std::shared_ptr<void> ptr_;
std::vector<ssize_t> shape_;
std::vector<ssize_t> strides_;
ssize_t byteoffset_;
const ssize_t itemsize_;
const std::string format_;
};
}

#endif // AWKWARD_NUMPYARRAYINDEX_H_
#endif // AWKWARD_NUMPYARRAY_H_
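
The header above declares `getitem_bystrides`, `iscontiguous`, and `contiguous`, and the commit message notes that uneven or negative strides cannot be combined with a carry, so those cases have to be compacted. The one-dimensional sketch below illustrates the general idea of slicing by strides: a slice changes only the offset, length, and stride of a view, and data is copied only when a contiguous buffer is required. `StridedView` is a hypothetical type for illustration; it counts strides in elements rather than bytes, assumes slice bounds are already normalized, and is not the implementation in this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 1-D strided view over a buffer of doubles (does not own it).
struct StridedView {
  const double* data;  // start of the underlying buffer
  int64_t offset;      // position of element 0, in elements
  int64_t length;      // number of elements in the view
  int64_t stride;      // step between elements, in elements; may be negative

  double at(int64_t i) const {
    assert(0 <= i && i < length);
    return data[offset + i * stride];
  }

  // Slice with already-normalized bounds (as Python's slice.indices gives).
  // No data is copied: only offset, length, and stride change.
  StridedView slice(int64_t start, int64_t stop, int64_t step) const {
    assert(step != 0);
    int64_t n = 0;
    if (step > 0) {
      assert(0 <= start && start <= length && 0 <= stop && stop <= length);
      if (stop > start) { n = (stop - start + step - 1) / step; }
    }
    else {
      assert(-1 <= start && start < length && -1 <= stop && stop < length);
      if (stop < start) { n = (start - stop - step - 1) / (-step); }
    }
    return StridedView{data, offset + start * stride, n, stride * step};
  }

  // Densely packed in forward order?
  bool iscontiguous() const { return length <= 1 || stride == 1; }

  // Compaction: copy the selected elements into a fresh dense buffer.
  std::vector<double> contiguous() const {
    std::vector<double> out;
    out.reserve((size_t)length);
    for (int64_t i = 0; i < length; i++) {
      out.push_back(at(i));
    }
    return out;
  }
};
```

Slices compose by multiplying strides and accumulating offsets, which is why chained slices stay cheap until something forces a compaction.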