test/class/structureddata suggestions #1

Open: wants to merge 84 commits into base: master

Commits (84)
3239487
Initial commit
dgketchum Feb 27, 2018
03905ee
initial commit spatial files
dgketchum Feb 27, 2018
657d0f9
initial commit from pycharm
dgketchum Feb 27, 2018
0ff1006
initial commit sample learning
dgketchum Feb 27, 2018
056d06d
git ignore changes to exclude pycharm files
dgketchum Feb 27, 2018
74617ce
initial commit raster extract
dgketchum Feb 27, 2018
0b8dbfd
package restructure
dgketchum Feb 27, 2018
33897d9
initial commit testing
dgketchum Feb 27, 2018
cb23796
initial commit FLU metadata
dgketchum Feb 27, 2018
7cbbc33
initial commit raster clip
dgketchum Feb 27, 2018
fa76fc4
added attributed shapefile
dgketchum Feb 27, 2018
e8a8fb9
added attributed shapefile
dgketchum Feb 27, 2018
d8118b2
passing tests.test compose. raster point extract
dgketchum Feb 27, 2018
4bd1ef6
passing tests.test compose. raster point extract
dgketchum Feb 28, 2018
7ec2a15
need to reproject hex centroids
dgketchum Feb 28, 2018
b78ace1
reprojected to Z11 utm centroids
dgketchum Feb 28, 2018
8fdefa0
reprojected to Z11 utm centroids
dgketchum Feb 28, 2018
1174cc0
reprojected to Z12 utm centroids
dgketchum Feb 28, 2018
7ad018e
working on band loop
dgketchum Feb 28, 2018
c6d37d8
added hex centroids Z12 attributed with L and I type
dgketchum Feb 28, 2018
7ca1e29
compose array running, inital commit nn and softmx
dgketchum Mar 1, 2018
a00866e
initial commit pickled sample data
dgketchum Mar 1, 2018
b2185eb
added binary_true option
dgketchum Mar 1, 2018
532e073
'object too deep for desired array' on final gradient cacl on softmax…
dgketchum Mar 1, 2018
116e8d4
considering dropna for Landsat7
dgketchum Mar 1, 2018
fe01c45
removed np.repeat on axis 1 in _softmax
dgketchum Mar 2, 2018
ba1739d
removed np.repeat on axis 1 in _softmax
dgketchum Mar 2, 2018
d43de4d
initial commit preprocessing scripts
dgketchum Mar 2, 2018
bc29d0d
initial commit nlcd map
dgketchum Mar 3, 2018
26a4c7f
going to add functionality to both extract from many polygon shapefil…
dgketchum Mar 3, 2018
f9173b8
adding point value from polygon extract function
dgketchum Mar 3, 2018
26705bf
reverting to simple extract
dgketchum Mar 5, 2018
cb17f47
getting back to more complex
dgketchum Mar 5, 2018
4bf3b72
getting back to more complex extract
dgketchum Mar 5, 2018
d7ef280
added StructuredData class
dgketchum Mar 6, 2018
1a2e5dc
added StructuredData class
dgketchum Mar 6, 2018
eaabc1c
working on tf_softmax
dgketchum Mar 6, 2018
bbe02ac
working on tf_softmax
dgketchum Mar 6, 2018
17b136f
feed dict problem in tf.session.run
dgketchum Mar 6, 2018
da22c6a
tensorflow softmax function working, 96.4% test accuracy
dgketchum Mar 6, 2018
6216461
getting back to more complex extract
dgketchum Mar 6, 2018
64b0e33
need to get total FLU in Z12
dgketchum Mar 7, 2018
058ba83
point from polygon extract work
dgketchum Mar 8, 2018
963aa4c
Merge remote-tracking branch 'origin/master'
dgketchum Mar 8, 2018
b4eea86
working on extract total FLU with nlcd-filled gaps
dgketchum Mar 8, 2018
b01cff6
adding nlcd replace to FLU classes
dgketchum Mar 8, 2018
2b7e618
added key and map, dataframe replace, return key
dgketchum Mar 9, 2018
248ac6d
p39r27 quarter test pickle initial commit
dgketchum Mar 9, 2018
1e6afb7
compose array working for nlcd and flu extract, ran 8500 features
dgketchum Mar 9, 2018
ed1bf2e
compose array to do 50000 extract from sun p39r27
dgketchum Mar 9, 2018
c1f4762
compose array to do 50000 extract from sun p39r27
dgketchum Mar 9, 2018
80c7fa3
compose array to do 50000 extract from sun p39r27
dgketchum Mar 10, 2018
cf22159
running tf_softmax using binary classifier
dgketchum Mar 12, 2018
8084030
running tf_softmax using binary classifier
dgketchum Mar 12, 2018
bc4eb1a
modified to run from command line; revert after!
dgketchum Mar 12, 2018
3444209
refactor pixel_classification package
dgketchum Mar 12, 2018
0eb83f4
refactoring neural net
dgketchum Mar 12, 2018
c1986b5
working on graph in nn
dgketchum Mar 12, 2018
e1db773
inital commit requirements
dgketchum Mar 13, 2018
90ebcfe
setup for long external run
dgketchum Mar 13, 2018
89351b8
still working on neural network
dgketchum Mar 13, 2018
1bf743c
Merge remote-tracking branch 'origin/master'
dgketchum Mar 13, 2018
416fb2e
tf neural net making it to minibatch accuracty calculation
dgketchum Mar 13, 2018
18fcafb
tf_neural_net running 98.4% accuracy on test set
dgketchum Mar 13, 2018
646bf51
refactoring tf_multilayer_perceptron running at 98.4% accuracy
dgketchum Mar 13, 2018
57ff98d
reverted to version without sys.path.append(...)
dgketchum Mar 13, 2018
d3640ba
running full extract on centroids
dgketchum Mar 13, 2018
683b0c2
Merge remote-tracking branch 'origin/master'
dgketchum Mar 13, 2018
7c2971d
more refactoring
dgketchum Mar 13, 2018
60eb673
update copyright
dgketchum Mar 13, 2018
b0f8059
working on StructuredObject.principal_components
dgketchum Mar 13, 2018
df1440d
Merge remote-tracking branch 'origin/master'
dgketchum Mar 14, 2018
b2abc88
commit new spatial shps
dgketchum Mar 14, 2018
a51ca45
nlcd clip commit
dgketchum Mar 14, 2018
c4ab115
second try nlcd test clip
dgketchum Mar 14, 2018
b2d630e
removed nlcd file
dgketchum Mar 14, 2018
c5139a6
Merge branch 'master' of https://github.com/dgketchum/IrrMapper
dgketchum Mar 14, 2018
5ce6244
pre-messing around with cache
dgketchum Mar 14, 2018
89f0eb9
Merge remote-tracking branch 'origin/master'
dgketchum Mar 14, 2018
3601057
passing compose array test
dgketchum Mar 14, 2018
48c669e
passing structuredata tests
dgketchum Mar 14, 2018
40eb20f
passing all tests
dgketchum Mar 14, 2018
b624126
test/class/structureddata suggestions
jirhiker Mar 15, 2018
9c1368b
better instantiation of StructuredData
jirhiker Mar 15, 2018
3 changes: 2 additions & 1 deletion pixel_classification/learn_sample.py
@@ -36,7 +36,8 @@ def classify(alg='softmax', data=None, path_to_pickled=None, binary=None):
'pickled dict of form:\n{}'.format(dct_form))

data = StructuredData(data)
pca = data.principal_components(return_percentile=95.)
data.make_binary(binary_true=binary, inplace=True)
data.principal_components(return_percentile=95.)

mapping = {'softmax': softmax,
'neural_net': mlp}
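The `mapping` dict at the end of `classify()` dispatches on the algorithm name. A minimal sketch of that pattern, with hypothetical stand-ins for `softmax` and `mlp` (the real functions train models; these just illustrate the dispatch):

```python
# Hypothetical stand-ins for the real training functions.
def softmax(data):
    return 'softmax ran'

def mlp(data):
    return 'mlp ran'

# Name-to-callable dispatch table, as in classify().
mapping = {'softmax': softmax,
           'neural_net': mlp}

def classify(alg='softmax', data=None):
    """Look up the requested algorithm and run it."""
    try:
        return mapping[alg](data)
    except KeyError:
        raise ValueError('Invalid algorithm: {}'.format(alg))

print(classify('neural_net'))  # prints "mlp ran"
```

An unknown name falls through to a `ValueError` rather than a bare `KeyError`, which gives the caller a clearer message.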
36 changes: 20 additions & 16 deletions pixel_prep/compose_array.py
@@ -54,8 +54,7 @@ def load_irrigation_data(shapefile, rasters, pickle_path=None,
:return: numpy.ndarray
"""

df = point_target_extract(points=shapefile, nlcd_path=nlcd_path,
target_shapefile=target_shapefiles,
df = point_target_extract(points=shapefile, nlcd_path=nlcd_path, target_shapefile=target_shapefiles,
count_limit=count)

rasters = raster_paths(rasters)
@@ -100,8 +99,8 @@ def recursive_file_gen(mydir):
yield os.path.join(root, file)


def point_target_extract(points, nlcd_path, target_shapefile=None,
count_limit=None):
def point_target_extract(points, nlcd_path,
target_shapefile=None, count_limit=None):
point_data = {}
with fopen(points, 'r') as src:
for feature in src:
@@ -130,20 +129,25 @@ def point_target_extract(points, nlcd_path, target_shapefile=None,
break

if not has_attr:
with rasopen(nlcd_path, 'r') as rsrc:
rass_arr = rsrc.read()
rass_arr = rass_arr.reshape(rass_arr.shape[1], rass_arr.shape[2])
affine = rsrc.affine

x, y = val['coords']
col, row = ~affine * (x, y)
raster_val = rass_arr[int(row), int(col)]
if nlcd_path:
with rasopen(nlcd_path, 'r') as rsrc:
rass_arr = rsrc.read()
rass_arr = rass_arr.reshape(rass_arr.shape[1], rass_arr.shape[2])
affine = rsrc.affine

x, y = val['coords']
col, row = ~affine * (x, y)
raster_val = rass_arr[int(row), int(col)]
ltype_dct = {'IType': None,
'LType': str(raster_val)}
point_data[pt_id]['properties'] = ltype_dct
print('id {} has no FLU, '
'nlcd {}'.format(pt_id,
nlcd_value(ltype_dct['LType'])))
else:
ltype_dct = {'IType': None,
'LType': str(raster_val)}
'LType': None}
point_data[pt_id]['properties'] = ltype_dct
print('id {} has no FLU, '
'nlcd {}'.format(pt_id,
nlcd_value(ltype_dct['LType'])))

idd = []
ltype = []
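The `col, row = ~affine * (x, y)` lines in `point_target_extract` invert the raster's geotransform to find the pixel under a point. A self-contained sketch of the same arithmetic for a north-up raster, with a hypothetical 30 m UTM grid standing in for the real NLCD clip (no rasterio needed, since the transform is just the linear map):

```python
import numpy as np

# Hypothetical geotransform for a north-up 30 m grid:
# upper-left corner at (500000, 5200000), square pixels.
ORIGIN_X, ORIGIN_Y, PIXEL = 500000.0, 5200000.0, 30.0

def point_to_pixel(x, y, origin_x=ORIGIN_X, origin_y=ORIGIN_Y, pixel=PIXEL):
    """Equivalent of `col, row = ~affine * (x, y)` for a north-up raster."""
    col = int((x - origin_x) / pixel)
    row = int((origin_y - y) / pixel)  # y decreases as the row index grows
    return row, col

raster = np.arange(100).reshape(10, 10)         # stand-in for rsrc.read()
row, col = point_to_pixel(500065.0, 5199935.0)  # falls in row 2, col 2
print(raster[row, col])                         # prints 22
```

The sign flip on the row comes from raster convention: row 0 is the northernmost line, so map `y` runs opposite to the row index. `rasterio`'s `Affine` object packages this same math behind the `~affine * (x, y)` syntax used in the diff.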
82 changes: 56 additions & 26 deletions pixel_prep/prep_structured_data.py
@@ -19,48 +19,79 @@
import numpy as np
from pandas import get_dummies
from sklearn import decomposition
from sklearn.utils import Bunch


class StructuredData(object):
""" Structured data object for ML training, not unlike a sklearn.dataset.load_dataset object"""
""" Structured data object for ML training, based on sklearn.utils.Bunch object"""

# is passing in a dict required?
# especially for just two values
# also consider this
# what happens in this case
# data = {'foo'}
# sd =StructuredData(data)
# since 'data' and 'target_values' are required make them positional arguments

def __init__(self, data, target_values, binary=None):
"""

:param data: dict object like {'features': }

"""

def __init__(self, data):
self.lamda = None
self.v = None

self.data = data
# saving data to this object is not necessary.
# just use it.
# The data dict is never referenced outside of this method
# self.data = data

self.x = self.data['data'].astype(np.float32)
self.y_strs = self.data['target_values']
# self.x = self.data['data'].astype(np.float32)
# self.y_strs = self.data['target_values']
self.x = data.astype(np.float32)
self.y_strs = target_values

unique, self.y = np.unique(self.y_strs, return_inverse=True)

self.classes = unique
self.class_counts = {x: list(self.y_strs).count(x) for x in self.classes}
print('Class counts: {}'.format(self.class_counts))
self.class_map = dict(zip(list(unique), list(range(len(unique)))))

# self.class_map = dict(zip(list(unique), list(range(len(unique)))))
# this is more consice than above
self.class_map = {u: i for i, u in enumerate(unique)}

print('Class integer map: {}'.format(self.class_map))

if binary:
self.y[self.y_strs == binary] = 1
self.y[self.y_strs != binary] = 0
self.y_strs[self.y_strs != binary] = '{}{}'.format('N', binary)

self.one_hot = get_dummies(self.y).values

def make_binary(self, binary_true, inplace=False):
""" Use a key value that will equate to True (1), all others to 0."""
"""
:param binary_true:
:return:
"""
if inplace:
self.y[self.y_strs == binary_true] = 1
self.y[self.y_strs != binary_true] = 0
self.y_strs[self.y_strs != binary_true] = '{}{}'.format('N', binary_true)
unique, _ = np.unique(self.y_strs, return_inverse=True)
self.classes = unique
self.class_counts = {x: list(self.y_strs).count(x) for x in self.classes}
self.one_hot = get_dummies(self.y).values
else:
new = copy.deepcopy(self)
self.make_binary(binary_true, inplace=True)
return new
# # def make_binary(self, binary_true, inplace=False):
# def make_binary(self, binary_true):
# """ Use a key value that will equate to True (1), all others to 0."""
# """
# :param binary_true:
# :return:
# """
# self.y[self.y_strs == binary_true] = 1
# self.y[self.y_strs != binary_true] = 0
# self.y_strs[self.y_strs != binary_true] = '{}{}'.format('N', binary_true)
# unique, _ = np.unique(self.y_strs, return_inverse=True)
# self.classes = unique
# self.class_counts = {x: list(self.y_strs).count(x) for x in self.classes}
# self.one_hot = get_dummies(self.y).values
#
# # not advisable
# # else:
# # new = copy.deepcopy(self)
# # self.make_binary(binary_true, inplace=True)
# # return new

def principal_components(self, return_percentile=None, n_components=None):
""" Extract eigenvectors and eigenvalue, return desired PCAs""
@@ -73,12 +104,11 @@ def principal_components(self, return_percentile=None, n_components=None):
pca = decomposition.PCA(0.95, copy=True, whiten=False)
pca.fit(self.x)

print (np.cumsum(pca.explained_variance_ratio_))
print(np.cumsum(pca.explained_variance_ratio_))
return pca


if __name__ == '__main__':
home = os.path.expanduser('~')


# ========================= EOF ================================================================
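The `StructuredData` constructor builds its label machinery from three one-liners: `np.unique(..., return_inverse=True)` for integer codes, the dict comprehension the reviewer suggests for `class_map`, and `get_dummies` for one-hot encoding. A sketch of those transforms in isolation, using toy labels that mirror the 'I'/'F' values in the tests:

```python
import numpy as np
from pandas import get_dummies

# Toy target values mirroring the labels in the test fixtures.
y_strs = np.array(['I', 'I', 'I', 'F', 'I'])

# Sorted unique classes, plus an integer code per sample.
classes, y = np.unique(y_strs, return_inverse=True)  # classes: ['F', 'I']

# The reviewer's suggested comprehension: class label -> integer code.
class_map = {u: i for i, u in enumerate(classes)}    # {'F': 0, 'I': 1}

# Per-class counts, vectorized instead of list(...).count(...).
class_counts = {c: int((y_strs == c).sum()) for c in classes}

# One column per class, exactly one 1 per row.
one_hot = get_dummies(y).values.astype(np.uint8)     # shape (5, 2)
```

The `(y_strs == c).sum()` form is an assumption on my part, swapped in for the diff's `list(self.y_strs).count(x)`; both produce the same counts, but the boolean-mask version avoids materializing a list per class.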
File renamed without changes.
Binary file added tests/data/P39R27_Quarter_Test.pkl
Binary file not shown.
Binary file removed tests/data/extract_test_attributed_Z12.dbf
Binary file not shown.
1 change: 0 additions & 1 deletion tests/data/extract_test_attributed_Z12.prj

This file was deleted.

1 change: 0 additions & 1 deletion tests/data/extract_test_attributed_Z12.qpj

This file was deleted.

Binary file removed tests/data/extract_test_attributed_Z12.shp
Binary file not shown.
Binary file removed tests/data/extract_test_attributed_Z12.shx
Binary file not shown.
Binary file added tests/data/nlcd_clip_test.tif
Binary file not shown.
Binary file added tests/data/test.pkl
Binary file not shown.
58 changes: 16 additions & 42 deletions tests/test_compose_array.py
@@ -14,63 +14,37 @@
# limitations under the License.
# ===============================================================================

import os
import unittest

from fiona import open as fopen
from rasterio import open as rasopen
from pixel_prep.compose_array import load_irrigation_data


class TestPointExtract(unittest.TestCase):
def setUp(self):
self.shapefile = 'tests/data/extract_test_attributed_Z12.shp'
self.raster = 'tests/data/LE07_L1TP_039027_20130726_20160907_01_T1_B3_clip.tif'
self.shapefile = 'data/extract_no_attrs_z12.shp'
self.raster = 'data/LE07_clip_L1TP_039027_20130726_20160907_01_T1_B3.TIF'
self.nlcd = 'data/nlcd_clip_test.tif'
self.target_polys = 'data/flu_test_z12.shp'
if not os.path.isfile(self.shapefile):
raise ValueError('Path to shapefile is invalid')

def tearDown(self):
pass

def test_raster_extract_by_point(self):
def test_compose_array(self):
""" Test native pet rasters vs. xarray netcdf point extract.
:return:
"""

points = raster_point_extract(self.raster, self.shapefile)
points = load_irrigation_data(self.shapefile, self.raster,
nlcd_path=self.nlcd,
target_shapefiles=self.target_polys,
)

for key, val in points.items():
self.assertEqual(val['raster_val'], val['extract_value'])


# ----------------------------------ANCILLARY FUNCTIONS-----------------------

def raster_point_extract(raster, points):
""" Get point values from a pixel_prep.

:param raster: local_raster
:param points: Shapefile of points.
:return: Dict of coords, row/cols, and values of pixel_prep at that point.
"""
point_data = {}

with fopen(points, 'r') as src:
for feature in src:
name = feature['id']
proj_coords = feature['geometry']['coordinates']

point_data[name] = {'coords': proj_coords,
'label': feature['properties']['LType'],
'raster_val': int(feature['properties']['LE07_L1TP_'])}

with rasopen(raster, 'r') as rsrc:
rass_arr = rsrc.read()
rass_arr = rass_arr.reshape(rass_arr.shape[1], rass_arr.shape[2])
affine = rsrc.affine

for key, val in point_data.items():
x, y = val['coords']
col, row = ~affine * (x, y)
raster_val = rass_arr[int(row), int(col)]
val['extract_value'] = raster_val

return point_data
self.assertEqual(points['target_values'][0], ['I', 'I', 'I', 'F', 'I'][0])
self.assertEqual(points['data'][0], [63, 51, 54, 82, 0][0])
self.assertEqual(points['features'][0], '039027_T1')


if __name__ == '__main__':
67 changes: 67 additions & 0 deletions tests/test_structured_data.py
@@ -0,0 +1,67 @@
# =============================================================================================
# Copyright 2018 dgketchum
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================================

import unittest
import pickle
from numpy import array, any
from pixel_prep.prep_structured_data import StructuredData


class StructuredDataTest(unittest.TestCase):
def setUp(self):
path_to_pickled = 'data/test.pkl'
with open(path_to_pickled, 'rb') as p:
data = pickle.load(p)

self.struct = StructuredData(data)

def test_data_instant(self):
# this assertion is really a test of the test which in general is not done
# self.assertIsInstance(self.struct, StructuredData)
self.assertEquals(self.struct.class_counts['I'], 4)

def test_class_zero(self):
classes = self.struct.classes
self.assertEquals(classes[0], 'F')
# i try to only do one assert per test
# self.assertAlmostEqual(pca.mean_[0], 50.)

def test_data_pca_mean(self):
pca = self.struct.principal_components(n_components=1)
self.assertAlmostEqual(pca.mean_[0], 50.)

def test_data_binary(self):
self.struct.make_binary('I', inplace=True)

# use self.assert...?
# assert (self.struct.one_hot == array([[0, 1],
# [0, 1],
# [0, 1],
# [1, 0],
# [0, 1], ])).any()
expected = array([[0, 1],
[0, 1],
[0, 1],
[1, 0],
[0, 1]])

self.assertListEqual(list(self.struct.one_hot), expected)


if __name__ == '__main__':
unittest.main()

# ========================= EOF ====================================================================
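The commented-out `.any()` check in `test_data_binary` passes whenever any single element matches, and `assertListEqual(list(self.struct.one_hot), expected)` compares rows of a 2-D array with `==`, which yields arrays rather than booleans. A sketch of the usual NumPy-native comparisons for this kind of one-hot assertion:

```python
import numpy as np

one_hot = np.array([[0, 1], [0, 1], [0, 1], [1, 0], [0, 1]])
expected = np.array([[0, 1], [0, 1], [0, 1], [1, 0], [0, 1]])

# (one_hot == expected).any() is True if even ONE element matches,
# so it cannot catch a mismatch; .all() or np.array_equal is needed.
assert np.array_equal(one_hot, expected)

# Inside a unittest.TestCase, numpy's own helper raises a readable
# AssertionError showing where the arrays differ.
np.testing.assert_array_equal(one_hot, expected)
```

`np.array_equal` also checks shape, so a transposed or truncated one-hot matrix fails the test rather than silently comparing element-wise.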