Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create utilities module which is neccessary to fix #104 #115

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

rluedde
Copy link
Contributor

@rluedde rluedde commented Jul 13, 2020

Fixes #104.

utilities.py module had to be created to get away from a circular import situation. This way, _coerce can be used in both remote.py and products.py

I'd like to be able to test these changes to make sure that I didn't blow anything up. How should I go about doing this? Run the test suite? Write some unit tests?

rluedde added 3 commits July 13, 2020 06:47
There were functions in products.py that will be useful in other modules
but circular imports pop up if these functions don't have their own module.
_ACS_MISSING only gets used once (in a function that's now in utilities.py
so I moved this variable into that function. I don't think that these
missing values ever change
If one of the columns can' be converted, none of the columns get converted.
@rluedde rluedde changed the title Numeric cols #104 Create utilities module which is neccessary to fix #104 Jul 13, 2020
reilley and others added 2 commits July 16, 2020 07:09
utilities._coerce used to take in an object and try to change the
entire object's dtype. Now, it changes only what columns can be
changeable.

For example, if there's a column of words and column of integers,
coerce cam now cast only the column of integers to type integer
and leave the column of words as pandas objects.

Add unit tests for coerce.
@rluedde
Copy link
Contributor Author

rluedde commented Jul 16, 2020

For all of the utilities, should they be imported specifically wherever they are used? from utilities import _coerce as coerce for example. Or should the utilities module be imported generally (import utilities) and then each function call have a utilities. prepended to it?

@dfolch
Copy link
Contributor

dfolch commented Jul 16, 2020

Most of cenpy uses the from ___ import ___ or from ___ import ___ as ___ style for internal imports. Probably best to follow that convention. Also check out the reply from @ljwolf in #104 regarding underscores and imports.

rluedde added 2 commits July 16, 2020 16:05
replace_missing (the argument) had the same name as replace_missing
(the function)
@rluedde
Copy link
Contributor Author

rluedde commented Jul 18, 2020

Fixes #104:

>>> import cenpy as cen
>>> api_conn = cen.remote.APIConnection('ACSDT5Y2018')  
>>> data = api_conn.query(['B01003_001E'], geo_unit='tract', geo_filter={'state':'04', 'county':'005'}) 
>>> data.B01003_001E.dtype
dtype('int64')

Addresses #114 . utilities._coerce can now handle pandas.DataFrames and pandas.Series. TypeErrors are raised if another data-format is used :numpy.ndarrays, lists, etc. Let me know if it shouldn't be this way.

Adds tests for utilities._coerce and utilities._replace_missing. For completeness' sake, would it be a good idea to add tests for the remaining three utilities functions (_break_ties, _fuzzy_match and _can_int)?

I think this PR is ready for review!

This fix likely causes a lot of other issues with functions in this module.
Tests will probably be a good idea at some point here.
@@ -1,7 +1,10 @@
from .utilities import _replace_missing as replace_missing_func
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these private functions being made public here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They shouldn't be. I'll fix that in a few.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this in my most recent commit.

@rluedde
Copy link
Contributor Author

rluedde commented Aug 16, 2020

I goofed here. That merge commit shouldn't have been done and I don't know how to fix it without doing damage.

While it does incorporate functionality that is useful, it's not within the scope of this PR. How do I remove that merge commit? It looks like it's possible to remove the most recent commit but I haven't found anything about removing non-most-recent commits.

@jGaboardi
Copy link
Member

I goofed here. That merge commit shouldn't have been done and I don't know how to fix it without doing damage.

While it does incorporate functionality that is useful, it's not within the scope of this PR. How do I remove that merge commit? It looks like it's possible to remove the most recent commit but I haven't found anything about removing non-most-recent commits.

See here for $ git revert <COMMIT_HASH>. In your case I think you mean commit 3489b78 so it should be:

$ git revert 3489b78

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

data columns not returned as numeric
4 participants