231 establish table of active NPIs according to Corona Datenplattform and reported infection data #443

Merged on Sep 5, 2023 (110 commits; this view shows changes from 105 commits)

Commits
545ae4b
transformation of NPI data
mknaranja Mar 28, 2022
3115bf3
directory for raw data changed; new data separated by comma
mknaranja Mar 28, 2022
a485663
auxiliary files were read twice
mknaranja Mar 31, 2022
ed9e10e
small redesign and WIP of evaluation against confirmed infections
mknaranja Apr 5, 2022
888d524
small redesign and further WIP on NPI activation
mknaranja Apr 6, 2022
dc2e9b2
Compute incidence per county
mknaranja Apr 7, 2022
31e8aca
Activation of NPIs implemented; without deactivation by stronger NPIs;…
mknaranja Apr 7, 2022
f31ba8a
Compute incidence value, aggregate NPIs, use delay possible
mknaranja Apr 8, 2022
c25384f
further validation and minor items
mknaranja Apr 22, 2022
a636221
outsourcing of validation to function
mknaranja Jun 7, 2022
09dc291
Extension and correction for incidence-independent cats; WIP
mknaranja Jun 8, 2022
a2122a1
Validation for fine=1 completed
mknaranja Jun 9, 2022
8a8322a
remove placeholder categories
mknaranja Jun 9, 2022
9ca0041
outsourcing of analysis and plot to WIP function; correction if inden…
mknaranja Jun 9, 2022
57d4c69
Start of NPI combination matrix use; WIP
mknaranja Jun 9, 2022
b6b3379
Read combination matrix and export to clean format
mknaranja Jun 10, 2022
4d1e4f7
Make combination matrix dataframe
mknaranja Jun 10, 2022
4c0c8ab
todo combination matrix
mknaranja Jun 24, 2022
5664fc7
todo combination matrix
mknaranja Jun 24, 2022
4a12894
incidence_thresholds_to_npis now also contains subcode end, not only …
mknaranja Jul 11, 2022
85a29db
exclusion loops now started
mknaranja Jul 12, 2022
868d68d
if clause for check
mknaranja Jul 14, 2022
b255c30
Deactivation of exclusive NPIs
mknaranja Aug 14, 2022
df1480f
Correction and extension for delayed NPI implementation or lifting
mknaranja Aug 16, 2022
977df04
Further correction of NPI start and end date for delays with extensiv…
mknaranja Aug 18, 2022
da29e98
correct formatting; no other change
mknaranja Aug 18, 2022
7f11dbf
remove printouts and replace deprecated pandas append
mknaranja Aug 18, 2022
671e9a9
fix errors
patricklnz Oct 17, 2022
65596e7
rework activation/lifting of incidence dependent NPIs
patricklnz Nov 4, 2022
e024321
Add compareNPIData.py
Nov 21, 2022
96d1644
.
mknaranja Nov 21, 2022
5382c3d
compare data and use & sign
mknaranja Nov 21, 2022
162f442
sign corrected
mknaranja Nov 21, 2022
eb3b5a9
testing
mknaranja Nov 21, 2022
4a46b1f
performance improvement
patricklnz Nov 24, 2022
3d3e4e0
fix error
patricklnz Jan 9, 2023
309b5c1
create df_local_new directly from df_local_old
patricklnz Jan 9, 2023
b47e3b2
remove analysis from transformNPIData file (moved to Issue #444)
patricklnz Jan 9, 2023
ebe238d
rename transformNPIData -> getNPIData
patricklnz Jan 9, 2023
5327073
restructure and add first npi tests
patricklnz Jan 10, 2023
505e7d3
add more tests
patricklnz Jan 10, 2023
1453418
correction for activation and comments
mknaranja Jan 11, 2023
7f0eff8
some corrections and test
mknaranja Jan 11, 2023
ceec58c
write csv
mknaranja Jan 15, 2023
2745601
Add test for full functionality with small test sets
patricklnz Jan 23, 2023
039dc11
Fix merge conflicts
annawendler Jan 30, 2023
169d5b4
correction of tests
mknaranja Jan 30, 2023
b3f7eaf
Correct for wrongly computed incidence if cases data starts with valu…
mknaranja Jan 30, 2023
e507e40
change test comments
mknaranja Jan 31, 2023
6508fbc
Read new data for subcategories and compare with old data
annawendler Jan 31, 2023
008849b
Merge branch '231-npi-data' of https://github.com/DLR-SC/memilio into…
annawendler Jan 31, 2023
61d69d4
correct tests
patricklnz Jan 31, 2023
16fdf0a
improve check of removed counties + handle futurewarning
patricklnz Jan 31, 2023
327e36a
test exclusion of npis
patricklnz Jan 31, 2023
1e77caa
suggested change
patricklnz Jan 31, 2023
f7ba4ed
Adapt documentation of activate_npis
annawendler Feb 1, 2023
141c629
exclusion and reduction of contradictory NPIs
mknaranja Feb 2, 2023
6d5220d
deactivation of contradictory NPIs
mknaranja Feb 2, 2023
e555d7a
strictness and correction combis
mknaranja Feb 3, 2023
5fc47ca
code length reduc
mknaranja Feb 3, 2023
8fc4c5e
rework strictness
mknaranja Feb 3, 2023
ede2917
fix test
patricklnz Feb 21, 2023
4145f63
remove debugging artifact
patricklnz Feb 21, 2023
1aa1865
rework strictness deactivation
patricklnz Feb 27, 2023
f5d6ffe
remove warning
patricklnz Feb 28, 2023
62fc0fb
adapt test to new deactivation method
patricklnz Mar 7, 2023
d8dc328
Count joined codes
patricklnz Mar 27, 2023
797f147
review changes
patricklnz Mar 30, 2023
1447c36
count multiple codes incidence dependent, fix combination matrix
patricklnz Apr 4, 2023
2ac1888
Adjust plotting
annawendler Apr 4, 2023
bc25331
merge
annawendler Apr 4, 2023
9c614d1
fix last commit
annawendler Apr 4, 2023
5f9f645
count joined codes for incid_depend
annawendler Apr 6, 2023
c61f62a
fix OSError
patricklnz Apr 18, 2023
c0df479
mock plot function
patricklnz Apr 18, 2023
ecc7272
fix counter
patricklnz Apr 20, 2023
27df4fb
counter for active codes after all (de-)activation
patricklnz Apr 21, 2023
d975ca6
adjust counter
patricklnz Apr 25, 2023
478d8fb
refactoring of count_codes with some small naming improvements and to…
mknaranja Apr 28, 2023
11f1388
adjust plotting
patricklnz May 8, 2023
1c43290
plot diag
patricklnz May 8, 2023
c4608c7
merge count functions
patricklnz May 8, 2023
fe5df64
review suggestions
patricklnz May 8, 2023
99f2d41
fix some errors
patricklnz May 9, 2023
9f3be26
don't plot diag
patricklnz May 22, 2023
1c25329
count plot_counter calls in test
patricklnz May 22, 2023
8275e08
adjust plotting
patricklnz Jun 13, 2023
03bc9e7
Merge branch 'main' into 231-npi-data
patricklnz Jul 17, 2023
cfeab1b
correct and improve comment example
mknaranja Aug 8, 2023
4eccc02
comments improved
mknaranja Aug 8, 2023
cb4a611
real removal of column zero and adapting npi combi stuff
mknaranja Aug 9, 2023
883f67d
minimal changes
mknaranja Aug 10, 2023
eeb9288
comments
mknaranja Aug 14, 2023
8740035
plotting and saving
mknaranja Aug 15, 2023
1e94e1c
combine functions
mknaranja Aug 16, 2023
08cf073
more comments and checks
mknaranja Aug 18, 2023
e4242ed
comments and slight design adjustments
mknaranja Aug 29, 2023
ab5e153
renaming
mknaranja Sep 1, 2023
40a9059
count code functions reduced
mknaranja Sep 1, 2023
45a7ff7
consistency for copies
annawendler Sep 4, 2023
db2fb9c
sanity checks and exception handling
patricklnz Sep 4, 2023
2a8e035
Merge branch '231-npi-data' of https://github.com/DLR-SC/memilio into…
patricklnz Sep 4, 2023
ae4618e
fix test
patricklnz Sep 4, 2023
1f770d9
adjust sanity check, remove one deepcopy
annawendler Sep 4, 2023
4adad52
fix bracket
annawendler Sep 4, 2023
121e65d
start date default
mknaranja Sep 4, 2023
ef36bfd
Merge branch 'main' into 231-npi-data
mknaranja Sep 4, 2023
efdb30d
change start_comb_matrix, fix npi_sanity_check,
annawendler Sep 5, 2023
112f7b1
pre commit
patricklnz Sep 5, 2023
4861484
Merge branch '231-npi-data' of https://github.com/DLR-SC/memilio into…
patricklnz Sep 5, 2023
98 changes: 98 additions & 0 deletions pycode/memilio-epidata/memilio/epidata/compareNPIData.py
@@ -0,0 +1,98 @@
import os
import csv
import pandas as pd
import numpy as np

from memilio.epidata import getDataIntoPandasDataFrame as gd
from memilio.epidata import defaultDict as dd

directory = '/home/wend_aa/memilio/data/pydata/Germany'

#############################################################################################################
# read old data for subcategories

df_npis_old_data = pd.read_csv(
    os.path.join(directory, 'kr_massnahmen_unterkategorien.csv'),
    sep=',')  # , nrows=numberofcities*1248

df_npis_old_data.rename(dd.GerEng, axis=1, inplace=True)

#############################################################################################################
# read new data for subcategories

codelist = ['m01a', 'm01b', 'm02a', 'm02b', 'm03', 'm04', 'm05', 'm06', 'm07', 'm08', 'm09',
            'm10', 'm11', 'm12', 'm13', 'm14', 'm15', 'm16', 'm17', 'm18', 'm19', 'm20', 'm21']
counter_codes = 0
for code in codelist:
    print(code)
    df_npis_per_code = pd.read_csv(
        os.path.join(directory,
                     f'kr_massn_unterkat_{code}.csv'),
        sep=',')

    # set some parameters for dataframe
    if counter_codes == 0:
        counties = np.sort(df_npis_per_code.ags5.unique())
        num_counties = len(df_npis_per_code.ags5.unique())

        # extract dates from data
        dates = df_npis_per_code.iloc[:int(
            df_npis_per_code.shape[0]/num_counties), 5]
        # rename dates so that they match dates from other npi dataframe
        dates_new = ['d' + date.replace('-', '') for date in dates]

        df_local = [pd.DataFrame() for i in range(num_counties)]

    # set df for all counties
    for i in range(0, num_counties):
        print(i)
        if counter_codes == 0:
            df_local[i] = pd.DataFrame(columns=list(
                df_npis_per_code.columns[0:5]) + ['code'] + dates_new)

        dummy_to_append = pd.DataFrame(columns=[
            'code'] + dates_new, data=df_npis_per_code[df_npis_per_code.ags5 == counties[i]].iloc[:, 6:].T.reset_index().values.copy())

        df_local[i] = pd.concat([df_local[i], dummy_to_append])

        if df_npis_per_code.iloc[i*len(dates):(i+1)*len(dates), 3].nunique() > 1:
            raise gd.DataError('Dates are not sorted as expected.')

        # Set first five columns so that they match old format of data frame (from kr_massnahmen_unterkategorien.csv)
        if counter_codes == len(codelist)-1:
            df_local[i].iloc[:, 0:5] = df_npis_per_code.iloc[i *
                                                             len(dates), 0:5].values

    counter_codes += 1

df_npis_new_data = pd.concat([df_local[i] for i in range(num_counties)])
df_npis_new_data.rename(dd.GerEng, axis=1, inplace=True)
df_npis_new_data['NPI_code'] = df_npis_new_data['NPI_code'].str.replace(
    'code_m', 'M')


#############################################################################################################
# compare dataframes

# check if all rows for code M22, M23 and M24 in df_npis_old_data are empty
codesnotused = (df_npis_old_data[df_npis_old_data["NPI_code"].str.contains(
    "M22|M23|M24")].iloc[:, 6:] == -99).all().all()
if codesnotused:
    print("Codes M22, M23 and M24 are not used in old data (as expected).")
else:
    print("Something wrong with data.")

# remove rows for codes M22, M23 and M24 from df_npis_old_data
df_npis_old_data = df_npis_old_data[~df_npis_old_data["NPI_code"].str.contains(
    "M22|M23|M24")].copy()

# check how many days are covered in each dataframe and adjust accordingly so that both dataframes have same size
# we already know that df_npis_new_data has more columns than df_npis_old_data
df_npis_new_data = df_npis_new_data.iloc[:, :len(df_npis_old_data.columns)]

# check that frames are equal (except index and column '_id');
# assert_frame_equal returns None on success and raises AssertionError otherwise
try:
    pd.testing.assert_frame_equal(
        df_npis_old_data.iloc[:, 1:].reset_index(drop=True),
        df_npis_new_data.iloc[:, 1:].reset_index(drop=True),
        check_dtype=False)
    print('Data frames are equal.')
except AssertionError:
    print('Data frames are not equal.')
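
`pd.testing.assert_frame_equal` returns `None` when two frames match and raises `AssertionError` when they differ, so the outcome of the comparison depends on how that exception is handled. The following is a minimal sketch of a boolean wrapper; the helper name `frames_equal` and the toy frames are assumptions made for illustration and are not part of this PR.

```python
import pandas as pd


def frames_equal(left, right, **kwargs):
    """Return True if both frames compare equal, False otherwise.

    Hypothetical helper: assert_frame_equal reports a mismatch by raising
    AssertionError, which is converted into a boolean here.
    """
    try:
        pd.testing.assert_frame_equal(
            left.reset_index(drop=True),
            right.reset_index(drop=True),
            **kwargs)
    except AssertionError:
        return False
    return True


# Toy data, made up for illustration only.
old = pd.DataFrame({'NPI_code': ['M01a', 'M01b'], 'd20200301': [0, 1]})
new_same = old.copy()
new_diff = pd.DataFrame({'NPI_code': ['M01a', 'M01b'], 'd20200301': [0, 0]})

same = frames_equal(old, new_same, check_dtype=False)
different = frames_equal(old, new_diff, check_dtype=False)
print(same, different)
```

Returning a boolean keeps the "equal" and "not equal" branches both reachable, which a check of the form `assert_frame_equal(...) == None` cannot do.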
5 changes: 4 additions & 1 deletion pycode/memilio-epidata/memilio/epidata/defaultDict.py
Expand Up @@ -42,6 +42,7 @@
'read_data': False,
'make_plot': False,
'out_folder': default_file_path,
'update_data': False,
'start_date': date(2020, 4, 24),
'end_date': date.today(),
'split_berlin': False,
@@ -103,7 +104,9 @@
     'nuts3': 'NUTS3',
     'total_volume': 'Unique_trips',
     'region_name': 'County',
-    'region_id': 'ID_County'
+    'region_id': 'ID_County',
+    'desc': 'Description',
+    'incidence': 'Incidence'
 }

 GerEng = {
Original file line number Diff line number Diff line change
@@ -404,9 +404,10 @@ def write_dataframe(df, directory, file_prefix, file_type, param_dict={}):
     - json
     - json_timeasstring [Default]
     - hdf5
+    - csv
     - txt
     The file_type defines the file format and thus also the file ending.
-    The file format can be json, hdf5 or txt.
+    The file format can be json, hdf5, csv or txt.
     For this option the column Date is converted from datetime to string.

     @param df pandas dataframe (pandas DataFrame)
@@ -420,6 +421,7 @@ def write_dataframe(df, directory, file_prefix, file_type, param_dict={}):
     outForm = {'json': [".json", {"orient": "records"}],
                'json_timeasstring': [".json", {"orient": "records"}],
                'hdf5': [".h5", {"key": "data"}],
+               'csv': [".csv", {}],
                'txt': [".txt", param_dict]}

     try:
@@ -428,7 +430,7 @@ def write_dataframe(df, directory, file_prefix, file_type, param_dict={}):
     except KeyError:
         raise ValueError(
             "Error: The file format: " + file_type +
-            " does not exist. Use json, json_timeasstring, hdf5 or txt.")
+            " does not exist. Use json, json_timeasstring, hdf5, csv or txt.")

     out_path = os.path.join(directory, file_prefix + outFormEnd)

@@ -441,6 +443,8 @@ def write_dataframe(df, directory, file_prefix, file_type, param_dict={}):
         df.to_json(out_path, **outFormSpec)
     elif file_type == "hdf5":
         df.to_hdf(out_path, **outFormSpec)
+    elif file_type == 'csv':
+        df.to_csv(out_path)
     elif file_type == "txt":
         df.to_csv(out_path, **outFormSpec)
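
The hunks above extend a lookup-then-dispatch pattern: the file type is looked up in a table of file endings, an unknown type is reported as a `ValueError`, and the matching writer is then called. The sketch below illustrates only that pattern, using the standard library and a list of dicts in place of a pandas DataFrame. The names `FILE_ENDINGS` and `write_records`, the space delimiter for `txt`, and the sample record are assumptions for illustration; this is not memilio's `write_dataframe`.

```python
import csv
import json
import os
import tempfile

# Hypothetical stand-in for the outForm table: format name -> file ending.
FILE_ENDINGS = {'json': '.json', 'csv': '.csv', 'txt': '.txt'}


def write_records(records, directory, file_prefix, file_type):
    """Write a list of dicts to disk and return the output path."""
    try:
        ending = FILE_ENDINGS[file_type]
    except KeyError:
        raise ValueError(
            "Error: The file format: " + file_type + " does not exist. "
            "Use " + ", ".join(FILE_ENDINGS) + ".")

    out_path = os.path.join(directory, file_prefix + ending)
    if file_type == 'json':
        with open(out_path, 'w') as f:
            json.dump(records, f)
    else:
        # In this sketch, csv and txt differ only in the delimiter.
        delimiter = ',' if file_type == 'csv' else ' '
        with open(out_path, 'w', newline='') as f:
            writer = csv.DictWriter(
                f, fieldnames=list(records[0]), delimiter=delimiter)
            writer.writeheader()
            writer.writerows(records)
    return out_path


# Made-up sample record using column names that appear in the diff.
records = [{'ID_County': 1001, 'Incidence': 12.5}]
with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_records(records, tmp, 'npis', 'csv')
    with open(csv_path) as f:
        csv_text = f.read()
    try:
        write_records(records, tmp, 'npis', 'xlsx')
        error_message = None
    except ValueError as err:
        error_message = str(err)
```

Building the error message from the table keys means a newly added format such as `csv` shows up in the message without a second edit, which is the kind of drift the string change in the diff had to fix by hand.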
