Handle analysis of categorical concepts #32

Merged
merged 24 commits into from
Aug 22, 2024
Changes from 5 commits
Commits
34b722f
add analyse_categorical_column() untested because afaik we haven't ye…
Aug 20, 2024
4a05037
fix logic bug with passing 'value' and simplify slightly. Don't need …
Aug 20, 2024
6d3a8e6
fix 2nd logic bug, need to retain columns (actually only value_as_con…
Aug 20, 2024
e0ef9f3
fix logic bug in argument order, move collect to end. Still seem to g…
Aug 20, 2024
67687cd
seems to be working now on independent test data :-) fixed another lo…
Aug 20, 2024
748da0c
Update dev/omop_analyses/analyse_omop_cdm.R
andysouth Aug 20, 2024
ffdfa7f
completing table rename started on Github from Stef's suggestion
Aug 20, 2024
13566a5
Merge branch 'main' into add-analyse_categorical_column
milanmlft Aug 21, 2024
1c4c9e1
Format and simplify code
milanmlft Aug 21, 2024
75c391e
Update spellcheck wordlist
milanmlft Aug 21, 2024
610b8e6
Fix: set correct name for attribute value
milanmlft Aug 21, 2024
65f9c85
Remove clean up steps from scripts
milanmlft Aug 21, 2024
52d1e00
Ensure database connections are closed on exit, even if script fails
milanmlft Aug 21, 2024
6e245ee
Pull `analyse_*_column` helpers out of main function
milanmlft Aug 21, 2024
20e2a67
Make `analyse_categorical_column` more consistent with `analyse_numer…
milanmlft Aug 21, 2024
40251e6
Rename helper functions
milanmlft Aug 21, 2024
b48a5c4
Refactor `analyse_*` functions
milanmlft Aug 21, 2024
cdbc9d7
Add concept names to summary table (#33)
milanmlft Aug 21, 2024
bb345ce
Add concept names to monthly counts table (#33)
milanmlft Aug 21, 2024
2bb42fe
Rename functions
milanmlft Aug 21, 2024
5c90c5b
Fix comments
milanmlft Aug 21, 2024
7e3ae67
Remove NA values when calculating mean and sd
milanmlft Aug 21, 2024
4527344
Fix column selection
milanmlft Aug 21, 2024
3ab3e6d
Add concept names to result tables (#33)
milanmlft Aug 21, 2024
35 changes: 34 additions & 1 deletion dev/omop_analyses/analyse_omop_cdm.R
@@ -114,10 +114,43 @@ analyse_summary_stats <- function(cdm) {
     # Combine mean and standard deviation
     bind_rows(df_mean, df_sd)
   }
 
+  # Function to analyse a categorical column - present in observation and measurement
+  # by joining value_as_concept_id to cdm$concept by concept_id
+  analyse_categorical_column <- function(cdm, name_of_table) {
+
+    name_of_id_col <- paste0(name_of_table, "_concept_id")
+
+    # Rename columns and remove empty values
+    table <- cdm[[name_of_table]] |>
+      rename_with(~"concept_id", all_of(name_of_id_col)) |>
+      # beware CDM docs: NULL=no categorical result, 0=categorical result but no mapping
+      filter(value_as_concept_id != 0 & !is.null(value_as_concept_id))
+
+    # count freq and join to concept table to get name
+    df_freq_val_as_concept_named <- table |>
+      count(concept_id, value_as_concept_id) |>
+      left_join(select(cdm$concept, concept_id, concept_name),
+                by = c('value_as_concept_id' = 'concept_id')) |>
+      mutate(concept_id = concept_id,
+             #TODO as agreed 2024-08-16 enable concept_name here and in analyse_numeric_column
+             #OR could join concept_name at end of analyse_summary_stats()
+             #concept_name = concept_name,
+             summary_attribute = "frequency",
+             value_as_string = concept_name,
+             value = n,
+             .keep="none") |>
+      collect()
+  }
+
   # Combine results for all columns
   bind_rows(
+    #numeric results
     cdm$measurement |> analyse_numeric_column(measurement_concept_id, value_as_number),
-    cdm$observation |> analyse_numeric_column(observation_concept_id, value_as_number)
+    cdm$observation |> analyse_numeric_column(observation_concept_id, value_as_number),
+    #categorical results
+    cdm |> analyse_categorical_column("measurement"),
+    cdm |> analyse_categorical_column("observation")
   )
 }

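For readers who want to try the categorical counting step without a database, here is a minimal sketch on in-memory data. It is an illustration under stated assumptions, not the code merged in this PR: the `observation` and `concept` tibbles, the function name `count_categorical_values()`, and the shape of `numeric_rows` are all hypothetical stand-ins. It spells out the missing-value check with `is.na()`, which states the intent ("drop rows with no categorical result, and rows with an unmapped result") explicitly.

```r
library(dplyr)

# Hypothetical stand-ins for cdm$observation and cdm$concept (toy data).
observation <- tibble(
  observation_concept_id = c(1L, 1L, 1L, 2L, 2L, 2L),
  value_as_concept_id    = c(10L, 10L, 20L, 0L, NA, 20L)
)
concept <- tibble(
  concept_id   = c(10L, 20L),
  concept_name = c("Yes", "No")
)

# Hypothetical helper mirroring the steps in the diff, minus collect(),
# because the toy data is already in memory.
count_categorical_values <- function(tbl, concept, name_of_table) {
  name_of_id_col <- paste0(name_of_table, "_concept_id")

  tbl |>
    # Give the table-specific id column a common name
    rename_with(~"concept_id", all_of(name_of_id_col)) |>
    # NA = no categorical result, 0 = categorical result but no mapping
    filter(!is.na(value_as_concept_id), value_as_concept_id != 0) |>
    # Frequency of each categorical value per concept
    count(concept_id, value_as_concept_id) |>
    # Look up the name of the categorical value
    left_join(
      select(concept, value_as_concept_id = concept_id, concept_name),
      by = "value_as_concept_id"
    ) |>
    mutate(
      concept_id = concept_id,
      summary_attribute = "frequency",
      value_as_string = concept_name,
      value = n,
      .keep = "none"
    )
}

result <- count_categorical_values(observation, concept, "observation")

# Assumed shape of the numeric summary rows; the real columns may differ.
numeric_rows <- tibble(
  concept_id = 1L,
  summary_attribute = "mean",
  value = 4.2
)

# bind_rows() fills columns missing from one input with NA, so numeric
# rows get value_as_string = NA when stacked with categorical rows.
combined <- bind_rows(numeric_rows, result)
print(combined)
```

The rows with `value_as_concept_id` of `0` or `NA` drop out before counting, so only mapped categorical results contribute to the frequencies.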