-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
221 lines (139 loc) · 7.7 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
<!-- use devtools::build_readme() -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
warning = FALSE,
message = FALSE,
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# omopcept
<!-- badges: start -->
<!-- badges: end -->
**Active development 2024, breaking changes possible**
**omopcept** provides access to **OMOP** *con***CEPT***s* (all pros, no *cons*!).
Initial motivation was to make it super-easy to get the names associated with concept IDs.
Later additions allow exploration and visualisation of OMOP hierarchies.
**omopcept** provides R functions that are :
* modern
* flexible
* tidyverse compatible
* memory efficient (using arrow parquet)
**omopcept** includes concise named copies of functions designed for interactive use e.g. `oid()` and `onames()` to search concept ids and names respectively. For example the line below can be used to return all ~ 1000 OMOP ids for SNOMED codes for clinical drugs starting with A.
```
onames("^a",d="DRUG",v="SNOMED",cc="Clinical Drug")
```
## Installation
Install the development version of omopcept with:
``` r
# install.packages("remotes")
remotes::install_github("SAFEHR-data/omopcept")
```
## OMOP vocabularies data
OMOP vocabularies can be searched and downloaded from [Athena – the OHDSI vocabularies repository](https://athena.ohdsi.org). **omopcept** provides R tools to interact with OMOP concepts in a more reproducible way.
**omopcept** can use vocabulary files that you have downloaded from Athena, or automatically download a subset of the vocabularies that we have saved in the cloud.
## Getting started with omopcept
On initial use omopcept tries to download OMOP vocabulary files from the cloud to a local package cache where it can be accessed in future sessions. The [arrow R package](https://arrow.apache.org/docs/r/index.html) allows parquet files to be opened and queried in dplyr pipelines without having to read in the data. e.g. the code below will return just the top rows of the concept table.
``` r
library(omopcept)
omop_concept() |>
head() |>
dplyr::collect()
```
## Main omopcept functions
| full name | quick interactive name | action |
| ------- | ------- | ---------------- |
| `omop_names()` | `onames()` | search concepts by parts of names |
| `omop_id()` | `oid()` | search for concept_id(s) |
| `omop_domain()` | - | return domain for concept_id(s) |
| `omop_join_name()` | `ojoin()` | join an omop name column onto a table with an id column |
| `omop_join_name_all()` | `ojoinall()` | join omop names columns onto all id columns in a table |
| `omop_check_names()` | `ochecknames()` | check that names match ids |
||||
| `omop_ancestors()` | `oance()` | return ancestors of a concept |
| `omop_descendants()` | `odesc()` | return descendants of a concept |
| `omop_relations()` | `orels()` | return (immediate) relations of a concept
| `omop_relations()` | `orels()` | return (immediate) relations of a concept and the relations of those up to `nsteps` including the nature of the relationship e.g. 'Is a'|
| `omop_graph()` | - | graph omop relationships (experimental) |
||||
| `omop_concept()` | `oc()` | return reference to concept table (for use in dplyr pipelines) |
| `omop_concept_ancestor()` | `oca()` | return reference to concept ancestor table |
| `omop_concept_relationship()` | `ocr()` | return reference to concept relationship table |
| `omop_concept_fields()` | `ocfields()` | names of concept table columns |
| `omop_concept_ancestor_fields()` | `ocafields()` | names of concept ancestor table columns|
| `omop_concept_relationship_fields()` | `ocrfields()` | names of concept relationship table columns |
## OMOP outline
The [OMOP Common Data Model](https://ohdsi.github.io/CommonDataModel/) is an open standard for health data. "[It is] designed to standardize the structure and content of observational data and to enable efficient analyses that can produce reliable evidence".
OMOP is maintained by OHDSI (pronounced "Odyssey"). "The Observational Health Data Sciences and Informatics program is a multi-stakeholder, interdisciplinary collaborative that strives to improve medical decision making and bring better health outcomes to patients around the world."
## OMOP vocabularies in the background
Vocabularies downloaded from Athena include tables called CONCEPT.csv, CONCEPT_ANCESTOR.csv and CONCEPT_RELATIONSHIP.csv.
You have two main options :
1. manually download selected vocabulary csv files from Athena, use `omopcept::omop_vocabs_preprocess()`
2. automatically download pre-processed vocabulary files saved in the cloud by us
**omopcept** downloads a selection of vocabularies and stores them locally the first time you use it (in the recommended data location for R packages). The download does not need to be repeated until you update the package. Vocabularies are stored as [parquet](https://parquet.apache.org/) files that can be queried in a memory-efficient manner without having to first read the data in to memory.
### OMOP concept table fields
```{r concept table, echo=FALSE}
library(omopcept)
fieldnames <- omop_concept() |>
head() |>
names()
df1 <- data.frame(fields=fieldnames,
about=c("unique id","descriptive name","e.g. drug, measurement","e.g. LOINC, SNOMED","e.g. Clinical Observation, Organism","standard or not","source code","","",""),
query_arguments=c("c_ids","pattern","d_ids","v_ids","cc_ids","standard","","","","")
)
knitr::kable(df1)
```
## `omop_names()`: query concepts by their names
```{r omop_names, echo=TRUE}
omop_names("chemotherapy", v_ids="LOINC")
omop_names("chemotherapy", v_ids=c("LOINC","SNOMED"), d_ids=c("Observation","Procedure"))
```
## `omop_join_name()`: join concept names onto a `*concept_id` dataframe column
Helps to interpret OMOP data.
```{r omop_join_name, echo=TRUE}
data.frame(concept_id=(c(3571338L,4002075L))) |>
omop_join_name()
data.frame(drug_concept_id=(c(4000794L,4002592L))) |>
omop_join_name(namestart="drug")
```
## `omop_join_name_all()`: join concept names onto all `*_concept_id` columns in a dataframe
```{r omop_join_name_all, echo=TRUE}
data.frame(concept_id=(c(3571338L,3655355L)),
drug_concept_id=(c(4000794L,35628998L))) |>
omop_join_name_all()
```
## `omop_graph()`: starting to visualise OMOP hierarchies
```{r omop_graph, message=FALSE}
sharp <- omop_names("Accident caused by sharp-edged object", standard="S")
relations <- omop_relations(sharp$concept_id,
r_ids=c('Is a','Subsumes'),
nsteps=2)
omop_graph(relations, saveplot=FALSE, graphtitle=NULL,
legendshow=FALSE, nodetxtsize=5, textcolourvar="step")
```
## Vocabularies included
The vocabularies that omopcept downloads automatically are a default download from Athena with a few extra vocabs added.
If you wish to control which vocabularies are included you can manually download vocabulary csv files from Athena.
### Numbers of concepts in automatic **omopcept** vocabulary download by domain and vocabulary
```{r conceptplot, echo = TRUE, fig.height=12}
library(dplyr)
library(ggplot2)
library(forcats)
concept_summary <-
omop_concept() |>
count(vocabulary_id, sort=TRUE) |>
collect()
ggplot(concept_summary,aes(y=reorder(vocabulary_id,n),x=n,col=vocabulary_id)) +
geom_point() +
labs(y = "vocabulary_id") +
guides(col="none") +
theme_minimal()
```
### Acknowledgements
Development of `omopcept` has been partly supported by the [UCLH Biomedical Research Centre](https://www.uclhospitals.brc.nihr.ac.uk/).