Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: replace {crul} with {httr2} #157

Draft
wants to merge 20 commits into
base: main
Choose a base branch
from
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -25,14 +25,14 @@ BugReports: https://github.com/ropensci/opencage/issues
Depends:
R (>= 3.4.0)
Imports:
crul (>= 0.5.2),
dplyr (>= 0.7.4),
httr2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is v0.1.1 sufficient or do we use a feature from a more recent version?
https://httr2.r-lib.org/news/index.html

(Didn't realise that httr2 is so recent 😮)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a dependency on >= 0.2.0 because of r-lib/httr2#101

jsonlite (>= 1.5),
lifecycle,
magrittr,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might as well take the dependency on magrittr v2.0 then?
I.e. does it make sense to more generally bump all versions roughly to the one before the most recent necessary dependency (i.e. httr2) came out? Or is that a mistake? I wouldn't know what more ancient versions really mean practically?

Suggested change
magrittr,
magrittr (>= 2.0.0),

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a specific reason for depending on that magrittr version?

memoise (>= 1.1.0),
progress (>= 1.1.2),
purrr (>= 0.2.4),
ratelimitr (>= 0.4.0),
rlang,
tibble (>= 1.4.2),
tidyr (>= 0.8.0),
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -26,4 +26,5 @@ export(oc_reverse_df)
export(opencage_forward)
export(opencage_key)
export(opencage_reverse)
importFrom(magrittr,`%>%`)
importFrom(rlang,.data)
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# opencage (development version)


* http requests are now handled by {[httr2](https://httr2.r-lib.org/)}, not {[crul](https://docs.ropensci.org/crul/)}; and rate limitation / throttling by httr2, not [ratelimitr](https://github.com/tarakc02/ratelimitr) ([#156](https://github.com/ropensci/opencage/issues/156)).
* opencage now supports an `address_only` parameter, see "[New optional API parameter 'address_only'](https://blog.opencagedata.com/post/new-optional-parameter-addressonly)", ([#151](https://github.com/ropensci/opencage/pull/151)).
* The geocoding functions will not send a query to the API anymore if no API key is present ([#133](https://github.com/ropensci/opencage/issues/133)).
* `NA`s are allowed again for the `placename` or `latitude`/`longitude` arguments (also empty strings for `placename`).
Expand Down
8 changes: 7 additions & 1 deletion R/oc_api_ok.R
Original file line number Diff line number Diff line change
Expand Up @@ -10,5 +10,11 @@
#' @keywords internal

oc_api_ok <- function(url = "https://api.opencagedata.com") {
crul::ok(url, useragent = "https://github.com/ropensci/opencage")
resp <- httr2::request(url) %>%
httr2::req_method("HEAD") %>%
httr2::req_user_agent(oc_ua_string) %>%
httr2::req_error(is_error = function(resp) FALSE) %>%
httr2::req_perform()

!httr2::resp_is_error(resp)
}
7 changes: 0 additions & 7 deletions R/oc_config.R
Original file line number Diff line number Diff line change
Expand Up @@ -91,7 +91,6 @@
#' @export
oc_config <-
function(key = Sys.getenv("OPENCAGE_KEY"),
rate_sec = getOption("oc_rate_sec", default = 1L),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe needs actual deprecation 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather leave it in here (so all possible opencage session level options can be set with oc_config()) and just set options("oc_rate_sec" = rate_sec), like no_record and show_key.
Maybe check whether rate_sec is numeric beforehand.

no_record = getOption("oc_no_record", default = TRUE),
show_key = getOption("oc_show_key", default = FALSE),
...) {
Expand Down Expand Up @@ -119,12 +118,6 @@ oc_config <-

Sys.setenv(OPENCAGE_KEY = pat)

# set rate limit
ratelimitr::UPDATE_RATE(
oc_get_limited,
ratelimitr::rate(n = rate_sec, period = 1L)
)

# set no_record
oc_check_logical(no_record, check_length_one = TRUE)
options("oc_no_record" = no_record)
Expand Down
77 changes: 39 additions & 38 deletions R/oc_process.R
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,7 @@ oc_process <-
}

# build url
oc_url <- oc_build_url(
oc_url_parts <- oc_build_url(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current naming confused me a little, I am afraid.

I think there are two to three steps until the request is made

1a. format the query parameters, i.e. turn booleans into integers, delete unused parameters, concatenate bounds coordinates, etc. This is now done in oc_build_url() except this does not yet build the url, so maybe oc_format_query_parameters() and the resulting list is query_parameters? I would move the endpoint as a separate input to the next step/function.
1b. build the httr2 request object (or rather just its url part). Basically what build_query_with_req() does now (maybe just name it build_request()?). The inputs are the query_parameters and endpoint, the output is query_req.
2. Make the request with oc_get(). Pretty much like it is now, but with request or similar as an argument name. (Note to self: This has to be a separate step, so we can return the url_only before).

🤔 À propos naming: Maybe I overdid it with the oc_ prefix for internal functions... 😉

query_par = list(
q = query,
bounds = bounds,
Expand All @@ -150,6 +150,8 @@ oc_process <-
),
endpoint = endpoint
)
query_req <- build_query_with_req(oc_url_parts)
oc_url <- query_req[["url"]]

# return url only
if (return == "url_only") {
Expand All @@ -166,13 +168,11 @@ oc_process <-
# send query to API if not NA, else return (fake) empty res_text
if (!is.na(query) && nchar(query) >= 2) {
# get response
res_env <- oc_get_memoise(oc_url)
res_env <- oc_get_memoise(oc_url_parts)

# parse response
res_text <- oc_parse_text(res_env)

# check status message
oc_check_status(res_env, res_text)
dpprdan marked this conversation as resolved.
Show resolved Hide resolved
} else {
# Fake 0 results response

Expand Down Expand Up @@ -236,8 +236,7 @@ oc_build_url <- function(query_par, endpoint) {

oc_path <- paste0("geocode/v1/", endpoint)

crul::url_build(
url = "https://api.opencagedata.com",
list(
path = oc_path,
query = query_par
)
Expand All @@ -246,18 +245,43 @@ oc_build_url <- function(query_par, endpoint) {

#' GET request from OpenCage
#'
#' @param oc_url character string URL with query parameters, built with
#' @param oc_url_parts list with path and query, built with
#' `oc_build_url()`
#'
#' @return crul::HttpResponse object
#' @return httr2 response
#' @noRd

oc_get <- function(oc_url) {
client <- crul::HttpClient$new(
url = oc_url,
headers = list(`User-Agent` = oc_ua_string)
)
client$get()
oc_get <- function(oc_url_parts = NULL) {

query_req <- build_query_with_req(oc_url_parts)

Comment on lines +254 to +257
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
oc_get <- function(oc_url_parts = NULL) {
query_req <- build_query_with_req(oc_url_parts)
oc_get <- function(query_req = NULL) {

The build_query_with_req() in .oc_process() ought to suffice?

query_req %>%
httr2::req_throttle(rate = getOption("oc_rate_sec", default = 1L)/1L) %>%
httr2::req_user_agent(oc_ua_string) %>%
httr2::req_perform() # will error if API error :-)
}

build_query_with_req <- function(oc_url_parts) {
initial_req <- httr2::request("https://api.opencagedata.com")

req_with_url <- if (!is.null(oc_url_parts[["path"]])) {
httr2::req_url_path_append(initial_req, oc_url_parts[["path"]])
} else {
initial_req
}


if (!is.null(oc_url_parts[["query"]])) {
# some gymnastics needed as we can't pass the query as a list to httr2?
args <- oc_url_parts[["query"]]
args[[".req"]] <- req_with_url

query_req <- do.call(what = httr2::req_url_query, args = args)
} else {
query_req <- req_with_url
}

query_req
}

# user-agent string: this is set at build-time, but that should be okay,
Expand All @@ -277,36 +301,13 @@ oc_ua_string <-
#' @noRd

oc_parse_text <- function(res) {
text <- res$parse(encoding = "UTF-8")
text <- httr2::resp_body_string(res)
if (identical(text, "")) {
stop("OpenCage returned an empty response.", call. = FALSE)
}
text
}


#' Check the status of the HttpResponse
#'
#' @param res_env crul::HttpResponse object
#' @param res_text parsed HttpResponse
#'
#' @return NULL if status code less than or equal to 201, else `stop()`
#' @noRd

oc_check_status <- function(res_env, res_text) {
if (res_env$success()) {
return(invisible())
}
message <-
jsonlite::fromJSON(
res_text,
simplifyVector = TRUE,
flatten = TRUE
)$status$message
stop("HTTP failure: ", res_env$status_code, "\n", message, call. = FALSE)
}


#' Format the result string
#'
#' @param res_text parsed HttpResponse
Expand Down
22 changes: 6 additions & 16 deletions R/zzz.R
Original file line number Diff line number Diff line change
@@ -1,27 +1,17 @@
#' @importFrom magrittr `%>%`

# We use `<<-` below to modify the package's namespace.
# It doesn't modify the global environment.
# We do this to prevent build time dependencies on {memoise} and {ratelimitr},
# We do this to prevent build time dependencies on {memoise},
# as recommended in <http://memoise.r-lib.org/reference/memoise.html#details>.
# Cf. <https://github.com/r-lib/memoise/issues/76> for further details.

# First make sure that the functions are defined at build time
oc_get_limited <- oc_get
oc_get_memoise <- oc_get_limited
# First make sure that the function is defined at build time
oc_get_memoise <- oc_get

# Then modify them at load-time
# nocov start
.onLoad <- function(libname, pkgname) {
# limit requests per second
oc_get_limited <<-
ratelimitr::limit_rate(
oc_get,
# rate can be changed via oc_config()/ratelimitr::UPDATE_RATE()
ratelimitr::rate(
n = 1L,
period = 1L
)
)

oc_get_memoise <<- memoise::memoise(oc_get_limited)
oc_get_memoise <<- memoise::memoise(oc_get)
}
# nocov end
9 changes: 4 additions & 5 deletions man/oc_config.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions tests/testthat/test-oc_build_url.R
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
test_that("oc_build_url returns a string", {
test_that("oc_build_url returns a list", {
expect_type(
oc_build_url(
query_par = list(placename = "Haarlem"),
endpoint = "json"
),
"character"
"list"
)
})
6 changes: 3 additions & 3 deletions tests/testthat/test-oc_clear_cache.R
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,8 @@ test_that("oc_clear_cache clears cache", {
# until a memoise >v.1.1 is released, we need to run oc_get_memoise() twice to
# have it really cache results
# https://github.com/ropensci/opencage/pull/87#issuecomment-573573183
maelle marked this conversation as resolved.
Show resolved Hide resolved
replicate(2, oc_get_memoise("https://httpbin.org/get"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure these edits were the right ones. However now one cannot "get" stuff from an URL that's not the OpenCage API, with the refactoring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure either whether this actually tests what we want to.
Could we build oc_get() in such a way that it accepts a httr2 request object and just performs the request (i.e. just wrapper around httr2::req_perform())? Then we could just build a simple query request here (I'd rather not hit the OpenCage API for this).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am now wondering why we are testing memoization at all, given it's "outsourced" to {memoise}?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not convinced we need to test oc_clear_cache() as it only wraps memoise that we trust?

By the way here's another caching method: https://deploy-preview-537--ropensci.netlify.app/blog/2023/04/21/ropensci-news-digest-april-2023/#caching-the-results-of-functions-of-your-r-package

expect_true(memoise::has_cache(oc_get_memoise)("https://httpbin.org/get"))
replicate(2, oc_get_memoise())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
replicate(2, oc_get_memoise())
oc_get_memoise()

expect_true(memoise::has_cache(oc_get_memoise)())
oc_clear_cache()
expect_false(memoise::has_cache(oc_get_memoise)("https://httpbin.org/get"))
expect_false(memoise::has_cache(oc_get_memoise)())
})
19 changes: 0 additions & 19 deletions tests/testthat/test-oc_config.R
Original file line number Diff line number Diff line change
Expand Up @@ -48,25 +48,6 @@ test_that("oc_config throws error with faulty OpenCage key", {
)
})

# test rate_sec argument --------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a little test whether oc_config() can update the oc_rate_limit option?


test_that("oc_config updates rate limit of oc_get_limit", {
# make sure there is a key present
withr::local_envvar(c("OPENCAGE_KEY" = key_200))

rps <- 5L
oc_config(rate_sec = rps)
expect_identical(ratelimitr::get_rates(oc_get_limited)[[1]][["n"]], rps)

rps <- 3L
oc_config(rate_sec = rps)
expect_identical(ratelimitr::get_rates(oc_get_limited)[[1]][["n"]], rps)

# set rate_sec back to default: 1L/sec
oc_config()
expect_identical(ratelimitr::get_rates(oc_get_limited)[[1]][["n"]], 1L)
})

# test no_record argument -------------------------------------------------

test_that("oc_config sets no_record option", {
Expand Down
11 changes: 0 additions & 11 deletions tests/testthat/test-oc_get.R
Original file line number Diff line number Diff line change
Expand Up @@ -56,17 +56,6 @@ test_that("oc_get returns a response object for vector countrycode", {
)
})

test_that("oc_get_limited is rate limited", {
skip_on_cran()
skip_if_offline("httpbin.org")

tm <- system.time({
replicate(2, oc_get_limited("https://httpbin.org/get"))
})
rate <- ratelimitr::get_rates(oc_get_limited)
expect_gte(tm[["elapsed"]], rate[[1]][["period"]] / rate[[1]][["n"]])
})

test_that("oc_get_memoise memoises", {
skip_on_cran()
skip_if_offline("httpbin.org")
Expand Down