Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use automatic or user-supplied symbolic derivatives for the density of transformed distributions #101

Closed

Conversation

venpopov
Copy link
Contributor

@venpopov venpopov commented Apr 2, 2024

Summary

As discussed in #97, this PR implements automatic symbolic differentiation of the inverse transformation function. It also allows users to supply a deriv function to dist_transformed(), in addition to the transform and inverse functions, which will be used instead of numeric derivatives in computing the density.

Details on the implementation

Deriv package

I looked into different packages for symbolic differentiation and decided to use the Deriv::Deriv package instead of stats::deriv for the following reasons:

  • it can compute derivatives directly from function calls rather than needing an expression
  • It works recursively through nested functions to build derivatives from the component basic functions. stats::deriv fails with custom functions even if they are composed of functions that are in its lookup-table. Using Deriv::Deriv ends up working in all cases I tried, even with a very complicated set of nested custom functions (see the test here for example)
  • It is a lightweight dependency, importing only methods
  • it allows adding custom rules to its lookup table. For now I found now need, but could be useful for the future.
  • It outputs a function rather than an expression, which can be directly provided to the density method

Constructing the derivative functions

  • The symbolic derivative function is constructed only once when the dist_transformed object gets created. It is save in a field deriv of the dist object just like transform and inverse

  • This happens in 4 places:

      1. Math.dist_default() - for the first transformation with a unary function such as log(dist_gamma(1,1)
      1. Ops.dist_default() - for the first transformation with a binary operator such as dist_gamma(1,1)^2
      1. Math.dist_default() - for subsequent unary transformations
      1. Ops.dist_default() - for subsequent binary operator transformations

In 1 and 2, the Deriv::Deriv is applied to the initial inverse function. E.g:

  trans <- new_function(exprs(x = ), body = expr((!!sym(.Generic))(x, !!!dots_list(...))))

  inverse <- get_unary_inverse(.Generic, ...)
  inverse <- Deriv::Simplify(inverse)

  deriv <- suppressWarnings(try(Deriv::Deriv(inverse, x = 'x'), silent = TRUE))

In 3 and 4, it is applied to the compounded inverse function from the multiple transformations:

  # replace x in the previous inverse function body with the current inverse function body
  prev_inverse <- x$inverse
  current_inverse <- get_unary_inverse(.Generic, ...)
  body(prev_inverse) <- substituteDirect(body(prev_inverse), list(x = body(current_inverse)))
  inverse <- Deriv::Simplify(prev_inverse)

  deriv <- suppressWarnings(try(Deriv::Deriv(inverse, x = 'x'), silent = TRUE))

This happens only for transformations that are applied directly via myfun(dist_gamma(1,1)). If a user uses dist_transformed(), they can provide a function to the new deriv argument, and this will be used (@mjskay)

Calculating the derivatives for the density

Now that the derivative function is available in the dist object, the actual calculation happens in density.dist_transformed(). It has a conditional statement that checks if the derivative function is available, and if the new argument deriv_method is symbolic (default) or numeric. It calculates the derivative with either the old numeric method, or symbolicaly, and the rest is the same.

Changes to the construction of the inverses

To make the above work, I had to change how the inverses are constructed. The previous syntax created functions with many (function(x) {})(x) type calls, which did not work with the Deriv package. This was the trickiest part. The previous syntax worked even thought there were the same constant terms in different nested functions. There are two components to my changes:

  1. Changes to get_*_inverse_* inverse functions to substitute the constants or optional arguments in the function body, and to return a wrapped primitive function:
get_unary_inverse <- function(f, ...) {
  switch(f,
         sqrt = function(x) x^2,
         exp = function(x) log(x),
         log = (function(x, base) {
           if (missing(base)) function(x) exp(x)
           else new_function(exprs(x = ), expr((!!base)^x))
         })(x, ...),
         log2 = function(x) 2^x,
         log10 = function(x) 10^x,
         expm1 = function(x) log1p(x),
         log1p = function(x) expm1(x),
         cos = function(x) acos(x),
         sin = function(x) asin(x),
         tan = function(x) atan(x),
         acos = function(x) cos(x),
         asin = function(x) sin(x),
         atan = function(x) tan(x),
         cosh = function(x) acosh(x),
         sinh = function(x) asinh(x),
         tanh = function(x) atanh(x),
         acosh = function(x) cosh(x),
         asinh = function(x) sinh(x),
         atanh = function(x) tanh(x),

         invert_fail
  )
}

get_binary_inverse_1 <- function(f, constant) {

  switch(f,
         `+` = new_function(exprs(x = ), body = expr(x - !!constant)),
         `-` = new_function(exprs(x = ), body = expr(x + !!constant)),
         `*` = new_function(exprs(x = ), body = expr(x / !!constant)),
         `/` = new_function(exprs(x = ), body = expr(x * !!constant)),
         `^` = new_function(exprs(x = ), body = expr(x ^ (1/!!constant))),

         invert_fail
  )
}
  1. When repreated transformations are applied, such as 1-3*exp(-exp(5+dist_gamma(1,1)^2)), at each stage the body of the new inverse function is injected to replace the x symbol in the body of the previous inverse function. E.g. in the above example we get the following inverse functions at each step:

sqrt(x) -> sqrt(x-5) -> sqrt(log(x)-5) -> sqrt(log(-x)-5) -> sqrt(log(-log(x))-5) -> sqrt(log(-log(x/3))-5) ....

This makes it very easy for Deriv::Deriv to find a symbolic derivative, and so far I didn't find any combination of functions in the inverse functions tables that failed. I added extensive new tests to ensure that the inverses and derivatives match the know analytic versions. This is quite stable now.

Speed

Aside from numerical stability, the other benefit is much improved speed, because we find the derivative only when constructing the object, and afterwards the operation is completely vectorized.

builtin <- dist_lognormal(0, 1)
trans <- exp(dist_wrap('norm'))
x <- seq(1, 10, 0.001)

microbenchmark::microbenchmark(
  builtin = density(builtin, x)[[1]],
  trans_numeric = density(trans, x, deriv_method = "numeric")[[1]],
  trans_symbolic = density(trans, x, deriv_method = "symbolic")[[1]],
  check = "equal"
)

#> Unit: microseconds
#>            expr        min          lq        mean      median          uq        max neval
#>         builtin    113.488    126.7105    142.1835    137.6985    156.1895    191.839   100
#>   trans_numeric 267743.981 273165.0215 281941.3901 276509.5760 279855.7500 554079.822   100
#>  trans_symbolic    291.838    331.1570    349.4364    342.9035    361.5790    510.696   100

Examples

Here are some examples of the resulting inverse and derivative functions

Getting a gumbel distribution from a uniform:

library(distributional)
d <- dist_uniform(0, 1)
my_gumbel <- -log(-log(d))
default_gumbel <- dist_gumbel(0, 1)

fget(my_gumbel, inverse)
#> function (x) 
#> exp(-exp(x/-1))
#> <environment: 0x103a766b0>
fget(my_gumbel, deriv)
#> function (x) 
#> {
#>     .e2 <- exp(-x)
#>     exp(-.e2) * .e2
#> }
#> <environment: 0x103a766b0>

x <- seq(-4, 8, 0.01)
my_dens <- density(my_gumbel, x, verbose = TRUE)[[1]]
#> Using symbolic differentiation
def_dens <- density(default_gumbel, x)[[1]]

plot(x, my_dens, type = "l", col = "red", lwd = 2, xlab = "x", ylab = "Density")
lines(x, def_dens, col = "blue", lwd = 2)

Created on 2024-04-02 with reprex v2.1.0

From a shifted and rescale logistic to log-logistic, rescale, back to standard-logistic

With inverses printed in blue and derivatives printed in red. You can see that the functions get simplified if possible, which is quite neat.

loc = 1
scale = 0.3
alpha = exp(loc)
beta = 1 / scale
d1 <- dist_logistic(loc, scale)
d2 <- exp(d1)
d3 <- d2^beta / alpha^beta
d4 <- log(d3)
data.frame(dist = c(d1,d2,d3,d4),
           labels = c('d1','d2','d3','d4'),
           inverse = sapply(fget(c(d1,d2,d3,d4),inverse), function(x) deparse(body(x))),
           deriv = sapply(fget(c(d1,d2,d3,d4),deriv), function(x) deparse(body(x)))) %>% 
ggplot() + 
  stat_slab(aes(xdist = dist, y = labels), limits = c(-8,8), normalize = "groups") +
  theme_ggdist() +
  geom_text(aes(label = inverse, y = labels), x = -5, vjust = -5, col = "blue") +
  geom_text(aes(label = deriv, y = labels), x = 5, vjust = -5, col = "red")

image

A pretty crazy transformation but it works

So I tried to push it as much as I could to break it with the following crazy transformation involving multiple custom nested functions, but it worked, and the derivative matches the output of WolframAlpha

### -------- a super complicated final case --------------
  # nested custom functions involved in further nested transformation
  inv_logit <- function(x) 1/(1 + exp(-x))
  coshsquared_inv_logit <- function(x) cosh(inv_logit(x))^2
  dist2 <- exp(2-1/log(2+coshsquared_inv_logit(d), base = 10)^(0.4+0.1))

  # get random values within the support and test the inverse
  lim <- field(support(dist2), 'lim')[[1]]
  v <- runif(10, lim[1], lim[2])
  res <- exp(2-1/log(2+coshsquared_inv_logit(v), base = 10)^(0.4+0.1))
  expect_equal(vec_data(dist2)[[1]]$inverse(res), v)

  # monstrous analytical derivative based on wolfram alpha
  an_deriv <- function(x) {
    term1 <- (log(10) * 10^(1/((log(x) - 2)^2)))
    term2 <- x * sqrt(10^(1/((log(x) - 2)^2)) - 3)
    term3 <- sqrt(10^(1/((log(x) - 2)^2)) - 2)
    term4 <- (log(x) - 2)^3
    term1 / (term2 * term3 * term4 * (acosh(term3) - 1) * acosh(term3))
  }

  all.equal(fget(dist2, deriv)(v), an_deriv(v))
#> TRUE

fget(dist2, inverse)
#> function (x) 
#> -log(1/acosh(sqrt(10^(1/(2 - log(x)))^2 - 2)) - 1)


 fget(dist2, deriv)
#> function (x) 
#> {
#>     .e1 <- 2 - log(x)
#>     .e3 <- 10^(1/.e1)^2
#>     .e5 <- sqrt(.e3 - 2)
#>     .e6 <- acosh(.e5)
#>     2.30258509299405 * (.e3/(x * (1/.e6 - 1) * .e1^3 * .e6^2 * 
#>         .e5 * sqrt(.e3 - 3)))
}

@venpopov
Copy link
Contributor Author

venpopov commented Apr 2, 2024

btw, I added a small utility function fget to avoid having to type vec_data(dists)[[1]]$inverse to get functions while I worked on this. I find it useful, but if you'd prefer not to have it in, I can remove it ;)

@mjskay
Copy link
Contributor

mjskay commented Apr 3, 2024

Awesome!!

I have a couple of thoughts:

  • If the symbolic derivative fails, can it fall back silently to numerical differentiation, instead of printing a message? Or at least, can there be an option for such a mode? Currently it prints a message and then fails later if I call density(). From the {ggdist} side I would not want this message cluttering output when plotting, and I would want it to always still attempt the transformation later.

  • I worry that the approach of trying to compose a single inverse will be brittle for some complex functions. R semantics are so weird that I could see situations where if someone has done something with a function definition you don't expect, dropping its code into the middle of another function could fail in unexpected ways (e.g. what happens if a function has been constructed dynamically in a surrounding environment that contains some constants the function is using? If you drop the function body into the middle of another function, I don't think the constants would be evaluated in the correct environment).

  • Building on the last point, the approach of composing a single inverse then taking the derivative will not work in cases where {Deriv} can do derivatives of some of the inverses but not all of them, even if the user has supplied the derivatives {Deriv} cannot do.

    For example, {Deriv} cannot find the derivative of the inverse of the Yeo-Johnson transformation:

    yj = scales::transform_yj(0)
    Deriv::Deriv(yj$inverse)
    ## Error in FUN(X[[i]], ...) : Could not retrieve body of 'is.na()'

    However, since {scales} provides the derivative and the new dist_transformed() now accepts it, we can still construct a transformed distribution with it using the known derivative (great!):

    d = dist_transformed(dist_normal(), yj$transform, yj$inverse, yj$d_inverse)
    density(d, 0)
    ## [1] 0.3989423

    But if I try to apply a further transform, it will not work, since it will try to differentiate the entire inverse, which {Deriv} can't do:

    neg_d = -d
    ## Cannot compute the derivative of the inverse function symbolicly.
    density(neg_d, 0)
    ## Error in x[["deriv"]](at) : attempt to apply non-function

    However, using an autodiff type approach, you can find a derivative of a chain of functions that you know the derivatives of by applying the chain rule. This is what {scales} does to find derivatives of compositions of transforms; see compose_deriv_fwd() and compose_deriv_rev() here.

    This approach has two advantages: (1) you don't need to derive the composed inverse function (avoiding any corner cases created by R's weird semantics when substituting code) and (2) you can find the derivatives of chains of functions so long as either the user or {Deriv} knows the derivative of each function in the chain. This second property is particularly important for {ggdist}, because if someone applies a scale transformation (where we know the derivatives) to an already-transformed distribution, I'd like to be able to use dist_transformed() on the existing distribution to create a new distribution that can calculate densities without using numerical diff.

Thanks again for working on this, I'm really looking forward to having more reliable distribution transformation for ggdist...

@venpopov
Copy link
Contributor Author

venpopov commented Apr 3, 2024

Thanks for the detailed and thoughtful feedback. Indeed, we would want it to be as robust as possible. Point 1 is easy and I thought it already falls back to the numeric derivative if it fails, but I'll check. For point 2, could you give me an example function for which the current approach will fail so that I add this as a test case and see whether any alternative approach works with it?

@venpopov
Copy link
Contributor Author

venpopov commented Apr 3, 2024

Ok, you are right about weird environment issues with substituting the inverses. Actually your example with scales showcases this if we skip the derivative part. This works on the master branch:

library(distributional)
yj = scales::transform_yj(0)

d = dist_transformed(dist_normal(2), yj$transform, yj$inverse)
density(d, 1)
#> [1] 1.042247

neg_d = -d
density(neg_d, 1)
#> [1] 0.005514935

Created on 2024-04-03 with reprex v2.1.0

But it fails on the PR branch:

library(distributional)
yj = scales::transform_yj(0)

d = dist_transformed(dist_normal(2), yj$transform, yj$inverse)
density(d, 1, deriv_method = "numeric")
#> [1] 1.042247

neg_d = -d
#> Cannot compute the derivative of the inverse function symbolicly.
density(neg_d, 1, deriv_method = "numeric")
#> Error in trans_two_sided(x/-1, inv_pos, inv_neg): could not find function "trans_two_sided"

Created on 2024-04-03 with reprex v2.1.0

@mjskay
Copy link
Contributor

mjskay commented Apr 3, 2024

Yeah, that pattern is used in a number of the transformations defined in {scales}. A simple example if you need a test case:

f = (function() { 
  b = 2
  function(x) x * b
})()

f_inv = (function() {
  b = 2
  function(x) x / b
})()

d_f_inv = (function() {
  b = 2
  function(x) 1 / b
})()

d = dist_transformed(dist_normal(), f, f_inv, d_f_inv)
density(d, 0)
## [1] 0.1994711
density(exp(d), 0)
## Error in x[["deriv"]](at) : object 'b' not found

I think it's not fixable in general using the code substitution approach. The autodiff approach should work on these though.

@venpopov
Copy link
Contributor Author

venpopov commented Apr 3, 2024

that's perfect, thanks, it's good to have cases like this in the automatic tests anyway. You may be right, but let me see first if there is an easy solution

@mjskay
Copy link
Contributor

mjskay commented Apr 3, 2024

that's perfect, thanks, it's good to have cases like this in the automatic tests anyway. You may be right, but let me see first if there is an easy solution

Fair enough :). FWIW I do not think code substitution can work in the general case because solving the problem requires different pieces of code to be executed in different environments. The usual ways to solve that problem are either functions or quosures, and I assume {Deriv} doesn't support quosures.

- if symbolic derivative fails, fallback to numeric
- print message that symbolic failed only if options(dist.verbose = TRUE)
- save environment of inverse function into new inverse during transformations
@venpopov
Copy link
Contributor Author

venpopov commented Apr 3, 2024

Ok, turned out 1 and 2 were both easy fixes.

  • If the symbolic derivative fails, can it fall back silently to numerical differentiation, instead of printing a message? Or at least, can there be an option for such a mode? Currently it prints a message and then fails later if I call density(). From the {ggdist} side I would not want this message cluttering output when plotting, and I would want it to always still attempt the transformation later.

Now it will only print the message if options(dist.verbose = TRUE) and it will fall back to numeric derivatives

  • I worry that the approach of trying to compose a single inverse will be brittle for some complex functions. R semantics are so weird that I could see situations where if someone has done something with a function definition you don't expect, dropping its code into the middle of another function could fail in unexpected ways (e.g. what happens if a function has been constructed dynamically in a surrounding environment that contains some constants the function is using? If you drop the function body into the middle of another function, I don't think the constants would be evaluated in the correct environment).

It was just a matter of specifyin the new function's environment to match the previous inverse's environment:

  body(prev_inverse) <- substituteDirect(body(prev_inverse), list(x = body(current_inverse)))
  inverse <- Deriv::Simplify(prev_inverse, env = environment(prev_inverse))
    body <- substituteDirect(body(prev_inverse), list(x = body(current_inverse)))
    inverse <- new_function(exprs(x = ), body = body, env = environment(prev_inverse))

Note that we only need the environment of the previous inverse, for the new inverse, the existing get_*_inverse functions retrieve what is necessary correctly.

Now all of these examples work (I turned on verbose mode to see the messages, but they will not be printed by default):

library(distributional)
options(dist.verbose = TRUE)
yj = scales::transform_yj(0)

d = dist_transformed(dist_normal(2), yj$transform, yj$inverse, yj$d_inverse)
density(d, 1)
#> Using symbolic differentiation
#> [1] 1.042247

neg_d = -d
#> Cannot compute the derivative of the inverse function symbolicly.
density(neg_d, 1)
#> Using numerical differentiation.
#> [1] 0.005514935

Created on 2024-04-03 with reprex v2.1.0

library(distributional)
options(dist.verbose = TRUE)
f = (function() { 
  b = 2
  function(x) x * b
})()

f_inv = (function() {
  b = 2
  function(x) x / b
})()

d_f_inv = (function() {
  b = 2
  function(x) 1 / b
})()

d = dist_transformed(dist_normal(), f, f_inv, d_f_inv)
density(d, 0)
#> Using symbolic differentiation
#> [1] 0.1994711
density(exp(d), 1)
#> Using symbolic differentiation
#> [1] 0.1994711

Created on 2024-04-03 with reprex v2.1.0

library(distributional)
options(dist.verbose = TRUE)
# functions with custom functions in the environment
fil <- (function() {
  inv_logit <- function(x) 1 / (1 + exp(-x))
  function(x) inv_logit(x)
})()

fil_inv <- (function() {
  logit <- function(x) log(x) - log(1 - x)
  function(x) logit(x)
})()

d_fil_inv <- (function() {
  oneover <- function(x) 1 / (x * (1 - x))
  function(x) oneover(x)
})()

d <- dist_transformed(dist_logistic(0, 1), fil, fil_inv, d_fil_inv)
identical(density(d, 0.5), density(dist_uniform(0, 1), 0.5))
#> Using symbolic differentiation
#> [1] TRUE
d2 <- -log(1/d - 1)
identical(density(d2, 0), density(dist_logistic(0, 1), 0))
#> Using symbolic differentiation
#> [1] TRUE

Created on 2024-04-03 with reprex v2.1.0

I will see for the third issue. I'm not opposed to switching methods if it doesn't work, but if it also has an easy solution, I would prefer not to completely rewrite everything. The other thing I like about the current approach is that the inverse and derivative functions are human readble and easy to understand, which could be important if the package is used for pedagogical purposes

@mjskay
Copy link
Contributor

mjskay commented Apr 3, 2024

Ah okay, I see why your solution works where you are applying it currently: within Math or Ops any constants / variables from the surrounding function will have already been evaluated, so you don't have the problem of trying to piece together code with variables from multiple environments.

Where you'll run into this problem is when you implement the code for finding the inverse in dist_transformed() (which I see you have a TODO for). When you implement that code, I recommend not using the substitution technique you are using in Math and Ops. I would expect that technique to break on code like this (if d_f_inv and d_f2_inv were not provided):

f = (function() { 
  b = 2
  function(x) x * b
})()

f_inv = (function() {
  b = 2
  function(x) x / b
})()

d_f_inv = (function() {
  b = 2
  function(x) 1 / b
})()

f2 = (function() { 
  b = 3
  function(x) x * b
})()

f2_inv = (function() {
  b = 3
  function(x) x / b
})()

d_f2_inv = (function() {
  b = 3
  function(x) 1 / b
})()

d = dist_transformed(dist_normal(), f, f_inv, d_f_inv)
d2 = dist_transformed(d, f2, f2_inv, d_f2_inv)
d2
## <distribution[1]>
## [1] t(t(N(0, 1)))
density(d2, 0)
## [1] 0.06649038

Currently, this doesn't try to merge the inverses from chained transformations, so it works. This just chains together the density calculations of the two transformations, essentially doing what the autodiff approach would do.

However, if I don't pass d_f_inv and d_f2_inv above it will use numerical diff, which obviously we want to avoid.

If you implement the TODO in dist_transformed() using the same substitution approach you used elsewhere, it should break on the above example, and will also break on examples with derivatives {Deriv} can't do, as already discussed. If instead you just do the symbolic diff for the inverse as given to dist_transformed(), then I think that should be sufficient to correctly chain together derivatives when calculating the densities. I think that would solve both point 2 and 3 from my original comment.

@mjskay
Copy link
Contributor

mjskay commented Apr 3, 2024

In fact, this just made me realize that a simple way to solve the remaining issues with point 3 is, if finding the symbolic derivative of the substituted/composed inverse fails in Math / Ops, then Math / Ops could fall back to just composing two dist_transformed() objects together. That might be an easy implementation of the chain rule approach.

e.g. this doesn't use symbolic derivatives because {Deriv} can't differentiate yj:

d = dist_transformed(dist_normal(), yj$transform, yj$inverse, yj$d_inverse)
d1 = exp(d)
d1
## <distribution[1]>
## [1] t(N(0, 1))
vec_data(d1)[[1]]$deriv
## NULL

But this does, and should calculate densities correctly:

d2 = dist_transformed(d, exp, log, \(x) 1/x)
d2
## <distribution[1]>
## [1] t(t(N(0, 1)))
vec_data(d2)[[1]]$deriv
## \(x) 1/x

@venpopov
Copy link
Contributor Author

venpopov commented Apr 3, 2024

I just had the same realization after your previous comment! Let me test it

@venpopov
Copy link
Contributor Author

venpopov commented Apr 3, 2024

Oh, it works great! Actually, we can get rid of everything special and make Math.dist_default and Math.dist_transformed completely the same. Same for Ops.dist_transformed and Ops.dist_default. You end up with a nested dist_transformed object and it works.

library(distributional)
options(dist.verbose = T)
d <- dist_uniform(0, 1)
my_gumbel <- -log(-log(d))
str(my_gumbel)
#> dist [1:1] 
#> $ :List of 4
#>  ..$ dist     :List of 4
#>  .. ..$ dist     :List of 4
#>  .. .. ..$ dist     :List of 4
#>  .. .. .. ..$ dist     :List of 2
#>  .. .. .. .. ..$ l: num 0
#>  .. .. .. .. ..$ u: num 1
#>  .. .. .. .. ..- attr(*, "class")= chr [1:2] "dist_uniform" "dist_default"
#>  .. .. .. ..$ transform:function (x)  
#>  .. .. .. ..$ inverse  :function (x)  
#>  .. .. .. ..$ deriv    :function (x)  
#>  .. .. .. ..- attr(*, "class")= chr [1:2] "dist_transformed" "dist_default"
#>  .. .. ..$ transform:function (x)  
#>  .. .. ..$ inverse  :function (x)  
#>  .. .. ..$ deriv    :function (x)  
#>  .. .. ..- attr(*, "class")= chr [1:2] "dist_transformed" "dist_default"
#>  .. ..$ transform:function (x)  
#>  .. ..$ inverse  :function (x)  
#>  .. ..$ deriv    :function (x)  
#>  .. ..- attr(*, "class")= chr [1:2] "dist_transformed" "dist_default"
#>  ..$ transform:function (x)  
#>  ..$ inverse  :function (x)  
#>  ..$ deriv    :function (x)  
#>  ..- attr(*, "class")= chr [1:2] "dist_transformed" "dist_default"


x <- seq(-4, 8, 0.01)
my_dens <- density(my_gumbel, x)[[1]]
#> Using symbolic differentiation
#> Using symbolic differentiation
#> Using symbolic differentiation
#> Using symbolic differentiation

plot(x, my_dens, type = "l", col = "red", lwd = 2, xlab = "x", ylab = "Density")

all(dplyr::near(my_dens, density(dist_gumbel(0, 1), x)[[1]]))
#> [1] TRUE

Created on 2024-04-03 with reprex v2.1.0

I have to do some clean up before I push the changes, but it seems this is a super clean approach that requires very little work.

- as mentioned here: #101 (comment)
- remove methods Math.dist_transformed and Ops.dist_transformed
- now all operations use Math.dist_default, etc
@venpopov
Copy link
Contributor Author

venpopov commented Apr 4, 2024

Ok, this is pretty much done now :)

In fact, the best solution proved to be the simplest - just got rid of Math.dist_transformed() and Ops.dist_transformed(), and now everything uses the dist_default() methods. This is because they ended up being the same function. Now the logic is super simple - each transformation creates a new dist_transformed() object, to which the dist field is the previous dist_transformed() object. Each dist_transformed object only stored the transform, inverse and derivative function for one step of the transformation. Then the chain rule is appropriately applied as in the example @mjskay gave above.

I think this is the easiest and most robust solution to maintain. Thanks @mjskay for working through this with me.

Now also if the user doesn't supply a derivative, but only a transform and an inverse, we also attempt to find a symbolic one (the TODO you mentioned above), and fall back to numeric methods if that fails.

@mitchelloharawild I think this is ready for review. Maybe there are some caveats to this approach that we haven't thought about.

@mitchelloharawild
Copy link
Owner

mitchelloharawild commented Apr 4, 2024

Thanks for working on this, it'll take a little while to go through it all and review appropriately.

One thing I'm not convinced by yet is nesting the transformed distribution rather than the transformations themselves. I think it is neater to modify the transformations of a transformed distribution rather than wrap a transformed distribution with another.

e.g.

# Before
dist <- -log(-log(dist_uniform(0, 1)))
dist
#> <distribution[1]>
#> [1] t(U(0, 1))

# After
dist <- -log(-log(dist_uniform(0, 1)))
dist
#> <distribution[1]>
#> [1] t(t(t(t(U(0, 1)))))

Created on 2024-04-04 with reprex v2.0.2

This also makes it much harder to extract the parameters from the transformed distribution (the transformations), since you only get one layer of transformations in each step.

Is the main issue with composing transformations together that a mix of numerical and symbolic derivatives is preferred? Is there any other problems preventing a dist_transformed() being comprised of transformation functions and a non-transformed distribution?

@mjskay
Copy link
Contributor

mjskay commented Apr 4, 2024

Maybe you could do something like store the functions as list within a single dist_transformed(), with NULL indicating "do numerical diff at this step".

Or, if nesting the transformed distributions is the simplest and most robust implementation, you could modify other functions to make it more usable (e.g., make the print method of dist_transformed() omit the extra "t"s, and add a method to extract the innermost untransformed dist).

@venpopov
Copy link
Contributor Author

venpopov commented Apr 4, 2024

Storing the functions in a list is the alternative I also thought about, but initially discarded because I didn't know how to deal with the possibility that one derivative might fail. I like the approach you suggested with the functional fallback. It's actually quite easy to change the current method and you end up with a structure like this:

> d <- dist_uniform(0,1)
> str(-log(-log(d)))
dist [1:1] 
$ :List of 4
 ..$ dist     :List of 2
 .. ..$ l: num 0
 .. ..$ u: num 1
 .. ..- attr(*, "class")= chr [1:2] "dist_uniform" "dist_default"
 ..$ transform:List of 4
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 ..$ inverse  :List of 4
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 ..$ deriv    :List of 4
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 ..- attr(*, "class")= chr [1:2] "dist_transformed" "dist_default"

Then it is just a matter of adding a utility function "apply_transform" which applies the transformation or inverse functions in a sequence. I have a working prototype of this. The derivative application would require manual calculation via the chain rule, along the lines of

  derivs <- get_deriv(dist)[[1]]
  inv <- get_inverse(dist)[[1]]
  n <- length(derivs)
  res <- derivs[[1]](value)
  if (n > 1) {
    for (i in 2:n) {
      value <- seq_apply(inv[i-1], value)
      res <- res * derivs[[i]](value)
    }
  }

If we want to store derivative functions for each step, it either has to be the nested approach above, or the list approach for all functions - the chain rule requires that we have the inverse function for each step available, so we can't compose them as was in the original code.

This will just require changing all cases where dist[['transform']](value) is used to apply_transform(dist, value), and manually computing the chain rule in the density function. Nothing too difficult. Then you end up with a single dist_transformed object as before. Thoughts?

@venpopov
Copy link
Contributor Author

venpopov commented Apr 4, 2024

The risk with storing the functions in a list is that it could be a breaking change if any package depends on the current behavior. So a safer solution might be to store the functions in a list internally, but have a wrapper function in the transform, inverse and derivs fields that applies the functions appropriately

@mitchelloharawild
Copy link
Owner

mitchelloharawild commented Apr 4, 2024

the chain rule requires that we have the inverse function for each step available

Can't the chain rule be applied incrementally (in a chain) to each successive inverse? That way we can still apply a nested approach (as is done currently with the entire distribution) but with the functions only.

The risk with storing the functions in a list is that it could be a breaking change if any package depends on the current behavior.

Not just this, but I think the usability of the package is worse with lists rather than directly usable functions. I'm not sure if anyone currently uses parameters(dist_transformed()) to pull out and use the transformations, but it would be nice to preserve that functionality.

Here's a prototype extending the example before to use the chain rule for composing transformations together.

numderiv <- function(f) {
  function(., ...) {
    vapply(., numDeriv::jacobian, numeric(1L), func = f, ...)
  }
}

symbolic_derivative <- function(inverse, fallback_numderiv = TRUE) {
  if(!fallback_numderiv) return(Deriv::Deriv(inverse, x = 'x'))
  
  tryCatch(
    Deriv::Deriv(inverse, x = 'x'),
    error = function(...) {
      numderiv(inverse)
    }
  )
}

# Chain rule
chain_rule <- function(x, y, d_x = symbolic_derivative(x), d_y = symbolic_derivative(y)) {
  function(x) d_x(y(x)) * d_y(x)
}

chain_rule(log, log)(3)
#> [1] 0.3034131
chain_rule(log, scales::transform_yj(0)$inverse)(3)
#> [1] 1.052396

# Add transform (to be incorporated in Math.dist_transformed and Ops.dist_transformed)
#' @param .x a list of functions `transform`, `inverse`, `d_inverse`
#' @param .y a list of functions `transform`, `inverse`, `d_inverse` to add to `.x`
add_transform <- function(.x, .y) {
  force(.x)
  force(.y)
  list(
    transform = function(x) .y$transform(.x$transform(x)),
    inverse = function(x) .x$inverse(.y$inverse(x)),
    d_inverse = chain_rule(.x$inverse, .y$inverse, .x$d_inverse, .y$d_inverse)
  )
}

# This is what the transformed distribution already has from earlier Math/Ops, e.g. log
dist_transform_fns <- scales::transform_log()

# Add a 2^x transformation
dist_transform_fns <- add_transform(dist_transform_fns, scales::transform_exp(2))

dist_transform_fns$transform(3)
#> [1] 2.141486
dist_transform_fns$inverse(1)
#> [1] 1
dist_transform_fns$d_inverse(3)
#> [1] 2.346355

dist_transform_fns$transform(dist_transform_fns$inverse(pi))
#> [1] 3.141593

# Add a yj(0) transformation
dist_transform_fns <- add_transform(dist_transform_fns, scales::transform_yj(0))
dist_transform_fns$transform(3)
#> [1] 1.144696
dist_transform_fns$inverse(1)
#> [1] 2.183582
dist_transform_fns$d_inverse(1)
#> [1] 4.983611

dist_transform_fns$transform(dist_transform_fns$inverse(pi))
#> [1] 3.141593

Created on 2024-04-05 with reprex v2.0.2

@venpopov
Copy link
Contributor Author

venpopov commented Apr 4, 2024

Neat! Looks like it could work . I tried to make something like that with a similar add_transform function that returns composed functions in math.dist_default but had trouble with the search environment

@mitchelloharawild
Copy link
Owner

Great. I think add_transform() should be used in Math.dist_transformed and Ops.dist_transformed since it applies transformations atop existing transformations. Something simpler can be used for Math.dist_defaul and Ops.dist_default.

@venpopov
Copy link
Contributor Author

venpopov commented Apr 6, 2024

I implemented that approach. However, I noticed that if we put add_transform() in the Math/Op methods, then if people use nested dist_transform(dist_transform(...)), as @mjskay had an example above, they would still get nested distributions. So what I settled on here is this:

  • there is no Math.dist_transformed or Ops.dist_transformed
  • we use add_transform() directly in dist_transformed

The last part was a bit tricky because dist_transformed accepts disitributional vector with potentially lists of transform and inverses, that need to be recycled, which you typically do in new_dist. So here I have these lines now:

dist_transformed <- function(dist, transform, inverse, d_inverse = NULL){
  vec_is(dist, new_dist())
  if (is.function(transform)) transform <- list(transform)
  if (is.function(inverse)) inverse <- list(inverse)
  if (is.null(d_inverse)) {
    d_inverse <- lapply(inverse, function(inv) symbolic_derivative(inv))
  } else if (is.function(d_inverse)) {
    d_inverse <- list(d_inverse)
  }

### new lines
  args <- vctrs::vec_recycle_common(dist = dist, transform = transform, inverse = inverse, d_inverse = d_inverse)
  args <- transpose(args)
  funs <- transpose(lapply(args, function(x) add_transform(x$dist, x)))
  dist <- lapply(args, function(x) if (inherits(x$dist, "dist_transformed")) x$dist$dist else x$dist)
### end of new lines

  new_dist(dist = dist,
           transform = funs$transform, inverse = funs$inverse, d_inverse = funs$d_inverse,
           dimnames = dimnames(args$dist), class = "dist_transformed")
}

Maybe there is a better way to do this here. But I do think the result of:

d <- dist_gamma(1,1)
log(d^2)

and

d1 <- dist_transformed(d, function(x) x^2, sqrt)
d2 <- dist_transformed(d, exp, log)

should be the same, which the current approach achieves.

@venpopov
Copy link
Contributor Author

venpopov commented Apr 6, 2024

So in effect, now you get what you suggested:

library(distributional)
library(vctrs)
d <- dist_uniform(0, 1)
d2 <- -log(-log(d))
d2
#> <distribution[1]>
#> [1] t(U(0, 1))
str(d2)
#> dist [1:1] 
#> $ :List of 4
#>  ..$ dist     :List of 2
#>  .. ..$ l: num 0
#>  .. ..$ u: num 1
#>  .. ..- attr(*, "class")= chr [1:2] "dist_uniform" "dist_default"
#>  ..$ transform:function (x)  
#>  ..$ inverse  :function (x)  
#>  ..$ d_inverse:function (x)  
#>  ..- attr(*, "class")= chr [1:2] "dist_transformed" "dist_default"

pars <- parameters(d2)
pars$transform
#> [[1]]
#> function (x) 
#> .y$transform(.x$transform(x))
#> <bytecode: 0x1327e04b0>
#> <environment: 0x110a48250>
pars$inverse
#> [[1]]
#> function (x) 
#> .x$inverse(.y$inverse(x))
#> <bytecode: 0x1327e00f8>
#> <environment: 0x110a48250>
pars$d_inverse
#> [[1]]
#> function (x) 
#> d_x(y(x)) * d_y(x)
#> <bytecode: 0x1109156c8>
#> <environment: 0x110a47ae0>

x <- seq(-4, 8, 0.01)
dens <- density(d2, x)[[1]]
plot(x, dens, type = "l", col = "red", lwd = 2, xlab = "x", ylab = "Density")

Created on 2024-04-06 with reprex v2.1.0

Copy link
Owner

@mitchelloharawild mitchelloharawild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some (incomplete) suggested changes, I'll finish reviewing the changes later.

NEWS.md Outdated Show resolved Hide resolved
R/transformed.R Show resolved Hide resolved
R/transformed.R Outdated Show resolved Hide resolved
R/transformed.R Outdated Show resolved Hide resolved
R/transformed.R Show resolved Hide resolved
Copy link
Owner

@mitchelloharawild mitchelloharawild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for the mammoth effort in creating this PR and discussing the design.
One minor fix and a question, then it's good to go!

jacobian <- vapply(at, numDeriv::jacobian, numeric(1L), func = inv)
d <- density(x[["dist"]], inv(at)) * abs(jacobian)
density.dist_transformed <- function(x, at, verbose = getOption('dist.verbose', FALSE), ...) {
on.exit(options(dist.verbose = verbose), add = T)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verbose is not used in the method and I don't think it needs to be an argument of this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outdated, I'll remove it



# creates a wrapper function around primitive functions
wrap_primitive <- function(fun) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be unused?
It can stay, but I was wondering what you were thinking of using it for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i used it in a previous version, now no longer necessary

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, best to remove it also then. Mainly wanted to make sure it wasn't needed for anything else that might have been forgotten 😄

@mjskay
Copy link
Contributor

mjskay commented Apr 18, 2024

Just wanted to add my thanks @venpopov for getting this done! Very exciting

@mitchelloharawild mitchelloharawild added this to the v1.0.0 milestone Apr 19, 2024
@venpopov
Copy link
Contributor Author

Looks great! Thanks for the mammoth effort in creating this PR and discussing the design. One minor fix and a question, then it's good to go!

Great, thanks for the review and help in making this work consistently. Both issues are about outdated things - I had used them before, but they are no longer necessary. I will remove them and do a final pass to see if I've forgotten anything and I'll let you know when it's ready to merge

@venpopov
Copy link
Contributor Author

A busy semester kicked in and I forgot about this... I'm leaving for a series of conferences and I'm not sure when I could do a final pass. So I would suggest you take a look if you want and merge when you are happy with the state

@mitchelloharawild
Copy link
Owner

mitchelloharawild commented Jun 13, 2024

No problem, thanks! I'm also about to be busy for conferences (one of them featuring this package! https://sched.co/1c8uE)

I'm hoping to merge this in before the conference for v1.0.0 in a few weeks time.

@venpopov venpopov closed this by deleting the head repository Dec 30, 2024
@mjskay
Copy link
Contributor

mjskay commented Jan 16, 2025

@venpopov @mitchelloharawild I saw this was closed, is there still interest here?

@venpopov if you're not able to work on it anymore I might be able to find time to push it over the line; I think this would be a great feature to have.

@venpopov
Copy link
Contributor Author

@mjskay I didn't close it on purpose, I was doing some cleanup on my repos and had forgotten this was still open when I delete my fork. I unfortunately don't have time to work on it, but it should be really close to done and I agree this would be great to have.

@venpopov
Copy link
Contributor Author

I can't restore the remote fork from github, but I have a copy on my backup drive if needed (though the code changes are still visible in the PR)

@mjskay
Copy link
Contributor

mjskay commented Jan 16, 2025

Ah makes sense. I was able to grab the code using the github CLI via gh pr checkout 101, so it's still available from the github servers.

I think you should be able to restore the fork and PR using the github CLI via the instructions here, but if not I can just open a new PR.

@venpopov
Copy link
Contributor Author

That was easy. Unfortunately GitHub still will not allow reopening the PR because "The repository that submitted the PR hass been deleted", even if the new links now lead to the "restored" fork

@mitchelloharawild
Copy link
Owner

I'm definitely still keen on symbolic solutions to these, and experimented with creating another symbolic CAS package for R (https://github.com/mitchelloharawild/symbolic) built on yacas.

In line with the low-dependency goals of the package, I'm considering bringing this in as a Suggests while a no-dependency R solution is the fallback for when the CAS package isn't installed.

The current PR is close enough to build off of, and when I get some time for it I'll finish it.

@mjskay - do you have any opinions on bringing a full CAS into the mix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants