Use automatic or user-supplied symbolic derivatives for the density of transformed distributions #101

venpopov · 2024-04-02T20:01:02Z

Summary

As discussed in #97, this PR implements automatic symbolic differentiation of the inverse transformation function. It also allows users to supply a deriv function to dist_transformed(), in addition to the transform and inverse functions, which will be used instead of numeric derivatives in computing the density.

Details on the implementation

Deriv package

I looked into different packages for symbolic differentiation and decided to use the Deriv::Deriv package instead of stats::deriv for the following reasons:

it can compute derivatives directly from function calls rather than needing an expression
It works recursively through nested functions to build derivatives from the component basic functions. stats::deriv fails with custom functions even if they are composed of functions that are in its lookup-table. Using Deriv::Deriv ends up working in all cases I tried, even with a very complicated set of nested custom functions (see the test here for example)
It is a lightweight dependency, importing only methods
it allows adding custom rules to its lookup table. For now I found now need, but could be useful for the future.
It outputs a function rather than an expression, which can be directly provided to the density method

Constructing the derivative functions

The symbolic derivative function is constructed only once when the dist_transformed object gets created. It is save in a field deriv of the dist object just like transform and inverse
This happens in 4 places:
- 1. Math.dist_default() - for the first transformation with a unary function such as log(dist_gamma(1,1)
- 1. Ops.dist_default() - for the first transformation with a binary operator such as dist_gamma(1,1)^2
- 1. Math.dist_default() - for subsequent unary transformations
- 1. Ops.dist_default() - for subsequent binary operator transformations

In 1 and 2, the Deriv::Deriv is applied to the initial inverse function. E.g:

  trans <- new_function(exprs(x = ), body = expr((!!sym(.Generic))(x, !!!dots_list(...))))

  inverse <- get_unary_inverse(.Generic, ...)
  inverse <- Deriv::Simplify(inverse)

  deriv <- suppressWarnings(try(Deriv::Deriv(inverse, x = 'x'), silent = TRUE))

In 3 and 4, it is applied to the compounded inverse function from the multiple transformations:

  # replace x in the previous inverse function body with the current inverse function body
  prev_inverse <- x$inverse
  current_inverse <- get_unary_inverse(.Generic, ...)
  body(prev_inverse) <- substituteDirect(body(prev_inverse), list(x = body(current_inverse)))
  inverse <- Deriv::Simplify(prev_inverse)

  deriv <- suppressWarnings(try(Deriv::Deriv(inverse, x = 'x'), silent = TRUE))

This happens only for transformations that are applied directly via myfun(dist_gamma(1,1)). If a user uses dist_transformed(), they can provide a function to the new deriv argument, and this will be used (@mjskay)

Calculating the derivatives for the density

Now that the derivative function is available in the dist object, the actual calculation happens in density.dist_transformed(). It has a conditional statement that checks if the derivative function is available, and if the new argument deriv_method is symbolic (default) or numeric. It calculates the derivative with either the old numeric method, or symbolicaly, and the rest is the same.

Changes to the construction of the inverses

To make the above work, I had to change how the inverses are constructed. The previous syntax created functions with many (function(x) {})(x) type calls, which did not work with the Deriv package. This was the trickiest part. The previous syntax worked even thought there were the same constant terms in different nested functions. There are two components to my changes:

Changes to get_*_inverse_* inverse functions to substitute the constants or optional arguments in the function body, and to return a wrapped primitive function:

get_unary_inverse <- function(f, ...) {
  switch(f,
         sqrt = function(x) x^2,
         exp = function(x) log(x),
         log = (function(x, base) {
           if (missing(base)) function(x) exp(x)
           else new_function(exprs(x = ), expr((!!base)^x))
         })(x, ...),
         log2 = function(x) 2^x,
         log10 = function(x) 10^x,
         expm1 = function(x) log1p(x),
         log1p = function(x) expm1(x),
         cos = function(x) acos(x),
         sin = function(x) asin(x),
         tan = function(x) atan(x),
         acos = function(x) cos(x),
         asin = function(x) sin(x),
         atan = function(x) tan(x),
         cosh = function(x) acosh(x),
         sinh = function(x) asinh(x),
         tanh = function(x) atanh(x),
         acosh = function(x) cosh(x),
         asinh = function(x) sinh(x),
         atanh = function(x) tanh(x),

         invert_fail
  )
}

get_binary_inverse_1 <- function(f, constant) {

  switch(f,
         `+` = new_function(exprs(x = ), body = expr(x - !!constant)),
         `-` = new_function(exprs(x = ), body = expr(x + !!constant)),
         `*` = new_function(exprs(x = ), body = expr(x / !!constant)),
         `/` = new_function(exprs(x = ), body = expr(x * !!constant)),
         `^` = new_function(exprs(x = ), body = expr(x ^ (1/!!constant))),

         invert_fail
  )
}

When repreated transformations are applied, such as 1-3*exp(-exp(5+dist_gamma(1,1)^2)), at each stage the body of the new inverse function is injected to replace the x symbol in the body of the previous inverse function. E.g. in the above example we get the following inverse functions at each step:

sqrt(x) -> sqrt(x-5) -> sqrt(log(x)-5) -> sqrt(log(-x)-5) -> sqrt(log(-log(x))-5) -> sqrt(log(-log(x/3))-5) ....

This makes it very easy for Deriv::Deriv to find a symbolic derivative, and so far I didn't find any combination of functions in the inverse functions tables that failed. I added extensive new tests to ensure that the inverses and derivatives match the know analytic versions. This is quite stable now.

Speed

Aside from numerical stability, the other benefit is much improved speed, because we find the derivative only when constructing the object, and afterwards the operation is completely vectorized.

builtin <- dist_lognormal(0, 1)
trans <- exp(dist_wrap('norm'))
x <- seq(1, 10, 0.001)

microbenchmark::microbenchmark(
  builtin = density(builtin, x)[[1]],
  trans_numeric = density(trans, x, deriv_method = "numeric")[[1]],
  trans_symbolic = density(trans, x, deriv_method = "symbolic")[[1]],
  check = "equal"
)

#> Unit: microseconds
#>            expr        min          lq        mean      median          uq        max neval
#>         builtin    113.488    126.7105    142.1835    137.6985    156.1895    191.839   100
#>   trans_numeric 267743.981 273165.0215 281941.3901 276509.5760 279855.7500 554079.822   100
#>  trans_symbolic    291.838    331.1570    349.4364    342.9035    361.5790    510.696   100

Examples

Here are some examples of the resulting inverse and derivative functions

Getting a gumbel distribution from a uniform:

library(distributional)
d <- dist_uniform(0, 1)
my_gumbel <- -log(-log(d))
default_gumbel <- dist_gumbel(0, 1)

fget(my_gumbel, inverse)
#> function (x) 
#> exp(-exp(x/-1))
#> <environment: 0x103a766b0>
fget(my_gumbel, deriv)
#> function (x) 
#> {
#>     .e2 <- exp(-x)
#>     exp(-.e2) * .e2
#> }
#> <environment: 0x103a766b0>

x <- seq(-4, 8, 0.01)
my_dens <- density(my_gumbel, x, verbose = TRUE)[[1]]
#> Using symbolic differentiation
def_dens <- density(default_gumbel, x)[[1]]

plot(x, my_dens, type = "l", col = "red", lwd = 2, xlab = "x", ylab = "Density")
lines(x, def_dens, col = "blue", lwd = 2)

^{Created on 2024-04-02 with reprex v2.1.0}

From a shifted and rescale logistic to log-logistic, rescale, back to standard-logistic

With inverses printed in blue and derivatives printed in red. You can see that the functions get simplified if possible, which is quite neat.

loc = 1
scale = 0.3
alpha = exp(loc)
beta = 1 / scale
d1 <- dist_logistic(loc, scale)
d2 <- exp(d1)
d3 <- d2^beta / alpha^beta
d4 <- log(d3)
data.frame(dist = c(d1,d2,d3,d4),
           labels = c('d1','d2','d3','d4'),
           inverse = sapply(fget(c(d1,d2,d3,d4),inverse), function(x) deparse(body(x))),
           deriv = sapply(fget(c(d1,d2,d3,d4),deriv), function(x) deparse(body(x)))) %>% 
ggplot() + 
  stat_slab(aes(xdist = dist, y = labels), limits = c(-8,8), normalize = "groups") +
  theme_ggdist() +
  geom_text(aes(label = inverse, y = labels), x = -5, vjust = -5, col = "blue") +
  geom_text(aes(label = deriv, y = labels), x = 5, vjust = -5, col = "red")

A pretty crazy transformation but it works

So I tried to push it as much as I could to break it with the following crazy transformation involving multiple custom nested functions, but it worked, and the derivative matches the output of WolframAlpha

### -------- a super complicated final case --------------
  # nested custom functions involved in further nested transformation
  inv_logit <- function(x) 1/(1 + exp(-x))
  coshsquared_inv_logit <- function(x) cosh(inv_logit(x))^2
  dist2 <- exp(2-1/log(2+coshsquared_inv_logit(d), base = 10)^(0.4+0.1))

  # get random values within the support and test the inverse
  lim <- field(support(dist2), 'lim')[[1]]
  v <- runif(10, lim[1], lim[2])
  res <- exp(2-1/log(2+coshsquared_inv_logit(v), base = 10)^(0.4+0.1))
  expect_equal(vec_data(dist2)[[1]]$inverse(res), v)

  # monstrous analytical derivative based on wolfram alpha
  an_deriv <- function(x) {
    term1 <- (log(10) * 10^(1/((log(x) - 2)^2)))
    term2 <- x * sqrt(10^(1/((log(x) - 2)^2)) - 3)
    term3 <- sqrt(10^(1/((log(x) - 2)^2)) - 2)
    term4 <- (log(x) - 2)^3
    term1 / (term2 * term3 * term4 * (acosh(term3) - 1) * acosh(term3))
  }

  all.equal(fget(dist2, deriv)(v), an_deriv(v))
#> TRUE

fget(dist2, inverse)
#> function (x) 
#> -log(1/acosh(sqrt(10^(1/(2 - log(x)))^2 - 2)) - 1)


 fget(dist2, deriv)
#> function (x) 
#> {
#>     .e1 <- 2 - log(x)
#>     .e3 <- 10^(1/.e1)^2
#>     .e5 <- sqrt(.e3 - 2)
#>     .e6 <- acosh(.e5)
#>     2.30258509299405 * (.e3/(x * (1/.e6 - 1) * .e1^3 * .e6^2 * 
#>         .e5 * sqrt(.e3 - 3)))
}

- works completely if only one transformation is applied

…ive functions from a distribution object - makes it much easier to test for me than having to type vec_data(dist)[[1]]$transform

venpopov · 2024-04-02T20:07:36Z

btw, I added a small utility function fget to avoid having to type vec_data(dists)[[1]]$inverse to get functions while I worked on this. I find it useful, but if you'd prefer not to have it in, I can remove it ;)

mjskay · 2024-04-03T00:22:11Z

Awesome!!

I have a couple of thoughts:

If the symbolic derivative fails, can it fall back silently to numerical differentiation, instead of printing a message? Or at least, can there be an option for such a mode? Currently it prints a message and then fails later if I call density(). From the {ggdist} side I would not want this message cluttering output when plotting, and I would want it to always still attempt the transformation later.
I worry that the approach of trying to compose a single inverse will be brittle for some complex functions. R semantics are so weird that I could see situations where if someone has done something with a function definition you don't expect, dropping its code into the middle of another function could fail in unexpected ways (e.g. what happens if a function has been constructed dynamically in a surrounding environment that contains some constants the function is using? If you drop the function body into the middle of another function, I don't think the constants would be evaluated in the correct environment).
Building on the last point, the approach of composing a single inverse then taking the derivative will not work in cases where {Deriv} can do derivatives of some of the inverses but not all of them, even if the user has supplied the derivatives {Deriv} cannot do.

For example, {Deriv} cannot find the derivative of the inverse of the Yeo-Johnson transformation:
```
yj = scales::transform_yj(0)
Deriv::Deriv(yj$inverse)
## Error in FUN(X[[i]], ...) : Could not retrieve body of 'is.na()'
```
However, since {scales} provides the derivative and the new dist_transformed() now accepts it, we can still construct a transformed distribution with it using the known derivative (great!):
```
d = dist_transformed(dist_normal(), yj$transform, yj$inverse, yj$d_inverse)
density(d, 0)
## [1] 0.3989423
```
But if I try to apply a further transform, it will not work, since it will try to differentiate the entire inverse, which {Deriv} can't do:
```
neg_d = -d
## Cannot compute the derivative of the inverse function symbolicly.
density(neg_d, 0)
## Error in x[["deriv"]](at) : attempt to apply non-function
```
However, using an autodiff type approach, you can find a derivative of a chain of functions that you know the derivatives of by applying the chain rule. This is what {scales} does to find derivatives of compositions of transforms; see compose_deriv_fwd() and compose_deriv_rev() here.

This approach has two advantages: (1) you don't need to derive the composed inverse function (avoiding any corner cases created by R's weird semantics when substituting code) and (2) you can find the derivatives of chains of functions so long as either the user or {Deriv} knows the derivative of each function in the chain. This second property is particularly important for {ggdist}, because if someone applies a scale transformation (where we know the derivatives) to an already-transformed distribution, I'd like to be able to use dist_transformed() on the existing distribution to create a new distribution that can calculate densities without using numerical diff.

Thanks again for working on this, I'm really looking forward to having more reliable distribution transformation for ggdist...

venpopov · 2024-04-03T15:06:22Z

Thanks for the detailed and thoughtful feedback. Indeed, we would want it to be as robust as possible. Point 1 is easy and I thought it already falls back to the numeric derivative if it fails, but I'll check. For point 2, could you give me an example function for which the current approach will fail so that I add this as a test case and see whether any alternative approach works with it?

venpopov · 2024-04-03T16:15:02Z

Ok, you are right about weird environment issues with substituting the inverses. Actually your example with scales showcases this if we skip the derivative part. This works on the master branch:

library(distributional)
yj = scales::transform_yj(0)

d = dist_transformed(dist_normal(2), yj$transform, yj$inverse)
density(d, 1)
#> [1] 1.042247

neg_d = -d
density(neg_d, 1)
#> [1] 0.005514935

^{Created on 2024-04-03 with reprex v2.1.0}

But it fails on the PR branch:

library(distributional)
yj = scales::transform_yj(0)

d = dist_transformed(dist_normal(2), yj$transform, yj$inverse)
density(d, 1, deriv_method = "numeric")
#> [1] 1.042247

neg_d = -d
#> Cannot compute the derivative of the inverse function symbolicly.
density(neg_d, 1, deriv_method = "numeric")
#> Error in trans_two_sided(x/-1, inv_pos, inv_neg): could not find function "trans_two_sided"

^{Created on 2024-04-03 with reprex v2.1.0}

mjskay · 2024-04-03T16:39:37Z

Yeah, that pattern is used in a number of the transformations defined in {scales}. A simple example if you need a test case:

f = (function() { 
  b = 2
  function(x) x * b
})()

f_inv = (function() {
  b = 2
  function(x) x / b
})()

d_f_inv = (function() {
  b = 2
  function(x) 1 / b
})()

d = dist_transformed(dist_normal(), f, f_inv, d_f_inv)
density(d, 0)
## [1] 0.1994711
density(exp(d), 0)
## Error in x[["deriv"]](at) : object 'b' not found

I think it's not fixable in general using the code substitution approach. The autodiff approach should work on these though.

venpopov · 2024-04-03T16:49:04Z

that's perfect, thanks, it's good to have cases like this in the automatic tests anyway. You may be right, but let me see first if there is an easy solution

mjskay · 2024-04-03T17:12:36Z

that's perfect, thanks, it's good to have cases like this in the automatic tests anyway. You may be right, but let me see first if there is an easy solution

Fair enough :). FWIW I do not think code substitution can work in the general case because solving the problem requires different pieces of code to be executed in different environments. The usual ways to solve that problem are either functions or quosures, and I assume {Deriv} doesn't support quosures.

- if symbolic derivative fails, fallback to numeric - print message that symbolic failed only if options(dist.verbose = TRUE) - save environment of inverse function into new inverse during transformations

venpopov · 2024-04-03T18:16:28Z

Ok, turned out 1 and 2 were both easy fixes.

If the symbolic derivative fails, can it fall back silently to numerical differentiation, instead of printing a message? Or at least, can there be an option for such a mode? Currently it prints a message and then fails later if I call density(). From the {ggdist} side I would not want this message cluttering output when plotting, and I would want it to always still attempt the transformation later.

Now it will only print the message if options(dist.verbose = TRUE) and it will fall back to numeric derivatives

I worry that the approach of trying to compose a single inverse will be brittle for some complex functions. R semantics are so weird that I could see situations where if someone has done something with a function definition you don't expect, dropping its code into the middle of another function could fail in unexpected ways (e.g. what happens if a function has been constructed dynamically in a surrounding environment that contains some constants the function is using? If you drop the function body into the middle of another function, I don't think the constants would be evaluated in the correct environment).

It was just a matter of specifyin the new function's environment to match the previous inverse's environment:

  body(prev_inverse) <- substituteDirect(body(prev_inverse), list(x = body(current_inverse)))
  inverse <- Deriv::Simplify(prev_inverse, env = environment(prev_inverse))

    body <- substituteDirect(body(prev_inverse), list(x = body(current_inverse)))
    inverse <- new_function(exprs(x = ), body = body, env = environment(prev_inverse))

Note that we only need the environment of the previous inverse, for the new inverse, the existing get_*_inverse functions retrieve what is necessary correctly.

Now all of these examples work (I turned on verbose mode to see the messages, but they will not be printed by default):

library(distributional)
options(dist.verbose = TRUE)
yj = scales::transform_yj(0)

d = dist_transformed(dist_normal(2), yj$transform, yj$inverse, yj$d_inverse)
density(d, 1)
#> Using symbolic differentiation
#> [1] 1.042247

neg_d = -d
#> Cannot compute the derivative of the inverse function symbolicly.
density(neg_d, 1)
#> Using numerical differentiation.
#> [1] 0.005514935

^{Created on 2024-04-03 with reprex v2.1.0}

library(distributional)
options(dist.verbose = TRUE)
f = (function() { 
  b = 2
  function(x) x * b
})()

f_inv = (function() {
  b = 2
  function(x) x / b
})()

d_f_inv = (function() {
  b = 2
  function(x) 1 / b
})()

d = dist_transformed(dist_normal(), f, f_inv, d_f_inv)
density(d, 0)
#> Using symbolic differentiation
#> [1] 0.1994711
density(exp(d), 1)
#> Using symbolic differentiation
#> [1] 0.1994711

^{Created on 2024-04-03 with reprex v2.1.0}

library(distributional)
options(dist.verbose = TRUE)
# functions with custom functions in the environment
fil <- (function() {
  inv_logit <- function(x) 1 / (1 + exp(-x))
  function(x) inv_logit(x)
})()

fil_inv <- (function() {
  logit <- function(x) log(x) - log(1 - x)
  function(x) logit(x)
})()

d_fil_inv <- (function() {
  oneover <- function(x) 1 / (x * (1 - x))
  function(x) oneover(x)
})()

d <- dist_transformed(dist_logistic(0, 1), fil, fil_inv, d_fil_inv)
identical(density(d, 0.5), density(dist_uniform(0, 1), 0.5))
#> Using symbolic differentiation
#> [1] TRUE
d2 <- -log(1/d - 1)
identical(density(d2, 0), density(dist_logistic(0, 1), 0))
#> Using symbolic differentiation
#> [1] TRUE

^{Created on 2024-04-03 with reprex v2.1.0}

I will see for the third issue. I'm not opposed to switching methods if it doesn't work, but if it also has an easy solution, I would prefer not to completely rewrite everything. The other thing I like about the current approach is that the inverse and derivative functions are human readble and easy to understand, which could be important if the package is used for pedagogical purposes

mjskay · 2024-04-03T19:02:50Z

Ah okay, I see why your solution works where you are applying it currently: within Math or Ops any constants / variables from the surrounding function will have already been evaluated, so you don't have the problem of trying to piece together code with variables from multiple environments.

Where you'll run into this problem is when you implement the code for finding the inverse in dist_transformed() (which I see you have a TODO for). When you implement that code, I recommend not using the substitution technique you are using in Math and Ops. I would expect that technique to break on code like this (if d_f_inv and d_f2_inv were not provided):

f = (function() { 
  b = 2
  function(x) x * b
})()

f_inv = (function() {
  b = 2
  function(x) x / b
})()

d_f_inv = (function() {
  b = 2
  function(x) 1 / b
})()

f2 = (function() { 
  b = 3
  function(x) x * b
})()

f2_inv = (function() {
  b = 3
  function(x) x / b
})()

d_f2_inv = (function() {
  b = 3
  function(x) 1 / b
})()

d = dist_transformed(dist_normal(), f, f_inv, d_f_inv)
d2 = dist_transformed(d, f2, f2_inv, d_f2_inv)
d2
## <distribution[1]>
## [1] t(t(N(0, 1)))
density(d2, 0)
## [1] 0.06649038

Currently, this doesn't try to merge the inverses from chained transformations, so it works. This just chains together the density calculations of the two transformations, essentially doing what the autodiff approach would do.

However, if I don't pass d_f_inv and d_f2_inv above it will use numerical diff, which obviously we want to avoid.

If you implement the TODO in dist_transformed() using the same substitution approach you used elsewhere, it should break on the above example, and will also break on examples with derivatives {Deriv} can't do, as already discussed. If instead you just do the symbolic diff for the inverse as given to dist_transformed(), then I think that should be sufficient to correctly chain together derivatives when calculating the densities. I think that would solve both point 2 and 3 from my original comment.

mjskay · 2024-04-03T19:15:47Z

In fact, this just made me realize that a simple way to solve the remaining issues with point 3 is, if finding the symbolic derivative of the substituted/composed inverse fails in Math / Ops, then Math / Ops could fall back to just composing two dist_transformed() objects together. That might be an easy implementation of the chain rule approach.

e.g. this doesn't use symbolic derivatives because {Deriv} can't differentiate yj:

d = dist_transformed(dist_normal(), yj$transform, yj$inverse, yj$d_inverse)
d1 = exp(d)
d1
## <distribution[1]>
## [1] t(N(0, 1))
vec_data(d1)[[1]]$deriv
## NULL

But this does, and should calculate densities correctly:

d2 = dist_transformed(d, exp, log, \(x) 1/x)
d2
## <distribution[1]>
## [1] t(t(N(0, 1)))
vec_data(d2)[[1]]$deriv
## \(x) 1/x

venpopov · 2024-04-03T19:17:52Z

I just had the same realization after your previous comment! Let me test it

venpopov · 2024-04-03T20:31:37Z

Oh, it works great! Actually, we can get rid of everything special and make Math.dist_default and Math.dist_transformed completely the same. Same for Ops.dist_transformed and Ops.dist_default. You end up with a nested dist_transformed object and it works.

library(distributional)
options(dist.verbose = T)
d <- dist_uniform(0, 1)
my_gumbel <- -log(-log(d))
str(my_gumbel)
#> dist [1:1] 
#> $ :List of 4
#>  ..$ dist     :List of 4
#>  .. ..$ dist     :List of 4
#>  .. .. ..$ dist     :List of 4
#>  .. .. .. ..$ dist     :List of 2
#>  .. .. .. .. ..$ l: num 0
#>  .. .. .. .. ..$ u: num 1
#>  .. .. .. .. ..- attr(*, "class")= chr [1:2] "dist_uniform" "dist_default"
#>  .. .. .. ..$ transform:function (x)  
#>  .. .. .. ..$ inverse  :function (x)  
#>  .. .. .. ..$ deriv    :function (x)  
#>  .. .. .. ..- attr(*, "class")= chr [1:2] "dist_transformed" "dist_default"
#>  .. .. ..$ transform:function (x)  
#>  .. .. ..$ inverse  :function (x)  
#>  .. .. ..$ deriv    :function (x)  
#>  .. .. ..- attr(*, "class")= chr [1:2] "dist_transformed" "dist_default"
#>  .. ..$ transform:function (x)  
#>  .. ..$ inverse  :function (x)  
#>  .. ..$ deriv    :function (x)  
#>  .. ..- attr(*, "class")= chr [1:2] "dist_transformed" "dist_default"
#>  ..$ transform:function (x)  
#>  ..$ inverse  :function (x)  
#>  ..$ deriv    :function (x)  
#>  ..- attr(*, "class")= chr [1:2] "dist_transformed" "dist_default"


x <- seq(-4, 8, 0.01)
my_dens <- density(my_gumbel, x)[[1]]
#> Using symbolic differentiation
#> Using symbolic differentiation
#> Using symbolic differentiation
#> Using symbolic differentiation

plot(x, my_dens, type = "l", col = "red", lwd = 2, xlab = "x", ylab = "Density")

all(dplyr::near(my_dens, density(dist_gumbel(0, 1), x)[[1]]))
#> [1] TRUE

^{Created on 2024-04-03 with reprex v2.1.0}

I have to do some clean up before I push the changes, but it seems this is a super clean approach that requires very little work.

- as mentioned here: #101 (comment) - remove methods Math.dist_transformed and Ops.dist_transformed - now all operations use Math.dist_default, etc

venpopov · 2024-04-04T04:02:50Z

Ok, this is pretty much done now :)

In fact, the best solution proved to be the simplest - just got rid of Math.dist_transformed() and Ops.dist_transformed(), and now everything uses the dist_default() methods. This is because they ended up being the same function. Now the logic is super simple - each transformation creates a new dist_transformed() object, to which the dist field is the previous dist_transformed() object. Each dist_transformed object only stored the transform, inverse and derivative function for one step of the transformation. Then the chain rule is appropriately applied as in the example @mjskay gave above.

I think this is the easiest and most robust solution to maintain. Thanks @mjskay for working through this with me.

Now also if the user doesn't supply a derivative, but only a transform and an inverse, we also attempt to find a symbolic one (the TODO you mentioned above), and fall back to numeric methods if that fails.

@mitchelloharawild I think this is ready for review. Maybe there are some caveats to this approach that we haven't thought about.

mitchelloharawild · 2024-04-04T04:57:33Z

Thanks for working on this, it'll take a little while to go through it all and review appropriately.

One thing I'm not convinced by yet is nesting the transformed distribution rather than the transformations themselves. I think it is neater to modify the transformations of a transformed distribution rather than wrap a transformed distribution with another.

e.g.

# Before
dist <- -log(-log(dist_uniform(0, 1)))
dist
#> <distribution[1]>
#> [1] t(U(0, 1))

# After
dist <- -log(-log(dist_uniform(0, 1)))
dist
#> <distribution[1]>
#> [1] t(t(t(t(U(0, 1)))))

^{Created on 2024-04-04 with reprex v2.0.2}

This also makes it much harder to extract the parameters from the transformed distribution (the transformations), since you only get one layer of transformations in each step.

Is the main issue with composing transformations together that a mix of numerical and symbolic derivatives is preferred? Is there any other problems preventing a dist_transformed() being comprised of transformation functions and a non-transformed distribution?

mjskay · 2024-04-04T05:24:40Z

Maybe you could do something like store the functions as list within a single dist_transformed(), with NULL indicating "do numerical diff at this step".

Or, if nesting the transformed distributions is the simplest and most robust implementation, you could modify other functions to make it more usable (e.g., make the print method of dist_transformed() omit the extra "t"s, and add a method to extract the innermost untransformed dist).

venpopov · 2024-04-04T13:27:43Z

Storing the functions in a list is the alternative I also thought about, but initially discarded because I didn't know how to deal with the possibility that one derivative might fail. I like the approach you suggested with the functional fallback. It's actually quite easy to change the current method and you end up with a structure like this:

> d <- dist_uniform(0,1)
> str(-log(-log(d)))
dist [1:1] 
$ :List of 4
 ..$ dist     :List of 2
 .. ..$ l: num 0
 .. ..$ u: num 1
 .. ..- attr(*, "class")= chr [1:2] "dist_uniform" "dist_default"
 ..$ transform:List of 4
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 ..$ inverse  :List of 4
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 ..$ deriv    :List of 4
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 .. ..$ :function (x)  
 ..- attr(*, "class")= chr [1:2] "dist_transformed" "dist_default"

Then it is just a matter of adding a utility function "apply_transform" which applies the transformation or inverse functions in a sequence. I have a working prototype of this. The derivative application would require manual calculation via the chain rule, along the lines of

  derivs <- get_deriv(dist)[[1]]
  inv <- get_inverse(dist)[[1]]
  n <- length(derivs)
  res <- derivs[[1]](value)
  if (n > 1) {
    for (i in 2:n) {
      value <- seq_apply(inv[i-1], value)
      res <- res * derivs[[i]](value)
    }
  }

If we want to store derivative functions for each step, it either has to be the nested approach above, or the list approach for all functions - the chain rule requires that we have the inverse function for each step available, so we can't compose them as was in the original code.

This will just require changing all cases where dist[['transform']](value) is used to apply_transform(dist, value), and manually computing the chain rule in the density function. Nothing too difficult. Then you end up with a single dist_transformed object as before. Thoughts?

venpopov · 2024-04-04T13:39:46Z

The risk with storing the functions in a list is that it could be a breaking change if any package depends on the current behavior. So a safer solution might be to store the functions in a list internally, but have a wrapper function in the transform, inverse and derivs fields that applies the functions appropriately

mitchelloharawild · 2024-04-04T22:23:41Z

the chain rule requires that we have the inverse function for each step available

Can't the chain rule be applied incrementally (in a chain) to each successive inverse? That way we can still apply a nested approach (as is done currently with the entire distribution) but with the functions only.

The risk with storing the functions in a list is that it could be a breaking change if any package depends on the current behavior.

Not just this, but I think the usability of the package is worse with lists rather than directly usable functions. I'm not sure if anyone currently uses parameters(dist_transformed()) to pull out and use the transformations, but it would be nice to preserve that functionality.

Here's a prototype extending the example before to use the chain rule for composing transformations together.

numderiv <- function(f) {
  function(., ...) {
    vapply(., numDeriv::jacobian, numeric(1L), func = f, ...)
  }
}

symbolic_derivative <- function(inverse, fallback_numderiv = TRUE) {
  if(!fallback_numderiv) return(Deriv::Deriv(inverse, x = 'x'))
  
  tryCatch(
    Deriv::Deriv(inverse, x = 'x'),
    error = function(...) {
      numderiv(inverse)
    }
  )
}

# Chain rule
chain_rule <- function(x, y, d_x = symbolic_derivative(x), d_y = symbolic_derivative(y)) {
  function(x) d_x(y(x)) * d_y(x)
}

chain_rule(log, log)(3)
#> [1] 0.3034131
chain_rule(log, scales::transform_yj(0)$inverse)(3)
#> [1] 1.052396

# Add transform (to be incorporated in Math.dist_transformed and Ops.dist_transformed)
#' @param .x a list of functions `transform`, `inverse`, `d_inverse`
#' @param .y a list of functions `transform`, `inverse`, `d_inverse` to add to `.x`
add_transform <- function(.x, .y) {
  force(.x)
  force(.y)
  list(
    transform = function(x) .y$transform(.x$transform(x)),
    inverse = function(x) .x$inverse(.y$inverse(x)),
    d_inverse = chain_rule(.x$inverse, .y$inverse, .x$d_inverse, .y$d_inverse)
  )
}

# This is what the transformed distribution already has from earlier Math/Ops, e.g. log
dist_transform_fns <- scales::transform_log()

# Add a 2^x transformation
dist_transform_fns <- add_transform(dist_transform_fns, scales::transform_exp(2))

dist_transform_fns$transform(3)
#> [1] 2.141486
dist_transform_fns$inverse(1)
#> [1] 1
dist_transform_fns$d_inverse(3)
#> [1] 2.346355

dist_transform_fns$transform(dist_transform_fns$inverse(pi))
#> [1] 3.141593

# Add a yj(0) transformation
dist_transform_fns <- add_transform(dist_transform_fns, scales::transform_yj(0))
dist_transform_fns$transform(3)
#> [1] 1.144696
dist_transform_fns$inverse(1)
#> [1] 2.183582
dist_transform_fns$d_inverse(1)
#> [1] 4.983611

dist_transform_fns$transform(dist_transform_fns$inverse(pi))
#> [1] 3.141593

^{Created on 2024-04-05 with reprex v2.0.2}

venpopov · 2024-04-04T22:38:24Z

Neat! Looks like it could work . I tried to make something like that with a similar add_transform function that returns composed functions in math.dist_default but had trouble with the search environment

mitchelloharawild · 2024-04-04T22:49:18Z

Great. I think add_transform() should be used in Math.dist_transformed and Ops.dist_transformed since it applies transformations atop existing transformations. Something simpler can be used for Math.dist_defaul and Ops.dist_default.

venpopov · 2024-04-06T18:28:53Z

I implemented that approach. However, I noticed that if we put add_transform() in the Math/Op methods, then if people use nested dist_transform(dist_transform(...)), as @mjskay had an example above, they would still get nested distributions. So what I settled on here is this:

there is no Math.dist_transformed or Ops.dist_transformed
we use add_transform() directly in dist_transformed

The last part was a bit tricky because dist_transformed accepts disitributional vector with potentially lists of transform and inverses, that need to be recycled, which you typically do in new_dist. So here I have these lines now:

dist_transformed <- function(dist, transform, inverse, d_inverse = NULL){
  vec_is(dist, new_dist())
  if (is.function(transform)) transform <- list(transform)
  if (is.function(inverse)) inverse <- list(inverse)
  if (is.null(d_inverse)) {
    d_inverse <- lapply(inverse, function(inv) symbolic_derivative(inv))
  } else if (is.function(d_inverse)) {
    d_inverse <- list(d_inverse)
  }

### new lines
  args <- vctrs::vec_recycle_common(dist = dist, transform = transform, inverse = inverse, d_inverse = d_inverse)
  args <- transpose(args)
  funs <- transpose(lapply(args, function(x) add_transform(x$dist, x)))
  dist <- lapply(args, function(x) if (inherits(x$dist, "dist_transformed")) x$dist$dist else x$dist)
### end of new lines

  new_dist(dist = dist,
           transform = funs$transform, inverse = funs$inverse, d_inverse = funs$d_inverse,
           dimnames = dimnames(args$dist), class = "dist_transformed")
}

Maybe there is a better way to do this here. But I do think the result of:

d <- dist_gamma(1,1)
log(d^2)

and

d1 <- dist_transformed(d, function(x) x^2, sqrt)
d2 <- dist_transformed(d, exp, log)

should be the same, which the current approach achieves.

venpopov · 2024-04-06T18:34:52Z

So in effect, now you get what you suggested:

library(distributional)
library(vctrs)
d <- dist_uniform(0, 1)
d2 <- -log(-log(d))
d2
#> <distribution[1]>
#> [1] t(U(0, 1))
str(d2)
#> dist [1:1] 
#> $ :List of 4
#>  ..$ dist     :List of 2
#>  .. ..$ l: num 0
#>  .. ..$ u: num 1
#>  .. ..- attr(*, "class")= chr [1:2] "dist_uniform" "dist_default"
#>  ..$ transform:function (x)  
#>  ..$ inverse  :function (x)  
#>  ..$ d_inverse:function (x)  
#>  ..- attr(*, "class")= chr [1:2] "dist_transformed" "dist_default"

pars <- parameters(d2)
pars$transform
#> [[1]]
#> function (x) 
#> .y$transform(.x$transform(x))
#> <bytecode: 0x1327e04b0>
#> <environment: 0x110a48250>
pars$inverse
#> [[1]]
#> function (x) 
#> .x$inverse(.y$inverse(x))
#> <bytecode: 0x1327e00f8>
#> <environment: 0x110a48250>
pars$d_inverse
#> [[1]]
#> function (x) 
#> d_x(y(x)) * d_y(x)
#> <bytecode: 0x1109156c8>
#> <environment: 0x110a47ae0>

x <- seq(-4, 8, 0.01)
dens <- density(d2, x)[[1]]
plot(x, dens, type = "l", col = "red", lwd = 2, xlab = "x", ylab = "Density")

^{Created on 2024-04-06 with reprex v2.1.0}

mitchelloharawild

Some (incomplete) suggested changes, I'll finish reviewing the changes later.

NEWS.md

R/transformed.R

#101 (review)

mitchelloharawild

Looks great! Thanks for the mammoth effort in creating this PR and discussing the design.
One minor fix and a question, then it's good to go!

mitchelloharawild · 2024-04-18T08:56:43Z

R/transformed.R

-  jacobian <- vapply(at, numDeriv::jacobian, numeric(1L), func = inv)
-  d <- density(x[["dist"]], inv(at)) * abs(jacobian)
+density.dist_transformed <- function(x, at, verbose = getOption('dist.verbose', FALSE), ...) {
+  on.exit(options(dist.verbose = verbose), add = T)


verbose is not used in the method and I don't think it needs to be an argument of this function.

outdated, I'll remove it

mitchelloharawild · 2024-04-18T09:01:59Z

R/utils.R

+
+
+# creates a wrapper function around primitive functions
+wrap_primitive <- function(fun) {


This seems to be unused?
It can stay, but I was wondering what you were thinking of using it for.

i used it in a previous version, now no longer necessary

Thanks, best to remove it also then. Mainly wanted to make sure it wasn't needed for anything else that might have been forgotten 😄

mjskay · 2024-04-18T15:43:07Z

Just wanted to add my thanks @venpopov for getting this done! Very exciting

venpopov · 2024-04-19T07:03:43Z

Looks great! Thanks for the mammoth effort in creating this PR and discussing the design. One minor fix and a question, then it's good to go!

Great, thanks for the review and help in making this work consistently. Both issues are about outdated things - I had used them before, but they are no longer necessary. I will remove them and do a final pass to see if I've forgotten anything and I'll let you know when it's ready to merge

venpopov · 2024-06-13T18:26:27Z

A busy semester kicked in and I forgot about this... I'm leaving for a series of conferences and I'm not sure when I could do a final pass. So I would suggest you take a look if you want and merge when you are happy with the state

mitchelloharawild · 2024-06-13T23:06:48Z

No problem, thanks! I'm also about to be busy for conferences (one of them featuring this package! https://sched.co/1c8uE)

I'm hoping to merge this in before the conference for v1.0.0 in a few weeks time.

mjskay · 2025-01-16T18:28:22Z

@venpopov @mitchelloharawild I saw this was closed, is there still interest here?

@venpopov if you're not able to work on it anymore I might be able to find time to push it over the line; I think this would be a great feature to have.

venpopov · 2025-01-16T19:13:13Z

@mjskay I didn't close it on purpose, I was doing some cleanup on my repos and had forgotten this was still open when I delete my fork. I unfortunately don't have time to work on it, but it should be really close to done and I agree this would be great to have.

venpopov · 2025-01-16T19:18:00Z

I can't restore the remote fork from github, but I have a copy on my backup drive if needed (though the code changes are still visible in the PR)

mjskay · 2025-01-16T23:34:05Z

Ah makes sense. I was able to grab the code using the github CLI via gh pr checkout 101, so it's still available from the github servers.

I think you should be able to restore the fork and PR using the github CLI via the instructions here, but if not I can just open a new PR.

venpopov · 2025-01-16T23:52:35Z

That was easy. Unfortunately GitHub still will not allow reopening the PR because "The repository that submitted the PR hass been deleted", even if the new links now lead to the "restored" fork

mitchelloharawild · 2025-01-17T01:52:39Z

I'm definitely still keen on symbolic solutions to these, and experimented with creating another symbolic CAS package for R (https://github.com/mitchelloharawild/symbolic) built on yacas.

In line with the low-dependency goals of the package, I'm considering bringing this in as a Suggests while a no-dependency R solution is the fallback for when the CAS package isn't installed.

The current PR is close enough to build off of, and when I get some time for it I'll finish it.

@mjskay - do you have any opinions on bringing a full CAS into the mix?

venpopov added 12 commits April 2, 2024 09:20

first transformation symbolic derivatives

6d46bfa

- works completely if only one transformation is applied

nested symbolic differentiation

1008c48

add local test folder to .gitignore

57b3f79

improve reliability of inverses and symbolic diff

29f52b3

add comprehensive tests and cleanup

2ed3df3

add test for custom derivatives

93e8119

abstract duplicate behavior into a new substitute function

c5c73b1

add utility function for extracting the transform, inverse or derivat…

84606e1

…ive functions from a distribution object - makes it much easier to test for me than having to type vec_data(dist)[[1]]$transform

simplify inverse method

e118c6a

ensure that primitive user-supplied functions work

c007bce

update news

a020486

minor cleanup

df4b74b

venpopov mentioned this pull request Apr 2, 2024

do not limit range of brakes in subguide_axis mjskay/ggdist#219

Closed

fixes related to #101

b933c69

- if symbolic derivative fails, fallback to numeric - print message that symbolic failed only if options(dist.verbose = TRUE) - save environment of inverse function into new inverse during transformations

drastic simplification of derivative/inverse procedure

0ddb870

- as mentioned here: #101 (comment) - remove methods Math.dist_transformed and Ops.dist_transformed - now all operations use Math.dist_default, etc

venpopov mentioned this pull request Apr 4, 2024

add more extensive tests #103

Open

mjskay mentioned this pull request Apr 5, 2024

Offload finding jacobians to distributional mjskay/ggdist#222

Open

venpopov added 6 commits April 5, 2024 13:12

Merge branch 'master' into symbolic-or-user-derivatives

ecfd549

switch to approach #101 (comment)

036627f

fix merge conflict

d3cef8c

remove methods import

8da7140

remove methods SubstituteDirect import

97de44f

update namespace

15eb038

mitchelloharawild requested changes Apr 8, 2024

View reviewed changes

NEWS.md Outdated Show resolved Hide resolved

R/transformed.R Show resolved Hide resolved

R/transformed.R Outdated Show resolved Hide resolved

R/transformed.R Outdated Show resolved Hide resolved

R/transformed.R Show resolved Hide resolved

address requested changes

7bc5e65

#101 (review)

mitchelloharawild approved these changes Apr 18, 2024

View reviewed changes

mitchelloharawild added this to the v1.0.0 milestone Apr 19, 2024

venpopov closed this by deleting the head repository Dec 30, 2024



		# creates a wrapper function around primitive functions
		wrap_primitive <- function(fun) {

Use automatic or user-supplied symbolic derivatives for the density of transformed distributions #101

Use automatic or user-supplied symbolic derivatives for the density of transformed distributions #101

Conversation

venpopov commented Apr 2, 2024

Summary

Details on the implementation

Deriv package

Constructing the derivative functions

Calculating the derivatives for the density

Changes to the construction of the inverses

Speed

Examples

Getting a gumbel distribution from a uniform:

From a shifted and rescale logistic to log-logistic, rescale, back to standard-logistic

A pretty crazy transformation but it works

venpopov commented Apr 2, 2024

mjskay commented Apr 3, 2024

venpopov commented Apr 3, 2024

venpopov commented Apr 3, 2024

mjskay commented Apr 3, 2024 • edited Loading

venpopov commented Apr 3, 2024

mjskay commented Apr 3, 2024

venpopov commented Apr 3, 2024 • edited Loading

mjskay commented Apr 3, 2024 • edited Loading

mjskay commented Apr 3, 2024 • edited Loading

venpopov commented Apr 3, 2024 • edited Loading

venpopov commented Apr 3, 2024 • edited Loading

venpopov commented Apr 4, 2024

mitchelloharawild commented Apr 4, 2024 • edited Loading

mjskay commented Apr 4, 2024

venpopov commented Apr 4, 2024

venpopov commented Apr 4, 2024

mitchelloharawild commented Apr 4, 2024 • edited Loading

venpopov commented Apr 4, 2024

mitchelloharawild commented Apr 4, 2024

venpopov commented Apr 6, 2024

venpopov commented Apr 6, 2024

mitchelloharawild left a comment

Choose a reason for hiding this comment

mitchelloharawild left a comment

Choose a reason for hiding this comment

mitchelloharawild Apr 18, 2024

Choose a reason for hiding this comment

venpopov Apr 19, 2024

Choose a reason for hiding this comment

mitchelloharawild Apr 18, 2024

Choose a reason for hiding this comment

venpopov Apr 19, 2024

Choose a reason for hiding this comment

mitchelloharawild Apr 19, 2024

Choose a reason for hiding this comment

mjskay commented Apr 18, 2024

venpopov commented Apr 19, 2024

venpopov commented Jun 13, 2024

mitchelloharawild commented Jun 13, 2024 • edited Loading

mjskay commented Jan 16, 2025

venpopov commented Jan 16, 2025

venpopov commented Jan 16, 2025

mjskay commented Jan 16, 2025

venpopov commented Jan 16, 2025

mitchelloharawild commented Jan 17, 2025

mjskay commented Apr 3, 2024 •

edited

Loading

venpopov commented Apr 3, 2024 •

edited

Loading

mjskay commented Apr 3, 2024 •

edited

Loading

mjskay commented Apr 3, 2024 •

edited

Loading

venpopov commented Apr 3, 2024 •

edited

Loading

venpopov commented Apr 3, 2024 •

edited

Loading

mitchelloharawild commented Apr 4, 2024 •

edited

Loading

mitchelloharawild commented Apr 4, 2024 •

edited

Loading

mitchelloharawild commented Jun 13, 2024 •

edited

Loading