Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pointer: Add utility functions #1012

Merged
merged 11 commits into from
Nov 30, 2020
Merged

Conversation

scolsen
Copy link
Contributor

@scolsen scolsen commented Nov 23, 2020

This commit adds a few more utility functions to Pointer.carp.

  • Pointer.set-unsafe: Sets the value of a pointer to some arbitrary Carp
    value, without type checking. Users need to ensure this operation is
    safe.
  • Pointer.set: Sets the value of a pointer to a value of type t to a
    value that has the same type.
  • Pointer.unsafe-alloc: Copies a Carp reference to a new pointer to the same
    value. This creates a leak since Carp will not automatically clean up
    this memory.
  • Pointer.free: Frees a pointer p. Users need to ensure calls to this
    function are safe and do not produce errors like a double free.
    Intended for use with leak.

Here's an example of some of these functions in action:

(defn foo []
  (let-do [p (Pointer.leak "leaky")] ;; create a new pointer
    (ignore (Pointer.set p @"foo")) ;; set the pointer to "foo"
    (println* (Pointer.to-value p)) ;; convert to a Carp val to print
    (Pointer.free p) ;; finally, free it.
    0))
(foo)
=> Compiled to 'out/Untitled' (executable)
foo
0

And the C of interest:

int foo() {
    int _35;
    /* let */ {
        static String _6 = "leaky";
        String *_6_ref = &_6;
        String* _7 = Pointer_leak__String(_6_ref);
        String* p = _7;
        /* let */ {
            static String _15 = "foo";
            String *_15_ref = &_15;
            String _16 = String_copy(_15_ref);
            String* _17 = Pointer_set__String(p, _16);
            String* _ = _17;
            /* () */
        }
        String _26 = Pointer_to_MINUS_value__String(p);
        String _27 = StringCopy_str(_26);
        String* _28 = &_27; // ref
        IO_println(_28);
        Pointer_free__String(p);
        int _34 = 0;
        _35 = _34;
        String_delete(_27);
    }
    return _35;
}

As mentioned, and as w/ other Pointer functions users need to ensure the
safety of these operations themselves. For example, calling free on
p twice in the example above produces the expected double free:

(defn foo []
  (let-do [p (Pointer.leak "leaky")] ;; create a new pointer
    (ignore (Pointer.set p @"foo")) ;; set the pointer to "foo"
    (println* (Pointer.to-value p)) ;; convert to a Carp val to print
    (Pointer.free p) ;; finally, free it.
    (Pointer.free p) ;; !Double free!
    0))
(foo)
Compiled to 'out/Untitled' (executable)
foo
Untitled(38328,0x10d9a1dc0) malloc: *** error for object 0x7feb86c01790:
pointer being freed was not allocated
Untitled(38328,0x10d9a1dc0) malloc: *** set a breakpoint in
malloc_error_break to debug
[RUNTIME ERROR] '"out/Untitled"' exited with return value -6.

Still, these should come in handy in rare cases in which users need to
circumvent the type checker or borrow checker.

This commit adds a few more utility functions to `Pointer.carp`.

- Pointer.set-unsafe: Sets the value of a pointer to some arbitrary Carp
  value, without type checking. Users need to ensure this operation is
  safe.
- Pointer.set: Sets the value of a pointer to a value of type t to a
  value that has the same type.
- Pointer.cast: Casts a pointer to a value of type t to a pointer to a
  value of type `a`--the argument passed is ignored, it is only used to
  determine the type to cast to.
- Pointer.leak: Copies a Carp reference to a new pointer to the same
  value. This creates a leak since Carp will not automatically clean up
  this memory.
- Pointer.free: Frees a pointer p. Users need to ensure calls to this
  function are safe and do not produce errors like a double free.
  Intended for use with leak.

Here's an example of some of these functions in action:

```
(defn foo []
  (let-do [p (Pointer.leak "leaky")] ;; create a new pointer
    (ignore (Pointer.set p @"foo")) ;; set the pointer to "foo"
    (println* (Pointer.to-value p)) ;; convert to a Carp val to print
    (Pointer.free p) ;; finally, free it.
    0))
(foo)
=> Compiled to 'out/Untitled' (executable)
foo
0
```

And the C of interest:

```
int foo() {
    int _35;
    /* let */ {
        static String _6 = "leaky";
        String *_6_ref = &_6;
        String* _7 = Pointer_leak__String(_6_ref);
        String* p = _7;
        /* let */ {
            static String _15 = "foo";
            String *_15_ref = &_15;
            String _16 = String_copy(_15_ref);
            String* _17 = Pointer_set__String(p, _16);
            String* _ = _17;
            /* () */
        }
        String _26 = Pointer_to_MINUS_value__String(p);
        String _27 = StringCopy_str(_26);
        String* _28 = &_27; // ref
        IO_println(_28);
        Pointer_free__String(p);
        int _34 = 0;
        _35 = _34;
        String_delete(_27);
    }
    return _35;
}
```

As mentioned, and as w/ other Pointer functions users need to ensure the
safety of these operations themselves. For example, calling `free` on
`p` twice in the example above produces the expected double free:

```
(defn foo []
  (let-do [p (Pointer.leak "leaky")] ;; create a new pointer
    (ignore (Pointer.set p @"foo")) ;; set the pointer to "foo"
    (println* (Pointer.to-value p)) ;; convert to a Carp val to print
    (Pointer.free p) ;; finally, free it.
    (Pointer.free p) ;; !Double free!
    0))
(foo)
Compiled to 'out/Untitled' (executable)
foo
Untitled(38328,0x10d9a1dc0) malloc: *** error for object 0x7feb86c01790:
pointer being freed was not allocated
Untitled(38328,0x10d9a1dc0) malloc: *** set a breakpoint in
malloc_error_break to debug
[RUNTIME ERROR] '"out/Untitled"' exited with return value -6.
```

Still, these should come in handy in rare cases in which users need to
circumvent the type checker or borrow checker.

diff --git a/core/Pointer.carp b/core/Pointer.carp
index a662c636..4f29d587 100644
--- a/core/Pointer.carp
+++ b/core/Pointer.carp
@@ -20,6 +20,29 @@ The user will have to ensure themselves that this is a safe operation.")
   (doc from-long "converts a long integer to a pointer.")
   (deftemplate from-long (Fn [Long] (Ptr p)) "$p* $NAME(Long p)" " $DECL { return ($p*)p; }")

+  (doc set-unsafe
+    "Sets the value of a pointer."
+    "The user will have to ensure this operation is safe.")
+  (deftemplate set-unsafe (Fn [(Ptr p) (Ref a b)] (Ptr p)) "$p* $NAME($p* p, void* a)" "$DECL { *p = *($p*)a; return p;}")
+
+  (doc cast
+     "Cast a pointer to type p to a pointer to type a."
+     "The value of the `a` argument is ignored.")
+  (deftemplate cast (Fn [(Ptr p) a] (Ptr a)) "$a* $NAME($p* p, $a a)" "$DECL { *($a*)p = CARP_MALLOC(sizeof($a)); return ($a*)p;}")
+
+  (doc leak
+     "Allocate a new pointer that's a copy of the value of `Ref`"
+     "The Carp borrow checker won't delete this pointer. You will need to delete it manually by calling `Pointer.free`.")
+  (deftemplate leak (Fn [(Ref a b)] (Ptr a)) "$a* $NAME($a* r)" "$DECL { void *leak = CARP_MALLOC(sizeof($a)); memcpy(leak, r, sizeof($a)); return ($a*)leak;}")
+
+  (doc free
+     "Free a pointer."
+     "Users need to manually verify that this operation is safe.")
+  (deftemplate free (Fn [(Ptr p)] Unit) "void $NAME($p* p)" "$DECL {CARP_FREE(p);}")
+
+  (doc set "Sets the value of a pointer.")
+  (deftemplate set (Fn [(Ptr p) p] (Ptr p)) "$p* $NAME($p* p, $p a)" "$DECL { *p = a; return p;}")
+
   (defn inc [a] (Pointer.add a 1l))
   (implements inc Pointer.inc)
   (defn dec [a] (Pointer.sub a 1l))
@scolsen scolsen requested a review from a team November 23, 2020 22:30
(doc cast
"Cast a pointer to type p to a pointer to type a."
"The value of the `a` argument is ignored.")
(deftemplate cast (Fn [(Ptr p) a] (Ptr a)) "$a* $NAME($p* p, $a a)" "$DECL { *($a*)p = CARP_MALLOC(sizeof($a)); return ($a*)p;}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where's this one useful? Can you present a use case?

Copy link
Contributor

@TimDeve TimDeve Nov 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is enough, no?

(deftemplate cast (Fn [(Ptr p)] (Ptr a)) "$a* $NAME($p* p)" "$DECL { return ($a*)p; }")

Use it like so:

(defn main []
  (println* (+ 30
               (Pointer.to-value
                (cast (address (Uint32.from-long 12l)))))))

Edit: I think I got what this function is trying to do, see other comment.

@TimDeve
Copy link
Contributor

TimDeve commented Nov 23, 2020

This looks very similar to the Box thing I was working on, I think these functions could be useful but I'm not sure some of the name make sense. I think understand why you want cast and leak to allocate, because you want to be able to use them as containers on the heap but I don't think it's clear from the function name.

The name and the docs imply that they would behave differently, I think a function called cast should only cast the type and leak should take ownership and do nothing.

(doc set-unsafe
"Sets the value of a pointer."
"The user will have to ensure this operation is safe.")
(deftemplate set-unsafe (Fn [(Ptr p) (Ref a b)] (Ptr p)) "$p* $NAME($p* p, void* a)" "$DECL { *p = *($p*)a; return p;}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pretty much written the same function for my side project so I see the usefulness in this but I'm not sure why this doesn't take ownership?

@scolsen
Copy link
Contributor Author

scolsen commented Nov 24, 2020

This looks very similar to the Box thing I was working on, I think these functions could be useful but I'm not sure some of the name make sense. I think understand why you want cast and leak to allocate, because you want to be able to use them as containers on the heap but I don't think it's clear from the function name.

The name and the docs imply that they would behave differently, I think a function called cast should only cast the type and leak should take ownership and do nothing.

Thanks Tim! I think you're spot on here--probably not allocating is also what I want here--I'll toy around w/ these changes tomorrow.

How far along are you with your Box project? Would that just obviate the need for these utils?

@TimDeve
Copy link
Contributor

TimDeve commented Nov 24, 2020

Not incredibly far, it was more a "Hey, I wonder if that works" thing. My thinking was to generate a box for each type using a macro to keep the box type safe, but I don't think it's gonna be super useful until we can implement delete for Opaque types.

I do feel like the Box type stuff should stay in a lib until we've figured out how plug them into the delete lifecycle, but it's just a gut feel.

@hellerve
Copy link
Member

I feel like Pointer.cast is strictly a subset of Unsafe.coerce—except for the copy. I think we can still keep it if we want, but could implement it on top of that functionality.

@jacereda
Copy link
Contributor

I feel like Pointer.cast is strictly a subset of Unsafe.coerce—except for the copy. I think we can still keep it if we want, but could implement it on top of that functionality.

But there's no copy AFAICS...

@hellerve
Copy link
Member

Right, copy is not right, I just meant the allocation.

Instead of `leak` copying a previously allocated value, it now takes
(unmanaged) ownership of a fresh value and allocates. This makes more
sense semantically, as we're just instantiating a new pointer that won't
be managed by Carp and will leak unless freed explicitly.

Thanks to @TimDeve for the suggestion!
- Rename set-unsafe to align w/ naming conventions
  Most unsafe functions are prefixed with `unsafe`, not suffixed.
- Rename leak to `unsafe-alloc` to better convey its semantics (leak
  also already exists as `Unsafe.leak`.
- Remove `cast` since its use is covered by `Unsafe.coerce`.

Thanks to TimDeve and hellerve for the suggestions!
Here's a short illustration of why we need `realloc` even though we
already have `Pointer.add`:

```
(defn foo []
  (let-do [p (Pointer.unsafe-alloc 2)]
    (set! p (Pointer.add p (Pointer.width (Pointer.unsafe-alloc @"foo"))))
    (ignore (Pointer.unsafe-set p @"foo"))
    (println* (Pointer.to-value (the (Ptr String) (Unsafe.coerce p))))
    (Pointer.free p)
    0))
```

This function seems fine at first glance, but since `add` returns a new
pointer, `p` is reset to the new pointer, the reference to the original
is lost, and `free` is called on a value that was never actually
allocated since `add` does not malloc.

Using unsafe-realloc, we can avoid the additional allocation:

```
(defn foo []
  (let-do [p (Pointer.unsafe-alloc 2)]
    (Pointer.unsafe-realloc p @"foo")
    (ignore (Pointer.unsafe-set p @"foo"))
    (println* (Pointer.to-value (the (Ptr String) (Unsafe.coerce p))))
    (Pointer.free p)
    0))
```

The allocation is what we care about here. One still needs to use
`Unsafe.coerce` since as far as the Carp compiler is concerned, `p` is
still a (Ptr Int) even though the corresponding c has cast it silently
to a `String` in order to reallocate.
@scolsen
Copy link
Contributor Author

scolsen commented Nov 24, 2020

I feel like Pointer.cast is strictly a subset of Unsafe.coerce—except for the copy. I think we can still keep it if we want, but could implement it on top of that functionality.

So, I've actually restored cast but renamed it to unsafe-realloc. The allocation in the template matters since it is (afaik) the only way to reallocate the same reference--one still needs to use it in conjunction with coerce to convince the Carp compiler of its real type:

(defn foo []                                                                                                                                                                                                                                                                    
  (let-do [p (Pointer.unsafe-alloc 2)]                                                                                                                                                                                                                                          
    (Pointer.unsafe-realloc p @"foo")                                                                                                                                                                                                                                           
    (ignore (Pointer.unsafe-set p @"foo"))                                                                                                                                                                                                                                      
    (println* (Pointer.to-value p))                                                                                                                                                                                                                                             
    (Pointer.free p)                                                                                                                                                                                                                                                            
    0))

Without the realloc here, we'd need to allocate a fresh pointer to resize p to fit foo:

(defn foo []                                                                                                                                                                                                                                                                    
  (let-do [p (Pointer.unsafe-alloc 2)]                                                                                                                                                                                                                                          
    (set! p (Pointer.add p (Pointer.width (Pointer.unsafe-alloc @"foo"))))                                                                                                                                                                                                      
    (ignore (Pointer.unsafe-set p @"foo"))                                                                                                                                                                                                                                      
    (println* (Pointer.to-value (the (Ptr String) (Unsafe.coerce p))))                                                                                                                                                                                                          
    (Pointer.free p)                                                                                                                                                                                                                                                            
    0))     

Though there may be another way I'm just not seeing.

@jacereda
Copy link
Contributor

jacereda commented Nov 24, 2020

(defn foo []                                                                                                                                                                                                                                                                    
  (let-do [p (Pointer.unsafe-alloc 2)]                                                                                                                                                                                                                                          
    (Pointer.unsafe-realloc p @"foo")                                                                                                                                                                                                                                           
    (ignore (Pointer.unsafe-set p @"foo"))                                                                                                                                                                                                                                      
    (println* (Pointer.to-value p))                                                                                                                                                                                                                                             
    (Pointer.free p)                                                                                                                                                                                                                                                            
    0))

But unsafe-realloc isn't reallocating, it's just overwriting the old pointer without freeing it, so this is just leaking.

Can you share a real use case for this? Are you wrapping some library maybe? Perhaps there's a better way...

@scolsen
Copy link
Contributor Author

scolsen commented Nov 24, 2020

(defn foo []                                                                                                                                                                                                                                                                    
  (let-do [p (Pointer.unsafe-alloc 2)]                                                                                                                                                                                                                                          
    (Pointer.unsafe-realloc p @"foo")                                                                                                                                                                                                                                           
    (ignore (Pointer.unsafe-set p @"foo"))                                                                                                                                                                                                                                      
    (println* (Pointer.to-value p))                                                                                                                                                                                                                                             
    (Pointer.free p)                                                                                                                                                                                                                                                            
    0))

But unsafe-realloc isn't reallocating, it's just overwriting the old pointer without freeing it, so this is just leaking.

Can you share a real use case for this? Are you wrapping some library maybe? Perhaps there's a better way...

tbh, I don't have a great use case for this other than to try and circumvent a current limitation around lambdas w/o allocating that much, so maybe we don't need this after all

Esp. if the reassignment causes a leak!

@scolsen
Copy link
Contributor Author

scolsen commented Nov 24, 2020

@carp-lang/maintainers for reference, here's the real-world use-case that motivated these functions:

(defn bi-lift [f applicative-a applicative-b]                                                                                                                                                                                                                            
    ;; The obvious definition using partials doesn't currently work.                                                                                                                                                                                                       
    ;; (sequence (fmap &(binary-partial binary-partial f) cx) cy))                                                                                                                                                                                                         
    (let [lam (fn [x] (fn [y] (let-do [res (~f (Pointer.to-value x) y)] (Pointer.free x) res)))]                                                                                                                                                                           
      (sequence (lift &lam (fmap &Pointer.unsafe-alloc applicative-a)) applicative-b)))

Without Pointer.unsafe-alloc the only other way to make this work is by creating a global def that is dynamically assigned to temporary values and to replace those values w/ pointers to global defs. That works too, but entails a lot of additional machinery.

For whatever reason, using the obvious definition of the above (with nest lambdas and no pointers, the commented out code) results in a double free on string arguments:

鲤 (Applicative.bi-lift &(fn [x y] (append &x &y)) (Maybe.Just @"Foo") (Maybe.Just @"bar"))
Compiled to 'out/Untitled' (executable)
Untitled(22965,0x10be32dc0) malloc: *** error for object 0x7fd7b14017a0: pointer being freed was not allocated
Untitled(22965,0x10be32dc0) malloc: *** set a breakpoint in malloc_error_break to debug
[RUNTIME ERROR] '"out/Untitled"' exited with return value -6.

However, if we cast x to a pointer, we can avoid this:

鲤 (Applicative.bi-lift &(fn [x y] (append &x &y)) (Maybe.Just @"Foo") (Maybe.Just @"bar"))
Compiled to 'out/Untitled' (executable)
(Just @"Foobar")
=> 0

Here's what basically happens, I think. Foo is managed in the maybe--when it enters the first lambda that takes ownership, but then the second inner lambda that's returned an referring to x just sets its local x to x and also takes ownership. However, the delete from the outer lambda is still emitted, resulting in something like:

delete(_12)
delete(_14) // but _12 === _14 -- double free

The reason this pointer trick works is that Pointer.unsafe-alloc makes the value passed to the lambda (the pointer) unmanaged, thus the delete isn't emitted.

I'm not sure what the cause of the underlying double free is; we have issue #997 to track it.

@scolsen
Copy link
Contributor Author

scolsen commented Nov 24, 2020

Here's the key difference, the lambda captures f only, instead of f and x:

Maybe__String _49;                                                                                                                                                                                                                                                     
    /* let */ {                                                                                                                                                                                                                                                            
        // This lambda captures 1 variables: f                                                                                                                                                                                                                             
        Applicative__Lambda_bi_MINUS_lift__String_String_String_33_env *_33_env = CARP_MALLOC(sizeof(Applicative__Lambda_bi_MINUS_lift__String_String_String_33_env));                                                                                                     
        _33_env->f = f;          

And x is not managed:

void Applicative__Lambda_bi_MINUS_lift__String_String_String_32_env_delete(Applicative__Lambda_bi_MINUS_lift__String_String_String_32_env* p) {                                                                                                                            
    /* Ignore non-managed member 'f' : (Ref (Fn [String, String] String) <StaticLifetime>) */                                                                                                                                                                              
    /* Ignore non-managed member 'x' : (Ptr String) */                                                                                                                                                                                                                     
}                  

Presumably there's a way to make this work w/ refs as well--but I couldn't get it to work. I guess this means theres a small error in our lambda capture handling?

Copy link
Contributor

@TimDeve TimDeve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from my comment nitpicks, I think it make sense to have these escape hatches in core.

core/Pointer.carp Outdated Show resolved Hide resolved
core/Pointer.carp Outdated Show resolved Hide resolved
Copy link
Contributor

@TimDeve TimDeve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jacereda jacereda mentioned this pull request Nov 24, 2020
@eriksvedang
Copy link
Collaborator

eriksvedang commented Nov 25, 2020

To summarize, in its current form this PR adds four functions to the Pointer module:

  • unsafe-set
  • unsafe-alloc
  • free
  • set

These do look good and potentially useful to me, and does not seem to overlap with what we already have there:

Pointer : Module = {
    add : (Fn [(Ptr p), Long] (Ptr p))
    copy : (Fn [(Ref (Ptr p) q)] (Ptr p))
    dec : (Fn [(Ptr a)] (Ptr a))
    eq : (Fn [(Ptr p), (Ptr p)] Bool)
    from-long : (Fn [Long] (Ptr p))
    inc : (Fn [(Ptr a)] (Ptr a))
    prn : (Fn [(Ptr a)] String)
    str : (Fn [(Ptr a)] String)
    sub : (Fn [(Ptr p), Long] (Ptr p))
    to-long : (Fn [(Ptr p)] Long)
    to-ref : (Fn [(Ptr p)] (Ref p tyvar-from-info-0_7_37))
    to-value : (Fn [(Ptr p)] p)
    width : (Fn [(Ptr p)] Long)
}

There's already a function System.free though. Perhaps that one should be removed at the same time?

I don't think we should use these functions to work around limitations/bugs in the language in the long run – but of course it's good to have escape hatches while those bugs are being fixed!

These things are also very helpful when wrapping tricky C libraries.

Finally, I'd want to know what @jacereda thinks about the PR now that it has changed so much.

@scolsen
Copy link
Contributor Author

scolsen commented Nov 25, 2020

Ah, I wasn't aware of System.free! We could probably drop Pointer.free in favor of using System.free, though I do suppose it might be nice to have it in Pointer, since in my opinion it's more obvious to reach for Pointer.free if one is working w/ Pointers. But maybe we can get away with just defining it as an alias specialized to pointers instead of using the current deftemplate? (defn Pointer.free [p] (System.free p))

@eriksvedang
Copy link
Collaborator

@scolsen Yeah, let's remove System.free, I like it better here in Pointer. The exact implementation doesn't really matter... I like the one using deftemplate that you have here.

@scolsen
Copy link
Contributor Author

scolsen commented Nov 28, 2020

@scolsen Yeah, let's remove System.free, I like it better here in Pointer. The exact implementation doesn't really matter... I like the one using deftemplate that you have here.

Think we should change Pointer.free to take any object like the existing System.free does?--or should we keep it restricted to pointers and assume we can handle that type of scenario w/ custom deleters or casting to pointers via address--it's less convenient but I think the current System.free should be equivalent to (Pointer.free (address foo))

@eriksvedang
Copy link
Collaborator

I like restricting it to Pointer:s, it doesn't make sense for all types.

Pointer.free serves a similar function, and is more restrictive, so
we'll remove System.free. One can use `delete` or cast to a pointer and
free that way.See PR carp-lang#1012 for further discussion.
@scolsen
Copy link
Contributor Author

scolsen commented Nov 30, 2020

Sounds good! I just removed System.free--hopefully that didn't break anything in the tests 😅 if not I think we should be good. I'm pretty excited about some of the possibilities w/ Pointer.set--it can be used for stateful closures! #1036

@eriksvedang eriksvedang merged commit 892a972 into carp-lang:master Nov 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants