Allow LiveViews to be adopted #3551
Comments
After the dead view renders the HTTP response, we have all of the assigns. If we spend the cycles to start an LV process off the critical path of sending the response, then the cost of the extra work is not user-facing. So when the WebSocket connection comes in, the LiveView is already started and the assigns are cached. This would trade a bit of memory and non-critical CPU usage for a faster WebSocket connect.
However, we don't want to leak memory forever, and it's possible that the WebSocket connection never comes. So there would need to be some eviction system in place (time, memory, etc.).
I don't know the Erlang VM well enough, but is it possible to convince the VM to move ownership of structs (specifically assigns) rather than copying them if there are no live references? That would be another way to make spinning up LV processes even cheaper. Rather than copying assigns to a new LV process, we could drop all references, since the assigns are only used after the bytes of the response are ready.
@elliottneilclark there are some tricks we could do:
Outside of that, we do need to copy it; the VM cannot transfer it (reference counting applies only to large binaries). But we can spawn the process relatively early on. For example, if we spawn the process immediately after the router, none of the data mounted in the LiveView needs to be copied; only the assigns set in the plug pipeline that are accessed by the LiveView are copied (using a similar optimization as
That makes sense; large binaries are the equivalent of huge objects in the JVM, with different accounting.
Our experience is that this happens only on public pages which are exposed to bots / search engines / other automatons. So in our situation:
this is exactly what we'd do.
We have some LiveViews that do some pretty heavy lifting on connected mount, so we'd need some way to guarantee that this work wouldn't be repeated if the LV was spawned on a different node from the one that receives the WebSocket connection.
I just realized that the reconnection approach has some complications. If the client crashes, LiveView doesn't know whether the client has received the last message or not. So in order for reconnections to work, we would need to change the LiveView server to keep a copy of all responses and only delete them when the client acknowledges them. This will definitely make the protocol chattier and perhaps affect the memory profile on the server. So for reconnection, we may want to spawn a new LiveView anyway, and then transfer the assigns, similar to
This goes back to the previous argument that it may be necessary to provide different solutions for each problem, if we want to maximize their efficiency.
This is trivial to do if they are in the same node; it is a little bit trickier for distinct nodes. For distinct nodes, you would probably need to opt in and say that a LiveView state is transferable, which basically says that you don't rely on local ETS or resources (such as dataframes) in your LiveView state.
One of the issues with LiveView is the double render when going from dead render to live render and the fact we lose all state on disconnection.
This issue proposes that we render a LiveView (a Phoenix.Channel, really) upfront, which then gets "adopted" when necessary. In a nutshell:
On disconnect, we keep the LiveView alive for X seconds. Then on reconnect, we reestablish the connection back to the same LiveView, and we just send the latest diff.
On dead render, we already spawn the LiveView and keep it alive until the WebSocket connection arrives and "adopts" the LiveView, so we just need to send the latest diff.
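The two cases above share one mechanism: state is parked for X seconds and either adopted by an incoming connection or discarded. The following is a minimal sketch of that bookkeeping, not how Phoenix or LiveView implements anything. Python is used only for illustration, and every name in it (`OrphanRegistry`, `park`, `adopt`, `evict_expired`, the adoption token) is hypothetical.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Orphan:
    """State kept for a view that has no connected client (hypothetical)."""
    assigns: Dict[str, Any]
    expires_at: float


class OrphanRegistry:
    """Holds orphaned view state until it is adopted or its timeout passes."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so the timeout logic can be tested.
        self._clock = clock
        self._orphans: Dict[str, Orphan] = {}

    def park(self, assigns: Dict[str, Any], ttl_seconds: float) -> str:
        """Store state after a dead render or a disconnect; return a token."""
        token = secrets.token_urlsafe(16)
        self._orphans[token] = Orphan(assigns, self._clock() + ttl_seconds)
        return token

    def adopt(self, token: str) -> Optional[Dict[str, Any]]:
        """Hand the state to a new connection, or None if unknown or expired."""
        orphan = self._orphans.pop(token, None)
        if orphan is None or orphan.expires_at <= self._clock():
            return None
        return orphan.assigns

    def evict_expired(self) -> int:
        """Drop every orphan whose timeout has passed; return the count."""
        now = self._clock()
        expired = [t for t, o in self._orphans.items() if o.expires_at <= now]
        for token in expired:
            del self._orphans[token]
        return len(expired)
```

When `adopt` returns `None`, the caller falls back to building a new view from scratch, which matches today's behaviour. The per-call `ttl_seconds` is one way to give different pages different timeouts.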
However, this has some issues:
If the new connection happens on the same node, it is perfect. However, if it happens on a separate node, then we can either do cluster round-trips on every payload, copy only some assigns (from assigns_new), or build a new LiveView altogether (and discard the old one).
This solution means we will keep state around on the server for X seconds. This could perhaps be abused for DDoS attacks or similar. It may be safer to enable this only on certain pages (for example, where authentication is required) or keep the timeout short on public pages (e.g. 5 seconds instead of 30).
On the other hand, this solution should be strictly better than a cache layer for a single tab: there is zero copying and smaller payloads are sent on both connected render and reconnects. However, keep in mind this is not a cache, so it doesn't share across tabs (and luckily it does not introduce any of the caching issues, such as unbounded memory usage, cache key management, etc.).
There are a few challenges in implementing this:
We need to add functionality for adoption first in Phoenix.Channel.
We need to make sure that an orphan LiveView will submit the correct patch once it connects back. It may be that we cannot squash patches on the server and would need to queue them, which can introduce other issues.
We may need an opt-in API.
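To make the queueing challenge concrete, here is a rough sketch of one possible approach raised in the discussion: the server numbers each patch, keeps it until the client acknowledges it, and replays anything newer than what a reconnecting client last saw. This is an assumption-laden illustration in Python, not the LiveView protocol. The names `PatchQueue`, `push`, `ack`, and `replay_after`, and the use of sequence numbers, are all hypothetical.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class PatchQueue:
    """Keeps sent patches until the client acknowledges them (hypothetical)."""

    def __init__(self) -> None:
        self._next_seq = 1
        self._pending: Deque[Tuple[int, Any]] = deque()

    def push(self, patch: Any) -> int:
        """Record a patch that is about to be sent; return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending.append((seq, patch))
        return seq

    def ack(self, seq: int) -> None:
        """The client confirmed every patch up to and including seq."""
        while self._pending and self._pending[0][0] <= seq:
            self._pending.popleft()

    def replay_after(self, last_seen: int) -> List[Tuple[int, Any]]:
        """Patches a reconnecting client still needs, in send order."""
        return [(s, p) for s, p in self._pending if s > last_seen]
```

The queue grows for as long as acknowledgements are missing, so a real design would also need a bound or a timeout, which is part of the memory and chattiness trade-off.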
While this was extracted from #3482, this solution is completely orthogonal to the one outlined there, as live_navigation is about two different LiveViews.