Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support force activated pipelines #144

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

chatziko
Copy link

@chatziko chatziko commented Mar 27, 2024

This PR contains the client-side of a "push-to-talk"-like feature that allows the server to "force activate" the satellite, going directly to ASR. The change is simple: when the server sends run-satellite with start_stage = asr then we send back a run-satellite (as usual) with that stage and start streaming immediately. More details will be given in the corresponding PR in home-assistant/core.

#143 is recommended for this to work properly, otherwise the awake sound and debug recording will not be triggered for "force activated" pipelines. I made separate PRs for easier review, if you merge #143 first I can take care of the conflicts.

Note also that this wyoming change is needed by this PR.

Finally note that this is about a server-side "push-to-talk", not client side as in #82.

@founderio
Copy link

I feel like a remote-activated function like this needs a config option to enable/disable it, ideally defaulting to OFF. This feels like a privacy issue waiting to happen.

@chatziko
Copy link
Author

It could be easily made optional. But I'm not sure what scenario you have in mind, who is the adversary?
If your HA server is compromised you should have no privacy expectation at all, the server controls the satellites and pretty much everything else.

@founderio
Copy link

In general I'm leaning towards the "prank" level of privacy, so someone with access to the network.

However, from my understanding so far, there is no authentication/authorization in the wyoming protocol? (Please correct me if I'm wrong, but I did not find any reference that shows any kind of authentication happening)

This means, by extension, that the satellite could be controlled by someone who is NOT the HA server. Which means anyone in the network could trigger audio streaming - bypassing voice/hotword activation - to basically anywhere in the network.

@chatziko
Copy link
Author

This is an interesting security discussion in general, maybe encryption/authentication could be added to wyoming (similarly to esphome).

But it's completely orthogonal to this PR. Even without this PR a malicious user on the local network can connect to the satellite and stream audio. If wake-word detection happens locally on the satellite the malicious user would simply need to wait until the first detection, and then he could stream audio forever (pretending the pipeline never ends).

Or, to avoid waiting, he could even do a more exotic attack like send an audio-chunk to the satellite containing playback audio of the wake word (a wav file saying "ok nabu"), which the satellite will happily playback (thinking it's a TTS response) and wake itself up 😄

@founderio
Copy link

Thank you for this insight! I was not aware the protocol allowed that much freedom :D

I've filed an issue in the protocol repo here, as this looks like a more foundational topic: rhasspy/wyoming#11

@llluis
Copy link
Contributor

llluis commented Apr 3, 2024

Hey @chatziko, very interesting approach. Thanks for the contributiuon.
We did the same feature, but differently. :)

Don't know if you saw this one, it's a very long thread, but you can go bottom-up: #81

I did that and a few other "exploration" changes, however, PRs are not going through.
I don't want to create a parallel repo, so I'm waiting on those to be merged or rejected to continue adding new features. :(

@chatziko
Copy link
Author

chatziko commented Apr 4, 2024

Hey @llluis , wow, that's a long discussion in #81, I certainly missed it otherwise I would have participated. I hadn't followed wyoming-satellite for a while, I only posted a couple of issues back when it was first released then kept using the old homeassistant-satellite until a couple of weeks ago. Then I checked the PRs for recent developments, but not the issues (apart from #4 in which I saw no action). But I'm very happy to find out that people are pushing interesting ideas forward!

I had a quick look at the code in llluis/wyoming-satellite, if I understand correctly the general design for the "remote trigger" is that the server sends a "fake" detect event. It's certainly a simple and natural design. The reason I didn't follow this approach is that I thought it has a few drawbacks:

  1. Just handling detect in the usual way doesn't work with VadStreamingSatellite, cause streaming is triggered by VAD, not detect. So it needs a different implementation there, complicating the code.

  2. Conceptually, "faking" an event is a bit messy. detect has specific semantics and can happen at specific moments in the pipeline. Adding new semantics and allowing it to happen at any moment will make the code more complicated. (And then adding the question_id param which has nothing to do with a real detection makes things even more "hacky".)

  3. Sending detect will not work if the satellite is paused, cause the server is not connected (in my use-case I want to pause the satellite, eg when the TV is on, and then be able to trigger it while paused). So you'd need to unpause first and coordinate the detect afterwards, which again would complicate the code.

So in the end I found the approach of adding start_stage to run-satellite cleaner, but of course it's not a huge difference and in the end any approach that works is fine. Hopefully @synesthesiam will find some time soon to have a look and let us know what he thinks and whether he plans to merge just a feature.

PS. Concerning question_id, if I understood correctly your goal was to only do ASR? This sounds like an interesting goal (although I find the question_id solution messy). In my design one could easily achieve this by also adding end_stage to run-satellite (fits well with start_stage).

@Mincka
Copy link

Mincka commented May 15, 2024

Very nice PRs. I've tested them, including the button in HA.
Everything works fine, including the awake sound which is kept when the button.press service is called.

Thank you very much. I sincerely hope that this work can be reviewed and merged in the future.

I wrote a small guide there for anyone interested to test and use it in their setup.

@chatziko
Copy link
Author

Awesome, thanks for testing and writing a guide.

@Mincka
Copy link

Mincka commented May 18, 2024

Not sure at that point if that's related to one of the PR, but the button entity was named button.my_satellite_none, and now I see that after removing the integration and adding the satellite again that none of the entities has a name related to their setting.

image

select.my_satellite_none
select.my_satellite_none_2
switch.my_satellite_none
number.my_satellite_none
number.my_satellite_none_2

@Mincka
Copy link

Mincka commented May 28, 2024

Do you know if this activation can be associated with one of the current events like --detection-command?
I tried to use it but it does not seem to be called when using the button press.
My goal is to keep a chime or send a tts to let the user know that the satellite is ready to listen.

@chatziko
Copy link
Author

It should work, but of course with the events that are actually happening. No wakeword was detected so it makes sense that --detection--command does not fire. But --stt-start-command should work (haven't tested it though).

@Mincka
Copy link

Mincka commented May 28, 2024

I think I've tested with all the events around the detection without success, but I'll try again with that one and report here for the results. It could also an issue in the original project so I'll test with the standard workflow with the wake word. Thank you for the quick answer.

@Mincka
Copy link

Mincka commented Jun 3, 2024

Quite sad that rhasspy/wyoming#10 did not make it to 1.5.4 since it was ready for a long time.
However, it's good to see a little bit of progress around this. Now the requirements for this PR are fulfilled. :)

@qJake
Copy link

qJake commented Jul 6, 2024

Can't wait for this! I'm really hoping that after this is implemented, we can tweak the LLM call to include a "Follow up?" parameter which can just trigger an immediate detection event on the satellite so it can start listening right away.

Basically, this gets us one step closer to true two-way, multi-step communication with LLMs. Very exciting.

@slyticoon
Copy link

Excited for this to be implemented. This would allow me to completely replace my commercial voice assistants. Actionable notification and complex conversations will be great to have.

@jjdenhertog
Copy link

@chatziko I noticed that changes to the core are made referencing this pr. But it's not fully integrated. Is the current code still usable? Would love to see this working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants