MMM-GoogleAssistant in Server mode
-
@martinhood I have never seen the audio side of remote voice recognition work.
snowboy (the hotword detector) assumes the mic is local; it uses node_record_lpcm16, which invokes arecord or rec to collect the audio stream.
On some implementations, once the hotword is detected the audio stream is forwarded through an API to a remote speech recognition system (Google Speech), which streams back the detected words as a transaction:
word
word word
word word word
done word word word

Then snowboy is restarted, logically, to wait for the hotword and repeat all of that.
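The interim-result transaction above can be sketched as a simple loop. This is only an illustration with stub functions (my own, not the actual snowboy or Google Speech APIs): the recognizer yields a growing partial transcript, flags the last result as final, and the loop then goes back to hotword detection.

```python
def detect_hotword():
    """Stub for snowboy's role: block until the hotword is heard."""
    return True

def stream_to_recognizer(audio_chunks):
    """Stub for the remote recognizer: yield interim transcripts
    that grow with each chunk until the final one is flagged."""
    words = []
    for i, chunk in enumerate(audio_chunks):
        words.append(chunk)
        yield {
            "transcript": " ".join(words),
            "is_final": i == len(audio_chunks) - 1,
        }

def listen_loop(audio_chunks, max_turns=1):
    """Hotword -> stream -> restart, repeated max_turns times."""
    transcripts = []
    for _ in range(max_turns):
        if not detect_hotword():
            continue
        for result in stream_to_recognizer(audio_chunks):
            transcripts.append(result)
        # Transaction done: hotword detection restarts on the next turn.
    return transcripts

results = listen_loop(["word", "word", "word"])
# Interim results arrive as "word", "word word", "word word word" (final).
```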
The snowboy team has abandoned their project and no one has taken their place.
https://github.com/Kitt-AI/snowboy

snowboy hotword detection is pretty much the gold standard.
-
Thanks for your feedback.
So, if I understand this correctly, there’s no way to make any snowboy-based module work with remote audio at the moment. Right?
-
@martinhood as we speak, correct.
I have not seen a local mic stream audio to a remote word detector and voice recognition engine.
-
@sdetweil that’s a pity :) Thank you
-
@martinhood I’ve been working on these kinds of systems for 5 years now, and I’ve never seen anyone attempt it.
-
I understand the standard deployment for a MagicMirror is an all-in-one device, but since a “server mode” exists… it should be reasonable to think about a 2-tier setup.
In my specific case I have an old Android Lollipop box I would like to repurpose as a MagicMirror frontend (it is small and can run a browser; there is no way to put another OS on it), and I’ve got plenty of horsepower on a remote hypervisor (on which I deployed a Debian VM with MM on it).
-
@martinhood I didn’t say it wasn’t a good dream!
-
yeah :) I’ll let you know in case the dream comes true :)
-
@sdetweil I finally managed to make it work:
On the client side (the old Android Lollipop box) I am using the RtpMic app: it is good, simple, and effective. It can start at boot and stream mic input via RTP.
I configured it with the remote MagicMirror IP, port 47474 (just picked one), and the g722 codec (a good balance between quality and bandwidth; it uses just 0.09 Mbps).

On the server side (MM running on a Debian cloud-hosted VM), at first I wanted to rely only on PulseAudio’s default RTP capabilities.
Unfortunately, I still couldn’t get my PA to receive the RTP stream from RtpMic (though I’m quite sure it is possible, I keep getting an “Unsupported SAP” error), so at the moment I work around this using ffplay:

```
sudo apt install ffmpeg
ffplay rtp://[local IP]:47474 -acodec g722 -nodisp
```

(note that the local MM IP goes here, not the client IP)
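Before involving ffplay, it can help to confirm that RTP packets are actually reaching the server at all. This is a diagnostic sketch of my own (not part of the original setup), assuming the same port 47474; the header layout follows RFC 3550, and payload type 9 is G.722 per RFC 3551.

```python
import socket
import struct

def parse_rtp_header(packet):
    """Parse the fixed 12-byte RTP header (RFC 3550)."""
    if len(packet) < 12:
        return None
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,          # should be 2 for RTP
        "payload_type": b1 & 0x7F,   # 9 = G.722 (RFC 3551)
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

def listen_once(port=47474, timeout=10.0):
    """Bind the RTP port and report the first packet seen, if any."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.bind(("0.0.0.0", port))
    try:
        data, addr = sock.recvfrom(2048)
    except socket.timeout:
        return None
    finally:
        sock.close()
    return addr, parse_rtp_header(data)
```

If `listen_once()` returns None, the stream is not arriving (firewall, wrong IP, or RtpMic not running); note that ffplay cannot be listening on the same port while this runs.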
You can check the stream flowing in pavucontrol.

Now that the RTP stream is flowing correctly, I configure the ALSA layer to use PulseAudio by default (this is needed because the GA module relies on ALSA, and the Alexa module relies on SoX). This is achieved by creating the file “.asoundrc” under the MM user’s home directory, with the following configuration in it:
```
pcm.!default pulse
ctl.!default pulse
```

As a last step, I install and configure GA like this:
```
micConfig: { // put there the configuration generated by the auto-installer
  recorder: "arecord",
},
```
When started, the GA module (and virtually any other module) is able to receive voice commands from the remote client.

Please note that this solution only covers one direction (MM input), from the client (running the browser) to the server (running MM in serveronly mode); the other direction (MM output) is already covered, since MM plays its audio output inside the browser by default.
This way you can have MM running on any machine in any remote place, and have your mirror displayed by any browser-capable client (even an Android one).
Also, I had to use RtpMic/ffplay because my client is Android (so no PulseAudio), but I’m quite confident the same scheme can be applied natively on any other PulseAudio-capable device.
-
@martinhood great info.
When I get back home I will have to try this out.