Some of you may remember that in my original review of the AIY voice kit I mentioned that the ladybox needed to be activated by either a button press or a clap. I commented at the time that someone would soon come up with a voice or hotword activation trigger.
Well, since then I came across a video on YouTube by Sid’s E-classroom which does just this using Snowboy. However, this wasn’t ideal, since it required installing pyaudio and sox and so uses a separate recording system from the AIY voice kit. And you may remember that sox (though brilliant in a lot of ways) is very difficult to get working satisfactorily. Also, due to the rather complex code of the kit, it seemed a bit convoluted to set up.
Since then, version two of the AIY voice kit API has arrived, and it is a complete rewrite of the original code. Buried in one of the Python files (I forget which) there is a hotword option. But this works on the basis that the hotword, along with everything else, is sent to Google to be transcribed. If the hotword isn’t found then the transcript isn’t returned. OK, but this means all your private conversations get sent to Google. Do you really want that? Snowboy, on the other hand, works with one or two simple hotwords which are recognized locally and never go to the internet.
So I’ve taken the Snowboy approach with version 2 of the API.
The first task was to get Snowboy working on the Raspberry Pi. The site gives the option to compile with swig. Now I don’t know about you, but I’d never heard of it before. I downloaded the code from GitHub and tried to compile it, and it turns out you need swig version 3.0.10 or higher. Unfortunately the latest version of swig for the Pi is below that. Next option: download the precompiled version, which can be found on the same page. But guess what, this is set up to work with Python 2, not Python 3 as used by the AIY voice kit. Luckily I found a solution on the forum, where one of the Snowboy authors had posted a compiled version for Raspberry Pi and Python 3 in response to someone’s request. Strangely, a few days later the files were removed. I’m asking the author about this at the moment, but in the meantime I will make the file available on my site. But bear this in mind.
After these initial hurdles everything ran smoothly. I cut down the Snowboy Python module to one function, as it had many references to pyaudio which aren’t needed for use with the AIY voice kit. To integrate it with the Google-supplied software, I created a processor-type object which is passed to the recording object. Its callback puts the recorded audio into a queue, and the queued audio is then passed in a loop to Snowboy. Snowboy returns a different status depending on whether voice or the hotword is detected.
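To give a rough idea of the processor and queue arrangement, here is a minimal sketch rather than my actual code. The class and function names are made up for illustration, FakeDetector is a stand-in so the example runs without a microphone, and the status codes (-2 for silence, 0 for voice, above 0 for a hotword) are what I understand Snowboy's RunDetection to return, so treat them as an assumption. I am also assuming the recorder calls an add_data method on each processor it is given.

```python
import queue

# Assumed Snowboy-style status codes; any value above 0 means a hotword matched.
SILENCE, VOICE = -2, 0


class QueueProcessor:
    """Hypothetical processor: the recorder calls add_data() with each audio chunk."""

    def __init__(self):
        self._chunks = queue.Queue()

    def add_data(self, data):
        # Runs on the recorder's thread; just hand the chunk over to the queue.
        self._chunks.put(data)

    def get(self, timeout=None):
        return self._chunks.get(timeout=timeout)


class FakeDetector:
    """Stand-in for the Snowboy detector so the sketch runs without hardware."""

    def RunDetection(self, data):
        if data == b"hotword":
            return 1
        return VOICE if data else SILENCE


def wait_for_hotword(processor, detector, timeout=1.0):
    """Feed queued chunks to the detector until it reports a hotword.

    Returns the hotword index, or None if no audio arrives within `timeout`.
    """
    while True:
        try:
            chunk = processor.get(timeout=timeout)
        except queue.Empty:
            return None
        status = detector.RunDetection(chunk)
        if status > 0:
            return status


processor = QueueProcessor()
for chunk in (b"", b"speech", b"hotword"):
    processor.add_data(chunk)
print(wait_for_hotword(processor, FakeDetector(), timeout=0.1))  # prints 1
```

The queue is what keeps the recording thread and the detection loop apart: the callback only has to drop the chunk in and return, so recording never waits on Snowboy.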
Creating a hotword is fairly simple. You need to open an account with Snowboy and send three sample wavs. You can then download a pmdl file which Snowboy will use to detect the hotword. It’s as simple as that. I have created a small Python program which will record the wavs for you, send them and retrieve the pmdl file all in one go.
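As a rough sketch of the "send" step only, this is how the three recordings could be packaged into a single request body. It is not my actual program: the function name is hypothetical and the field names (token, name, language, voice_samples, wave) are assumptions, so check them against the Snowboy documentation before relying on them. Nothing is sent over the network here.

```python
import base64


def build_training_payload(token, name, wav_samples, language="en"):
    """Package three recorded wavs (raw bytes) into a JSON-friendly dict.

    All field names here are assumed, not taken from an official reference.
    """
    if len(wav_samples) != 3:
        raise ValueError("expected exactly three wav samples")
    return {
        "token": token,
        "name": name,
        "language": language,
        # Binary audio has to be text-encoded to travel inside JSON.
        "voice_samples": [
            {"wave": base64.b64encode(wav).decode("ascii")} for wav in wav_samples
        ],
    }


payload = build_training_payload("my-token", "ladybox", [b"one", b"two", b"three"])
print(len(payload["voice_samples"]))  # prints 3
```

The resulting dict would be posted to the training service, and the response body saved to disk as the pmdl file.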
There are three ways to use the functionality. Firstly, the ladybox waits for the hotword. Secondly, she can wait for the next human interaction: as soon as voice is heard, it gets sent to Google. Thirdly, there can be timed voice recognition: she waits a certain amount of time for voice before going back to sleep (or waiting for the hotword again). I also include a context list which keeps track of the conversation context.
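The three modes can be pictured as a small state machine. The sketch below is only an illustration under my own assumptions: the Mode names, the step function and the choice to go into timed listening straight after the hotword are all made up for the example, and the status codes are the same assumed Snowboy-style values (-2 silence, 0 voice, above 0 hotword).

```python
from enum import Enum

# Assumed Snowboy-style status codes; any value above 0 means a hotword matched.
SILENCE, VOICE = -2, 0


class Mode(Enum):
    HOTWORD = "hotword"  # asleep until the hotword is heard
    VOICE = "voice"      # wait indefinitely for any speech
    TIMED = "timed"      # wait for speech, but only until a timeout


def step(mode, status, elapsed, timeout):
    """Return (next_mode, action) for one detector status.

    `elapsed` is the number of seconds spent listening in the current mode.
    """
    if mode is Mode.HOTWORD:
        if status > 0:
            return Mode.TIMED, "wake"
        return Mode.HOTWORD, None
    if status >= 0:
        # Speech was heard: this is the point where audio would go to Google.
        return mode, "recognize"
    if mode is Mode.TIMED and elapsed >= timeout:
        return Mode.HOTWORD, "sleep"
    return mode, None


print(step(Mode.HOTWORD, 1, 0.0, 5.0))
print(step(Mode.TIMED, SILENCE, 6.0, 5.0))
```

Keeping the decision in one pure function like this makes it easy to try out different timeouts without touching the recording code.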
Adding this functionality has made the kit more naturally conversational. I hope this will be useful for some of you. Let me know if you get it working or have any issues.
Code is available at Bitbucket.