Developing an Open Source Voice Assistant: Interview with Mycroft AI’s Steve Penrod

A look at Mycroft, a machine-learning AI home voice assistant—that's open source.

News February 09, 2018 by Kate Smith

Mycroft is a machine-learning home voice assistant that claims the title of "the world's first open source assistant"—and aims to give Amazon Echo and Google Home a run for their money. AAC talked to Steve Penrod, CTO of Mycroft, about security, collaboration, and what being open source means for both.

Mycroft is an industry first. Where Amazon Echo and Google Home are unsurprisingly tight-lipped about their data gathering, we know that recordings gathered from these devices are stored for later use (whatever that might be). Mycroft Mark II, by comparison, is an open source voice platform.

This means that users of the Mycroft platform can opt into sharing their usage data and designers can then use that data to learn more about demographics, language, and voice recognition.

On the other hand, users could choose to keep their data private.

What we know about Mycroft Mark II's hardware is that it has a Xilinx quad-core processor, specifically a Zynq UltraScale+ EG MPSoC. It has an array of six far-field PDM-based MEMS microphones and has hardware acoustic echo cancellation (AEC) for beamforming and noise reduction. It has stereo sound with dual 2" drivers (10 Watts), a 4" IPS LCD touchscreen, BT 2.1+EDR and BLE 4.2 Bluetooth In, and single-band Wi-Fi (2.4 GHz).

Mycroft's Mark II. All images used courtesy of Mycroft AI, Inc.

The Mark II is Mycroft's most consumer-friendly form (recently 100% funded on Kickstarter). But the Mycroft platform itself represents a change in how designers can interact with a voice assistant platform. Designers who want to work with Mycroft directly can forgo the Mark II and download Mycroft for a desktop (Linux), Android Designer, or a Raspberry Pi 3 (called the "Picroft"). It's also available as a reference hardware platform in the Mark 1.

AAC corresponded with Mycroft's CTO, Steve Penrod, about their collaboration with Mozilla on speech recognition, the future of smart homes, and the concept of open source as a way to increase security.

All About Circuits (AAC): Going up against Amazon and Google is ambitious, to say the least. What do you see as Mycroft's most important unique value?

Steve Penrod (SP): Collaboration. Sometimes this will be groups of individuals. Sometimes it will be organizations like Mozilla. Sometimes it will be more traditional corporations who recognize the value of working together.

AAC: Speaking of collaboration, Mycroft worked with Mozilla on their open source speech recognition model, DeepSpeech. Can you share a bit about that?

SP: Our work with Mozilla is ongoing. They obviously have orders of magnitude greater resources available and have developed world-class machine-learning-based technology.

Mycroft brings three things to the table:

An application for their technology
A strong and excited developer and user community
A source of real-world, "unclean" voice recordings that gets bigger every day from users who have agreed to share their data for the greater good.

In the machine-learning world, the most precious thing is the right data to learn from. And lots of it.

AAC: How did the collaboration with Mozilla come about in the first place?

SP: We had been tracking each other's progress from afar for nearly a year. When Mycroft finally reached out to make contact, we each understood the potential value of a collaboration. Creating a partnership seemed a natural fit.

AAC: What were the biggest challenges in creating your own intelligent assistant stack?

SP: Mainly organizational. Finding a crowd full of great potential is easy. Bringing some order to that crowd so we are working together is the tricky part.

AAC: What led you to make Mycroft open source?

SP: When the idea for this first hit me, the world had yet to see a "voice assistant". It was a wide open space at that time and, like many technologies that are primed to emerge, there were several efforts around the world to bring it to life.

Amazon's Echo device announcement was a surprise to all and did us a service by introducing the concept to the public. But the resources they had made it impossible for a small company to compete. Google was able to. And Mycroft was able to because we understood the strength of open source is collaboration. We are stronger together.

AAC: It sounds like you're just as interested in providing open source data about Mycroft's usage (on an opt-in basis) as you are in allowing customization of the platform, itself. Can you give us an example of how designers could use this open data to make better devices down the road?

SP: As I mentioned, in the machine learning world, the real value is in the data the system is given to learn from. Google has realized this and was willing to release TensorFlow, the software they used to train their systems. But they have notably not released the datasets they train on. There are many reasons for this, including respect for privacy and legal requirements to protect data.

Much like raising a child, the data you feed into the system shapes the output of the system. So careful curation can produce systems with opinions and personality. I personally don't believe everyone wants a single universal assistant. They want their assistant. Joining together datasets will allow users to shape this assistant—unique voices, unique attitudes, unique interests.

AAC: What do you see as the future of AI in products like Mycroft?

SP: They will become so pervasive that you won't even notice them. They'll become part of the fabric of life.

AAC: What do you think is the biggest misconception people have about AI?

SP: That it is right. AI only knows what it has been trained on.

AAC: Do you think people are embracing the concept of the smart home?

SP: "Smart" has been ready for some time, but it simply hasn't been easy to control. Voice controls will make it easy for users to interact with the devices. Local intelligence will reassure people, allowing combinations of data to happen that were possible but not palatable in a cloud-controlled system.

For example, understanding my calendar and my current location at all times is a great way to predict when to raise the temperature of my house, but I don't want to grant access to that much info to Amazon or Google.

AAC: At CES 2018, we saw some companies aiming to add IoT capabilities to existing devices as an alternative to creating entirely new IoT devices. What's your reaction to this possibility?

SP: I live in a 120-year-old house. It has knob and tube wiring, push-button light switches and the original fixtures. Yet it can recognize me arriving and leaving without a word being said, turn on lights when I walk in the room and turn them off when I leave the house. I love the idea of additive technologies over replacements.

AAC: You refer to Mycroft as being "privacy focused". Can you elaborate on that?

SP: The architecture of Mycroft, at its core, puts the computation as close to the user as possible for the situation. For example, two years ago, the idea of running high-quality speech-to-text in your car was just a dream, as anyone with a 2011 Ford Sync system understands. But, today, putting several GPUs in a high-end car is more than feasible.

With hardware like this, technology like DeepSpeech can be run on your own hardware. Your voice can remain under your control. That is the ultimate privacy.

AAC: How have you guys handled the issue of security?

SP: Peer review is the best way to tackle security. Allow the weaknesses to be examined, then work together to fix them.

AAC: On the development end, did you come across any specific difficulties with sourcing parts?

SP: Of course, and I don't want to jinx ourselves by saying it is over! But this round we have a better idea of the difficulties we will be facing than we did with our first Kickstarter—we were pretty naive then.

AAC: What's the most interesting thing you learned about language while working with this device?

SP: Written Thai has no spaces between words. Characters appear in a continuous line and the reader just knows when one word ends and the next begins.

Thank you, Steve, for your time and insights!

If you'd like to learn more about the Mycroft Mark II, you can check out Mark II's Kickstarter page (which was fully funded in six and a half hours and, as of February 20th, has been fully funded six times over).

Check out their website for more info and don't forget to tell us about any Mycroft-based projects you build in the comments below.

mrmonteith February 16, 2018

With the Maker movement, there will be interest in how to build your own automation, etc., around this, especially with some of the new IoT devices.