Alexa Learns ASL, Making the Device More Inclusive

In 2018, Amazon sold 13 million smart speakers, followed closely by Google, which sold 11.5 million. The total number of digital assistants in use is expected to triple by 2023, reaching 8 billion worldwide, a figure larger than the current world population.

However, because these speakers rely on voice interaction, most of them are unusable for the deaf and hard of hearing community, which includes 466 million people worldwide and one million Americans. Abhishek Singh noticed this problem and, despite not having any deaf or hard of hearing relatives or friends, found it to be an “itch” he just had to scratch. So he developed a program that allows deaf and hard of hearing people to interact with Alexa, the Amazon voice assistant.

The Developer

Abhishek Singh is a software developer and founder of Peeqo, a desktop robot that responds with videos and GIFs. He is also a freelance developer for AR/VR projects and is best known for building Super Mario Bros. in augmented reality.

How It Works

Using a laptop with a webcam and the browser-based app that Singh developed, a person signs to the computer. The program interprets the signs captured by the camera and converts them to text on the screen. The text is then spoken aloud to Alexa, and her response is transcribed and displayed on the screen for the user to read.

Singh uploaded a YouTube video outlining this process and then published an article on how he developed the program. The article also outlined how you can teach the system your own set of signs, as the ones he programmed were pretty simple.

The Development Process

The article goes in depth on how he developed the program, outlining his early research and every step he went through to reach the final product. Ultimately, he used TensorFlow.js and Teachable Machine to develop his prototype.

The first step is to train the system in ASL. This means using the webcam to capture you performing each sign multiple times. After you train the system, you enter predict mode, in which the system matches what the webcam sees against the signs and labels you just trained. Then, the Web Speech API is used to speak the sign out loud. Singh created a sign for Alexa so he wouldn’t have to spell it out every time he had a query. When Alexa is signed, the Amazon Echo is activated and awaits the query. After the query is signed, the Web Speech API is again used to transcribe the digital assistant’s response onto the screen.
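
The train-then-predict loop described above can be sketched as a small nearest-neighbor classifier. This is only an illustration under stated assumptions, not Singh's code: the source says he used TensorFlow.js and Teachable Machine, while the k-nearest-neighbor approach, the class name SignClassifier, its methods, and the made-up two-number "features" standing in for webcam frames are all hypothetical.

```python
import math
from collections import Counter


class SignClassifier:
    """Toy k-nearest-neighbor classifier over feature vectors (hypothetical)."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # list of (feature_vector, label) pairs

    def add_example(self, features, label):
        # "Training" here just stores one labeled example of a sign.
        self.examples.append((list(features), label))

    def predict(self, features):
        if not self.examples:
            raise ValueError("no training examples yet")
        # Rank stored examples by Euclidean distance to the new frame.
        nearest = sorted(
            self.examples,
            key=lambda ex: math.dist(ex[0], features),
        )[: self.k]
        # Majority vote among the k closest examples.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


clf = SignClassifier(k=3)
# Made-up 2-D features standing in for real image data (assumption).
for point in [(0.0, 0.1), (0.1, 0.0), (0.1, 0.1)]:
    clf.add_example(point, "alexa")
for point in [(1.0, 0.9), (0.9, 1.0), (1.0, 1.0)]:
    clf.add_example(point, "weather")

predicted = clf.predict((0.95, 0.95))
print(predicted)
```

Capturing each sign several times, as the article describes, corresponds to calling add_example repeatedly with the same label; predict mode corresponds to calling predict on each new frame.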

This program is just a proof of concept at this point, understanding only the simple signs that Singh trained it on. However, he did publish the code and a live demo, so others can follow his instructions and train it with their own signs.

Bumps in the Road

Although the system works relatively well, there were some issues Singh had to account for. Since the program tracks movement, he had to program “idle states” as well as specific signs. These “idle states” were things like standing still or small movements of the arms and body. This way, the system won’t pick up on these small movements and detect erroneous signs. In another attempt to reduce incorrect predictions, he lowered the prediction speed, which “control[s] the amount of predictions per second.”
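
The two safeguards described above, idle states and a lower prediction rate, might look roughly like the following. It is a sketch under assumptions: the label name "idle", the function filter_predictions, and the idea of enforcing the rate by skipping frames that arrive too soon are all hypothetical, since the source does not say how the setting is implemented.

```python
IDLE_LABEL = "idle"  # hypothetical label for the trained idle states


def filter_predictions(frames, max_per_second=1.0):
    """Yield sign labels from (timestamp_seconds, label) pairs.

    Frames arriving sooner than 1 / max_per_second after the last
    accepted frame are skipped, and idle frames never produce a sign.
    """
    min_gap = 1.0 / max_per_second
    last_time = None
    for timestamp, label in frames:
        if last_time is not None and timestamp - last_time < min_gap:
            continue  # too soon: this caps the prediction rate
        last_time = timestamp
        if label != IDLE_LABEL:
            yield label


frames = [
    (0.0, "idle"),
    (0.2, "alexa"),    # skipped: only 0.2 s after the last accepted frame
    (1.0, "alexa"),
    (1.5, "weather"),  # skipped: too soon after the previous frame
    (2.0, "idle"),     # accepted, but idle frames produce no sign
    (3.0, "weather"),
]
signs = list(filter_predictions(frames))
print(signs)
```

Training idle states as their own label means small body movements are classified as "nothing happening" rather than being forced into the nearest real sign.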

He also had to ensure that no sign was detected unless the trigger sign for Alexa was used first, and he had to program the system to detect when a query was done. He used two methods for this: training “termination” words, which usually signify the end of a statement, and designating a specific stop word that signifies the end of the query. The stop sign he chose was also Alexa, so the same sign bookends the query.
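
The bookending logic can be illustrated with a small state machine. This is a minimal sketch covering only the stop-sign method, not the "termination" words; the function name collect_queries and the sample stream of labels are hypothetical.

```python
WAKE_SIGN = "alexa"  # per the article, the same sign opens and closes a query


def collect_queries(signs):
    """Group a stream of sign labels into queries bookended by WAKE_SIGN."""
    queries = []
    current = None  # None means we are still waiting for the wake sign
    for sign in signs:
        if current is None:
            if sign == WAKE_SIGN:
                current = []  # wake sign seen: start buffering the query
            # any other sign before the wake sign is ignored
        elif sign == WAKE_SIGN:
            queries.append(" ".join(current))  # stop sign: query complete
            current = None
        else:
            current.append(sign)
    return queries


stream = ["hello", "alexa", "what's", "the", "weather", "alexa", "time"]
result = collect_queries(stream)
print(result)
```

Signs made before the first wake sign ("hello") or after a query has closed ("time") are dropped, which matches the requirement that nothing is detected unless the trigger sign comes first.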

Hope for the Future

While this solution is not optimal, Singh hopes that it sparks more research into the problem, ultimately leading to a more streamlined solution, such as a similar system being built into the speakers.

There has already been some promise since Singh released his project. Amazon added a feature to the Echo Show for those who might find it hard to speak. This feature, Tap to Alexa, works similarly to a tablet. When you turn the feature on, it gives access to icons for typical questions one might ask Alexa. Alexa then responds out loud, and the text displays on the screen. However, upon release, Tap to Alexa was available only to Americans, with a later release promised only for other English-speaking countries.

In addition to companies such as Amazon producing these types of products, there are organizations across the country that do similar research and development into Assistive Technology (AT) and Augmentative and Alternative Communication (AAC). One such organization is the International Society for Augmentative and Alternative Communication (ISAAC), which “is a membership organization working to improve the lives of children and adults with complex communication needs.” It works to create awareness about AAC and how it can help those who cannot speak by “sharing information and promoting innovative approaches to research, technology and literacy through AAC.” It hosts a biennial conference, sponsors projects and provides awards and scholarships.

Hopefully, as these types of programs continue to be developed and research into AT and AAC delves deeper, we will begin to see more inclusion for the deaf and hard of hearing community, not just within America and not just using ASL, but worldwide and in other sign languages as well.