Hannes Heikinheimo
Sep 19, 2023
1 min read
Speechly is a tool for enhancing touch user interfaces with a voice modality. In addition to touching and clicking, the end user can use the most natural way of interacting with the application: voice.
This blog post explains why you should use the Speechly React client to build your next multimodal user experience. If you're already convinced, you can jump directly to our Get started with React Client tutorial and start developing.
So far, developers and designers have been limited in the services they can use for voice user interfaces, and only simple question-and-answer interactions have been feasible. On the other hand, while touch screens are great, many use cases could still be improved. Something as common as filling in a form or changing search filters requires quite a lot of tapping and swiping.
We don't believe in smart speakers and voice assistants. Instead, we have worked hard to improve both the current touch screen experience and the current voice experience with our technology.
React is a great library for building user interfaces, and it makes building UI components fast and easy. However, while scrolling and tapping is fun and even addictive, some tasks could be completed much more easily by voice. Common tasks that could use a better UX include filling in forms, filtering search results, and reaching features that don't fit on the screen.
A typical form has a few text input fields, possibly with autocomplete, one or two dropdowns, maybe a multiselect, a few radio buttons and a date picker. To fill them all in, the user needs to tap to select an input field, type or select something, and move on to the next field. And because each of these fields can be implemented in multiple ways, some amount of confusion is a given.
Search is a typical feature in almost all applications. A good search can double the conversion rate on an ecommerce site. But the more filters you add, the more cluttered the UI gets and the harder it is for the user to find the filters they are looking for.
Because a touch interface requires the user to touch something, the buttons they need to touch have to be in view. This is a familiar issue for all designers, who need to think of ways for the user to interact with all the great features they have added to their application. However, there's only so much screen real estate available, so the buttons either need to be very small or need to sit in some kind of nested menu where less frequently used buttons are hidden.
Current voice solutions are built for turn-based voice assistant experiences. The end user's speech input is processed only after they stop speaking, and the answer is usually given as speech output. This works well for some use cases, but it can't be used to enhance existing applications. Speechly is built from the ground up for multimodal touch screen experiences.
With a turn-based voice solution, the user says something like "Show me flights from New York to London, departing tomorrow", the machine waits for a while and then shows flights from New Jersey to Longmore. With our React client, the user interface updates in real time, so it's easy to see whether the system makes any mistakes and to fix them either by voice or by touch.
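To make the real-time idea more concrete, here is a minimal TypeScript sketch of how streaming recognition results could be folded into UI state while the user is still speaking, and how a touch correction could then fix a misheard value. The `SegmentUpdate` shape, the `search_flights` intent, the entity names and both helper functions are assumptions made up for this illustration; they are not the actual types or API of the Speechly React client.

```typescript
// Hypothetical shapes for illustration only; not the Speechly client's real types.
interface Entity {
  type: string;
  value: string;
}

interface SegmentUpdate {
  intent: string;
  entities: Entity[];
  isFinal: boolean; // false while the user is still speaking
}

interface FlightSearch {
  from?: string;
  to?: string;
  departure?: string;
  confirmed: boolean;
}

// Fold one streaming update into the search state so the UI can re-render right away.
function applyUpdate(state: FlightSearch, update: SegmentUpdate): FlightSearch {
  if (update.intent !== "search_flights") return state;
  const next: FlightSearch = { ...state, confirmed: update.isFinal };
  for (const entity of update.entities) {
    if (entity.type === "from") next.from = entity.value;
    if (entity.type === "to") next.to = entity.value;
    if (entity.type === "departure") next.departure = entity.value;
  }
  return next;
}

// A touch correction is just another change to the same state.
function correctByTouch(
  state: FlightSearch,
  field: "from" | "to" | "departure",
  value: string
): FlightSearch {
  const next: FlightSearch = { ...state };
  next[field] = value;
  return next;
}

// Simulated updates arriving as the utterance progresses (with one misheard city).
const updates: SegmentUpdate[] = [
  { intent: "search_flights", entities: [{ type: "from", value: "New York" }], isFinal: false },
  {
    intent: "search_flights",
    entities: [{ type: "from", value: "New York" }, { type: "to", value: "Longmore" }],
    isFinal: false,
  },
  {
    intent: "search_flights",
    entities: [
      { type: "from", value: "New York" },
      { type: "to", value: "Longmore" },
      { type: "departure", value: "tomorrow" },
    ],
    isFinal: true,
  },
];

const initial: FlightSearch = { confirmed: false };
const heard = updates.reduce(applyUpdate, initial);
const corrected = correctByTouch(heard, "to", "London");
```

Because every intermediate state can be rendered, the user would see "Longmore" appear as soon as it is recognized and could tap the destination field to change it, instead of discovering the mistake only after the whole query has been processed.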
If you have ever used a voice assistant, you'll know that a lot of queries go like this: "Hey provider, turn on... Hey provider, turn on the... HEY PROVIDER, TURN ON THE LIGHTS". And once you stop speaking, it takes an unpredictable amount of time for the lights to actually turn on. With Speechly, the feedback is instantaneous and it's always clear when the service is listening.
While most of us speak faster than we can write (especially on a touch screen device), we read faster than we listen. And if you miss a small detail in the middle of a sentence, it's hard to go back. For most tasks, it's better to see the result than to hear it. That's why Speechly has been built to be multimodal from the ground up, meaning it supports all modalities simultaneously: touch, vision and voice.
We've released a React client that helps developers and designers solve these issues when building React apps. You can find the source code on GitHub and the package on npm. We've also published a short tutorial to get you started with it, so go ahead and check it out!
When building the client, we've tried to make it easy to use with modern React concepts like Context and Hooks, which should make it easy to integrate into your React app. And if you're not interested in using functional components and Hooks, it isn't any more difficult: you can still use the regular Context consumer approach!
Developing on Speechly requires you to first create your voice UI configuration using our web Dashboard or our command line tool. You configure the application by providing example utterances that your end users would use to interact with it. After you have configured the application, you can try it out in our Playground and finally integrate it with your application.
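As a rough illustration of what "example utterances" carrying intents and entities can look like, the TypeScript sketch below parses a simplified inline notation in which a line starts with `*intent` and entity values are written as `[value](type)`. This notation, the `parseExample` helper and the `search_flights` intent are assumptions chosen for this example; check the Speechly documentation for the actual configuration syntax, which may differ.

```typescript
// Simplified, assumed notation for annotated example utterances:
//   *intent_name free text with [entity value](entity_type)
interface Entity {
  type: string;
  value: string;
}

interface Example {
  intent: string;
  text: string; // the utterance with the annotations stripped out
  entities: Entity[];
}

function parseExample(line: string): Example {
  const match = /^\*(\w+)\s+(.*)$/.exec(line.trim());
  if (!match) throw new Error(`Not an annotated example: ${line}`);
  const intent = match[1] ?? "";
  const body = match[2] ?? "";
  const entities: Entity[] = [];
  // Replace each [value](type) with its plain value and record the entity.
  const text = body.replace(
    /\[([^\]]+)\]\(([^)]+)\)/g,
    (_whole: string, value: string, type: string) => {
      entities.push({ type, value });
      return value;
    }
  );
  return { intent, text, entities };
}

const example = parseExample(
  "*search_flights show me flights from [new york](from) to [london](to) departing [tomorrow](departure)"
);
```

Here `example.intent` holds the intent name, `example.text` the plain utterance, and `example.entities` the three annotated values in order. A structure like this is also a convenient way to write down what you expect to see when trying utterances out before integrating.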
Once you have verified that you get the correct intents and entities for your utterances, add the React client to your application. You can use our React tutorial to get started.
Speechly is a YC-backed company building tools for speech recognition and natural language understanding. Speechly offers flexible deployment options (cloud, on-premise, and on-device), highly accurate custom models for any domain, and privacy and scalability for hundreds of thousands of hours of audio.