Hannes Heikinheimo
Sep 19, 2023
1 min read
Voice user interfaces allow users to interact with a computer system or application by using voice and speech commands. Voice user interfaces make use of speech recognition and natural language understanding technologies.
The obvious advantage of a voice user interface is that it gives users a hands-free, distraction-free way to use an application while keeping most of their attention on another task. However, that's not the only or even the main advantage of a well-designed voice user interface.
The main advantages of voice UIs include:
According to a Stanford study, speaking is at least four times faster than typing on a touch screen device. This makes voice a great input method for information-heavy tasks, such as filling in complex forms and searching a large inventory of items.
Even for someone who has used dozens of different email clients, finding rarely used features such as the vacation responder or the signature setting is somewhat difficult on a new system. The user knows that the feature is somewhere, but it's impossible to know where it is before browsing through many different menus and options.
Voice, on the other hand, is very different. The user can just say something like "change my signature" and they'll find the setting they are looking for immediately. Many cars already benefit from this kind of voice feature, and over a third of US driver's license holders use these features monthly.
Voice user interfaces can support many ways of expressing the same thing. Let's get back to the vacation responder example mentioned before. The user might call the feature either an Out of Office message or a Vacation responder. If they are looking for a "Vacation responder" in the menus, they might miss the "Out of Office responder" item even if they saw it.
In a GUI, the designer has to pick one name for the feature and stick with it. Some users will find that name the most natural one, while others would prefer the other name. This is not the case with voice UIs.
A voice user interface can support dozens of synonyms and ways of expressing the same thing. No matter how the user expresses their wish, the user interface will react accordingly.
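To make the synonym idea concrete, here is a minimal sketch of how several spoken names could resolve to one and the same setting. The phrase table, the setting identifiers, and the function names are all hypothetical and chosen for illustration; a production system would typically rely on a trained natural language understanding model rather than a hand-written lookup.

```python
import re
from typing import Optional

# Hypothetical phrase table: every name a user might say maps to one setting.
SYNONYMS = {
    "vacation responder": "auto_reply",
    "out of office message": "auto_reply",
    "out of office responder": "auto_reply",
    "auto reply": "auto_reply",
    "signature": "signature",
    "email footer": "signature",
}


def normalize(utterance: str) -> str:
    """Lowercase the text and turn punctuation and hyphens into single spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", utterance.lower())
    return " ".join(cleaned.split())


def resolve_setting(utterance: str) -> Optional[str]:
    """Return the setting the utterance refers to, or None if nothing matches."""
    text = normalize(utterance)
    # Check longer phrases first so the most specific name wins.
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        if phrase in text:
            return SYNONYMS[phrase]
    return None
```

With this table, "Turn on my out-of-office message" and "Set up a vacation responder" both resolve to the same setting, so the user never has to know which name the designer picked.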
Voice UIs enable the user to focus their attention on another task. This is especially useful when driving a car or a forklift as it improves safety and productivity.
It can also help users multitask inside an application. For example, in gaming, players can change the camera or switch weapons without navigating deep menus.
While accessibility is essential for people with various impairments, it is beneficial for all of us. Groups of people who can depend on voice features include people with disabilities that make the use of a keyboard and mouse impossible, people with chronic conditions such as repetitive stress injuries who want to limit their use of a keyboard and mouse, and people with cognitive disabilities.
Some examples of well-designed voice user interfaces include our fashion eCommerce demo and
Voice user interfaces, especially when not implemented correctly, have some disadvantages, too. These disadvantages do not prevent the use of voice UIs, but they are something that a product team should be aware of.
People might not be willing to speak in public spaces, either out of consideration for others or for privacy reasons.
Privacy might also be an issue because of the news regarding major tech companies and smart speakers. While this is not an issue with voice UIs as such, it should be taken into account by being as open as possible about how user data is handled.
Some users may not like talking to a computer or just prefer texting. These preferences can be static or context-dependent. For example, a user might prefer texting over voice when searching for health-related information but prefer voice when searching for hotels.
While voice can be the fastest and most suitable interaction modality for many user tasks, it's not a silver bullet for all of them. Selecting an item from a short list is probably easiest by touch, and drawing is most certainly easier with a mouse or touch. Voice, on the other hand, is especially great for selecting from a large inventory of items and for entering information-heavy data such as most forms.
One important goal of a product owner or a designer is to make the product or application as easy and intuitive to use as possible by leveraging all relevant technologies and design methodologies.
Adding voice as one tool in the toolbox can yield good results. Voice should not be added to a product simply because it can be done, but rather because it is the best way to solve certain user tasks.
Just like voice-only is rarely the best way to approach a design problem, most often it's not GUI-only either. A touch screen with a voice modality is a great combination for creating efficient and easy-to-use user interfaces.
Most applications can leverage voice modality in some features. Most applications will also need a screen. GUI and VUI should not be seen as alternatives, but rather as enhancements that can improve each other.
One of the biggest problems with smart speakers is the lack of a touch screen. That's why selecting an item on a smart speaker is very cumbersome.
GUIs, on the other hand, have some other deficiencies. As screen real estate is limited, new features are either hidden behind nested menus or the UI gets cluttered with buttons. And to put it bluntly, GUIs are not human-compatible. Even if we have more or less gotten used to them, there's nothing intuitive or easy in many common GUI design patterns such as hamburger menus or double-clicking.
We at Speechly are proponents of efficient user interfaces. We think that a user interface should be designed as a powerful tool that helps users achieve their goals quickly. This is especially important with applications that are used often, and most product owners know that retaining users is the holy grail of any successful application.
If the user knows what they want to achieve, they can most probably say it out loud faster than they can browse through menus and click buttons. Especially so if what they are trying to achieve is information-heavy. Think of something like purchasing weekly groceries: searching and selecting repeatedly is slow compared to just saying out loud all the items you want to add to your shopping cart.
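As a rough illustration of why a single utterance can replace several search-and-select rounds, the sketch below turns one spoken sentence into multiple cart entries. It is a toy, rule-based parser under stated assumptions: the function name, the small number-word table, and the phrasing it accepts are all hypothetical, and it is not how Speechly or any particular product does it.

```python
import re

# Hypothetical table of spoken quantities this toy parser understands.
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3,
                "four": 4, "five": 5, "six": 6}


def parse_shopping_utterance(utterance):
    """Split one spoken sentence into a list of (quantity, item) pairs."""
    text = utterance.lower().strip().rstrip(".")
    # Drop the command words around the actual list of items.
    text = re.sub(r"^(please )?add ", "", text)
    text = re.sub(r" to (my|the) (shopping )?(cart|list)$", "", text)

    items = []
    # Items are separated by commas and/or the word "and".
    for chunk in re.split(r",\s*(?:and\s+)?|\s+and\s+", text):
        words = chunk.split()
        if not words:
            continue
        quantity = 1  # default when no quantity is spoken
        if words[0].isdigit():
            quantity, words = int(words[0]), words[1:]
        elif words[0] in NUMBER_WORDS:
            quantity, words = NUMBER_WORDS[words[0]], words[1:]
        if words:
            items.append((quantity, " ".join(words)))
    return items
```

Saying "Add two bananas, a loaf of bread and 3 yogurts to my cart" yields three entries from one sentence, where a GUI would need three separate search-and-select cycles. The naive split on "and" would break item names such as "macaroni and cheese", which is one reason real systems use trained language understanding models instead of rules like these.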
Human brains process information in two distinct systems: a visuo-spatial system that's in charge of visual and spatial information and a linguistic system that takes care of speech information.
Because these systems are different, it's rather easy to drive a car and speak at the same time. When doing it, we simply employ both of these systems.
However, it's not possible to do two things at the same time in either of these systems. This is why it's not smart to drive a car and text simultaneously, and why a discussion where two people are talking at once is next to impossible to follow.
A graphical user interface without voice features or a voice-only user interface such as a smart speaker is limited by this. If a user asks something from a smart speaker, they'll have to wait patiently until the smart speaker has finished answering. This is especially cumbersome if the answer is lengthy. If they could ask something and see the answer on their screen, they could immediately start refining the question.
Examples of how multimodal voice user interfaces can combine the best parts of traditional graphical user interfaces and voice features can be seen below.
Voice search in eCommerce
Voice forms
If you want to try out the fashion demo yourself, you can access the demo here
Voice can improve user experience and make human-computer interaction more efficient. However, it should not be thought of as an alternative to current graphical user interfaces, but rather as an enhancement to them.
Combining the best parts of graphical user interfaces and voice user interfaces enables efficient, intuitive, and easy-to-use user interfaces while not sacrificing anything from the current user interface.
By using a real-time Spoken Language Understanding API such as Speechly, designers can enhance their current user interfaces with voice functionalities. Speechly can be applied to any industry or domain, and with our design guidelines and developer support, teams can build awesome user experiences in a short time.
If you are interested in improving your current applications' user experience, leave your email address and our industry specialist will contact you as soon as possible.
Speechly is a YC-backed company building tools for speech recognition and natural language understanding. Speechly offers flexible deployment options (cloud, on-premise, and on-device), highly accurate custom models for any domain, and privacy and scalability for hundreds of thousands of hours of audio.