Hannes Heikinheimo
Sep 19, 2023
1 min read
As Meta continues its push to grow the metaverse, they’re also grappling with the reality that harassment in VR could turn mainstream consumers away. Their CTO, Andrew Bosworth, has referred to this as an “existential threat” to their plans for metaverse expansion.
The threat is a very real one. Microsoft recently shuttered elements of their AltspaceVR public social hubs and made plans to increase moderation to keep the platform safe. Voice chat has already been used to sexually harass players gaming on Oculus headsets. The potential for harm in these new spaces is obvious, and the need for effective moderation solutions is clear.
It’s important to note that this isn’t an issue that can be easily solved. Mike Masnick, founder of Techdirt, wrote about what he calls Masnick’s Impossibility Theorem, arguing that “content moderation at scale is impossible to do well.” (It’s worth calling out that he still feels it’s something that needs to be done.)
What’s interesting about moderation in the metaverse is that multiple modalities are at play: people talk to each other, and they interact through simulated touch and gesture. To be effective, moderation must cover both modalities, and solutions for each should be flexible enough to work together in parallel, providing additional context and improving the quality of moderation efforts.
When people talk to each other, they’re listening not just to the words being said but to the way they’re said. They observe the speaker’s body language. They know the context of their relationship with the speaker. All of these things factor into how the listener processes and understands the words spoken. For moderation purposes, understanding all of these things together is key, and it has to be done accurately and quickly. Why? Because a recent survey found that 60% of kids and 83% of adults have experienced harassment in online multiplayer games. That is a huge human impact, and online gaming voice experiences offer many parallels to the metaverse, now with new, more interactive ways to cause harm.
This potential for harm is something all of the big players building out the metaverse are aware of. If a platform has no technology in place to help identify, investigate, and intervene in situations like this, the platform becomes a tool that harms people. That’s not good for people, and it’s not good for business.
This space is interesting to the team at Speechly because it poses a challenge our technology is uniquely positioned to help address. Ideally, the technology would be deployed as a flexible chat moderation API with a custom model to suit each specific community and environment. The ability to run automated speech recognition and natural language processing simultaneously means we can help moderation systems respond faster and more accurately.
If you’ve ever read a transcript of a conversation, you know it can leave a lot to be desired. Producing these transcriptions in real time, as people speak, is at the heart of successful voice chat moderation. On top of that come the layers that bring a transcript to life: the context and understanding needed to determine whether something was said that should be escalated, as sketched below.
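To make the flow concrete, here is a minimal sketch of what such a pipeline could look like. The `transcribe_stream` and `classify` functions are hypothetical stand-ins for a streaming ASR client and an NLU model (stubbed with canned data here); real systems, Speechly’s included, are far more involved.

```python
"""Sketch of a real-time voice chat moderation pipeline.

transcribe_stream() and classify() are illustrative stubs, not a real
ASR/NLU API: they stand in for a streaming speech recognizer and a
text classifier running side by side.
"""
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class Utterance:
    speaker: str
    text: str
    is_final: bool  # streaming ASR emits partials first, then a final transcript

def transcribe_stream(audio_chunks: Iterable[bytes]) -> Iterator[Utterance]:
    # Stand-in for a streaming ASR client; a real one yields partial
    # transcripts with sub-second latency as audio arrives.
    yield Utterance("player_42", "hey nice", is_final=False)
    yield Utterance("player_42", "hey nice shot loser", is_final=True)

def classify(text: str, context: list[str]) -> list[str]:
    # Stand-in for an NLU model that labels a transcript given recent
    # conversational context (escalation, repeated targeting, sarcasm).
    return ["harassment"] if "loser" in text else []

def moderate(audio_chunks: Iterable[bytes]) -> None:
    context: deque[str] = deque(maxlen=10)  # rolling window of recent lines
    for utt in transcribe_stream(audio_chunks):
        if not utt.is_final:
            continue  # NLU could also run on partials for lower latency
        labels = classify(utt.text, list(context))
        context.append(f"{utt.speaker}: {utt.text}")
        if labels:
            print(f"escalate {utt.speaker!r}: {utt.text!r} -> {labels}")

if __name__ == "__main__":
    moderate(audio_chunks=[b"\x00" * 3200])  # fake 100 ms of 16 kHz PCM
```

The key design point is that transcription and classification run as one continuous loop over the audio stream rather than as a batch job after the fact, which is what makes intervention during the session possible.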
AI-powered models built around signals like sentiment, volume fluctuations, and tone can all help establish the context of what was said. Remember that in the metaverse, unless someone is streaming and recording the experience, spoken harassment leaves no “evidence” behind. There’s no comment to screenshot, no profile to click to better identify the harasser. Experiences often move quickly, and a harasser can move on without any intervention. Unless. Unless there’s an AI layer built in to help identify, intercept, and intervene in real time.
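The sketch below illustrates the idea: it fuses a transcript-level hostility signal with a simple acoustic one (RMS volume relative to the speaker’s baseline) into an escalation score, and persists a structured record so moderators have evidence to act on. The feature extractors, weights, and threshold are illustrative assumptions, not a production scoring model.

```python
"""Sketch of fusing acoustic and text signals into an escalation score.
sentiment_score() is a toy stand-in for a real sentiment model."""
import json
import math
import time
from dataclasses import dataclass

@dataclass
class VoiceClip:
    speaker: str
    transcript: str
    samples: list[float]  # normalized PCM samples in [-1, 1]

def rms_volume(samples: list[float]) -> float:
    # Root-mean-square amplitude as a rough loudness measure.
    return math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))

def sentiment_score(text: str) -> float:
    # Stand-in for a sentiment model: -1 (hostile) .. +1 (friendly).
    hostile = {"loser", "idiot", "shut"}
    hits = sum(word in hostile for word in text.lower().split())
    return -min(hits / 2.0, 1.0)

def escalation_score(clip: VoiceClip, baseline_volume: float) -> float:
    # Shouting (volume well above the speaker's own baseline) amplifies
    # hostile sentiment; calm delivery dampens it.
    volume_ratio = rms_volume(clip.samples) / max(baseline_volume, 1e-6)
    hostility = max(-sentiment_score(clip.transcript), 0.0)
    return hostility * min(volume_ratio, 3.0)

def review_record(clip: VoiceClip, score: float) -> str:
    # Spoken harassment leaves nothing to screenshot; persist a
    # structured record so there is evidence for moderators.
    return json.dumps({"ts": time.time(), "score": round(score, 2),
                       "speaker": clip.speaker, "transcript": clip.transcript})

clip = VoiceClip("player_42", "shut up loser", samples=[0.8, -0.7, 0.9])
score = escalation_score(clip, baseline_volume=0.2)
if score > 1.0:  # illustrative threshold
    print(review_record(clip, score))
```

In practice the weighting between modalities would be learned per community rather than hand-tuned, but the principle holds: the acoustic channel and the text channel each catch cases the other misses.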
As companies continue their push into new forms of multimodal online experiences in the metaverse, the need for effective moderation will only grow. The types of harassment will shift and expand along with the capabilities of the metaverse, and the technology that monitors and moderates these spaces will need to expand alongside them.
The sooner AI-powered models are deployed, the smarter and more effective the technology will become, and the better everyone’s experiences will be.
Cover photo by Julia M Cameron on Pexels
Speechly is a YC-backed company building tools for speech recognition and natural language understanding. Speechly offers flexible deployment options (cloud, on-premise, and on-device), super accurate custom models for any domain, and privacy and scalability for hundreds of thousands of hours of audio.