Giving Voice to Commerce

If the major technology companies are right, the future will belong to ambient computing. And part of that push may be a new, and rather talkative, vision for retail.

The concept, which envisions people interacting with technology without chaining themselves to a phone or computer screen, once sounded far-fetched. But it’s already becoming material through devices like smart speakers and voice-equipped smart displays from Google, Amazon and Apple.

It seems like a short hop, at least on the surface, between telling Alexa or Google Assistant to turn off the lights and asking if the fall collection has arrived at a local Gucci store. Or asking an in-store bot where to find a jacket at Neiman Marcus and hearing about accessories that would go with it.

Think of it as ambient shopping. And Big Tech is already weighing how it can work.

“Really, the essence there is that speech is such an efficient interaction tool,” Steve Gurney, worldwide head of general merchandise for Amazon Web Services, told WWD. “It is the most efficient interaction tool, for when those assistants start being able to understand the context of where you are, what point you are in your retail inspiration, previous buying history and trends, etc.”

That potential has earned voice tech a spot in Gurney’s upcoming session at WWD’s virtual Tech Forum on Thursday, where he will map out some of retail’s top tech priorities.

Amazon, the market leader for smart speakers and displays, clocks billions of voice interactions a week from Alexa customers, more than half of whom use their Echos to shop. The company draws on its vast e-commerce efforts and its multitude of devices, from its earlier Echo Look fashion selfie camera to the latest family of devices.

As voice commerce stands today, it’s most relevant for categories like groceries. That’s only logical, as people often use Alexa or Google Assistant devices for household tasks such as setting cooking timers. But according to AWS, this footprint is expanding. Echo devices have landed in hotels and even surprising places, like gas stations.

An “Ask Alexa” terminal at Amazon Fresh - Credit: Courtesy photo

According to Gurney, voice tech for retail became even more interesting during the pandemic, as it offered another route for contactless payments. He also offers a caveat: Shopping for fashion by voice will likely take years to become a reality, at least in a meaningful way. Even so, it’s clear that the vision for it is already here.

“An example may be a teenage girl walking into an H&M and, on Apple AirPods, just asking Alexa, ‘What’s new at H&M this week?’” he continued. The assistant could cover some of the latest arrivals and populate the customer’s phone screen with the visuals. If shoppers have a specific query, they could ask, “I’m interested in summer dresses, show me where they are in the store,” Gurney added, “and then using augmented reality, it could show directions of where to go.”

These types of scenarios necessarily involve smartphone screens. Certain uses, like directions, demand them, as do visually oriented product categories like apparel and cosmetics. That’s where the idea diverges a bit from pure ambient computing. After all, it’s hard to impress consumers with a stunning, artistically designed ensemble if they can’t see it.

The visuals are a crucial aspect of the experience, according to Carolina Milanesi, principal consumer tech analyst with market intelligence and strategy consulting firm Creative Strategies. Shopping for basics is one thing, but “if you’re thinking about makeup, or if you’re thinking about clothing, things get a bit more complicated,” she told WWD. “There’s color, different kinds of red, right?”

As far as the voice tech itself is concerned, Milanesi sees the experience as far from perfect, requiring proper phrasing or certain words to be understood, which would be major setbacks for any business applications. She believes that it needs to get much better to become useful, especially in complex shopping scenarios — although she did acknowledge that natural language processing has improved over time, perhaps offering some optimism for the tech’s trajectory.

In fact, she cited a new feature that Amazon is developing. Last week, at its re:MARS event in Las Vegas, the company showcased how it can replicate a deceased loved one’s voice. The idea is to help grieving users get closure and to support other scenarios, like letting a dearly departed grandma still read a bedtime story for the kids.

“While AI can’t eliminate that pain of loss, it can definitely make their memories last,” Rohit Prasad, senior vice president and head scientist for Alexa, explained at the event. But some critics found the feature creepy.

That’s a concrete risk for any innovation that skirts the line between tech and humanity. Most people want their tools to be seamless, easy and natural to use. But if a tool goes too far, users may find it unsettling, as makers of some rather lifelike robots can attest. That’s certainly true of voice as well, as it resonates as a uniquely human form of expression. The technology’s ability to duplicate real voices may also bring new privacy and security concerns, particularly for any system that relies on vocal biometrics.

For these reasons and more, it makes sense that the slow walk of tech development and deployments — in the home, car, hotel and tentatively in retail — looks like more of a careful tiptoe than a sprint.

On the consumer side, analysts wonder if voice-enabled smart speakers have already peaked. Several studies portray traction as slowing down across the market. According to Insider Intelligence data, the adult user base grew by 11.8 percent in 2020, but only increased by 2.9 percent in 2021. The firm anticipates a further slowdown this year, with growth of 2.6 percent.

New reports claim that Meta is planning to shelve its Facebook Portal voice-enabled smart display. The revelation follows an earlier leak of documents, reportedly from Amazon, that framed Echo products as having already passed their “growth phase.”

Amazon publicly pushed back at the notion, defending both its voice assistant and its smart speakers. The distinction is important — because people don’t have to own an Echo to use Alexa. It also lives in iPhone and Android smartphone apps.

This year, 42.7 percent of adults are projected to use a smartphone to engage with voice assistants every month. Of those who use conversational AI, the vast majority, 91 percent, do so on their phones.

Either way, Amazon is eyeing Alexa growth, especially for shopping.

“Over half of Alexa customers are regularly using shopping features, and usage is growing quickly. I think the future of shopping will keep building on the idea of convenience, discovery and helping a broad set of customers,” Rajiv Mehta, general manager of Alexa Shopping, told WWD in a statement.

“Voice is the most natural and intuitive interface, it simplifies the shopping experience by offering customers more ways to discover products, get inspiration, organize their shopping needs, get answers about any product and making it convenient to buy what they want, when they want,” he added.

The smartphone proposition matters, because the vocal interactions can then have those valuable visuals ready at hand. Both Gurney and Milanesi alluded to the value of that, especially when combined with other tech, like augmented reality to navigate store aisles or check out real or digital goods in 3D.

“So there you see the ability of ambient systems blending with AR or the metaverse, and actually delivering a real, improved customer experience,” the AWS executive said. “You could really see them having a play, and that could be wedded into a metaverse interaction or an audio conversation through ear buds.”

Metaverse and virtual reality experts, developers and executives believe audio experiences will be crucial for virtual worlds, as spatial audio brings an element of immersiveness and realism. And given how finicky and difficult merely walking around in those 3D environments can be, offering verbal commands one day would be very welcome, not just for newcomers, but also for users with disabilities.

Shopping for NFTs could be as simple as saying, “Take me/teleport me to the Ralph Lauren store.”

In that way, voice commerce and voice-enabled shopping in the virtual or augmented world might even offer more opportunity, at least in the near term, than the real one.
