Setting Up Retail Stores for Machine Learning – Cameras, Microphones, and More

Daniel Faggella

Daniel Faggella is Head of Research at Emerj. Called upon by the United Nations, World Bank, INTERPOL, and leading enterprises, Daniel is a globally sought-after expert on the competitive strategy implications of AI for business and government leaders.

Episode Summary: We speak this week with Aneesh Reddy, cofounder and CEO of Capillary Technologies, which focuses on machine vision applications in the retail environment.

How do we instrument a physical retail space so that, with cameras, we can pick up on the same kind of metrics that eCommerce stores can? Retail stores, as Reddy talks about in this episode, have to focus on the data that they get from the checkout counter, such as what kind of purchases were made, and potentially some kind of data about how many times the front door was opened or closed. That doesn’t really lay out that much detail about who came in, what percent of them converted, and what the average cart value was for different people.

A lot of that is completely greyed out when looking at the numbers that are accessible to brick and mortar retailers. But some of that is changing. Reddy talks about what’s possible now with machine vision in retail, and what it opens up in terms of possibility spaces for understanding customers better in a physical environment. More importantly, Aneesh paints a bit of a future vision of where he believes retail is going to be when not just computer vision is included, but when audio and other kinds of sensor information are included.

Subscribe to our AI in Industry Podcast with your favorite podcast service:

Guest: Aneesh Reddy, cofounder and CEO, Capillary Technologies

Expertise: product development, software development, analytics, IT and security, machine learning

Brief Recognition: Reddy holds a Bachelor of Technology degree in Manufacturing Science and Engineering from the Indian Institute of Technology, Kharagpur. Previously, Reddy served as Assistant Manager at ITC Limited.

Interview Highlights

(3:20) How are metrics tracking techniques moving into the complicated physical world?

AR: So we started off on this journey like any other. We run a bunch of eCommerce sites on our platform: for Pizza Hut, for Walmart and others. The kind of data that they have on the online site, you know every click is tracked, every product search is tracked, which products are not available is tracked. There is so much info there, and if the offline retailers could get this information, they would just run their business a lot better.

So today the average customer spends about ten or fifteen minutes in an offline store, and all the retailer has is one digital transaction at the end of it. None of those interactions are captured, unlike what happens in an online store, where every click, the clickstream, and the full story is all available. So what we’ve done is we’ve tried to apply speech and vision, basically putting a bunch of cameras and a bunch of microphones into an offline store.

Let me cover what’s already possible and what it’s like in about a thousand stores now. What we’ve been able to do is get to an extremely accurate number on what the conversion in the offline store is, which is getting an accurate number on how many people walked in, removing the store’s staff, removing the other people who come into the store to do services and stuff like that, and give the leader a very clear number, an almost 97% or 98% accurate number of what is the actual conversion rate for the store.

So we use image processing to do this. It’s a camera at the entrance facing down and counting the number of heads that walk in, and then it estimates the number of the store’s staff on the basis of uniform or something else like that. So it basically takes care of getting you to that 97, almost 98% accuracy there.
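The conversion figure Reddy describes can be sketched as a simple calculation. This is a minimal illustration only, not Capillary's implementation: the function name, its parameters, and the sample numbers are all assumptions made for the example.

```python
def store_conversion_rate(heads_counted, staff_entries, service_entries, transactions):
    """Return transactions divided by shopper visits.

    All inputs are hypothetical daily totals: heads counted by the entrance
    camera, entries attributed to staff (for example by uniform), entries by
    service visitors, and checkout transactions.
    """
    shoppers = heads_counted - staff_entries - service_entries
    if shoppers <= 0:
        return 0.0  # no shopper visits recorded, so no meaningful rate
    return transactions / shoppers


# Illustrative numbers only.
rate = store_conversion_rate(
    heads_counted=520, staff_entries=60, service_entries=10, transactions=90
)
print(f"Conversion rate: {rate:.1%}")
```

Without the staff and service-visitor correction, the same day would show 90 transactions against 520 entries, which understates how well the store converts actual shoppers.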

What we’ve also been able to do now is have the camera look at you, but we do not capture any personally identifiable information. What it does is get a quick grasp on what your demographics are so it would also give you a view of say, folks who are between 20 and 30 and walked in wearing jeans are converting at 70%. While folks who are between 40 and 50 and walked in wearing formal wear are converting at 7%. What it gives you as a retailer is a very good idea of in which store, what kind of audience is converting?
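The per-segment view can be sketched as a grouped count over anonymous visit records. The record layout (age band, attire, purchased flag) and the sample data are assumptions made for illustration; the interview does not describe how Capillary stores or links these observations.

```python
from collections import defaultdict


def conversion_by_segment(visits):
    """Compute conversion per (age_band, attire) segment.

    `visits` is an iterable of (age_band, attire, purchased) tuples with no
    identity attached, which is a hypothetical record format.
    """
    totals = defaultdict(lambda: [0, 0])  # segment -> [visits, purchases]
    for age_band, attire, purchased in visits:
        counts = totals[(age_band, attire)]
        counts[0] += 1
        counts[1] += int(purchased)
    return {segment: bought / seen for segment, (seen, bought) in totals.items()}


# Illustrative data only: ten visits per segment.
visits = (
    [("20-30", "jeans", True)] * 7
    + [("20-30", "jeans", False)] * 3
    + [("40-50", "formal", True)] * 1
    + [("40-50", "formal", False)] * 9
)
rates = conversion_by_segment(visits)
for segment, segment_rate in rates.items():
    print(segment, f"{segment_rate:.0%}")
```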

In the offline world conversion is a big deal. Unfortunately, that information is not available for retailers. So this piece is what we’ve been able to solve pretty well. And I think use cases to this are when you run a massive marketing campaign, you can very clearly say if you targeted the right kind of audience based on what kind of performance and what kind of demographics performed well, what was proposed, and what in the proposal and demographics lined up.

We can also give the retailer a good idea of what are the in-colors right now? What are the in things in terms of fashion, that customers are wearing when they walk into the stores. Which is again through the demographic visual profile that we’re able to generate through these. These two things are already live with more than a thousand stores now across India, Indonesia, the Middle East, where we’re seeing almost 97-98% accuracy on some of these numbers.

(8:03) How is visually understanding preference for products informing machine vision?

AR: So what we do is we have a camera placed centrally on the roof of the store, and we can basically say that this customer walked into these aisles. And think of a fashion store like GAP or US Polo. Let’s say a customer walked into one of these aisles and picked up a blue shirt. So we can get to that level of detail, the level of detail of saying, “this was the specific SKU that the individual picked up.” We are not tagging it to an individual so it is not personally identifiable. And then we’re seeing the accurate data that’s saying, “20 to 25-year-olds wearing jeans are actually picking these kinds of products.”

It’s useful information the retailer can act on. We are definitely trying to get to that level of detail. It is a lot on colors and categories, it’s more like broadly jeans, shirt, jacket, tops, dresses. It’s largely those it will really need a lot more. And some of these models we’ve had to build from the ground up. Over the years we can probably get to more levels of clothing like the color of the watch or stuff like that. But really we try to keep it with high-level categories.

(11:29) How might microphones be used in an automated way to help instrument the physical world?

AR: Again, if you look at an online store—let’s say you’re running an apparel store—you can very easily track how many people search for a black shirt. In an offline store, getting that kind of data is extremely hard. So what we are trying to do is, every five feet or so, put a microphone into an offline store, and digitize those interactions that are happening between the customer and the store staff. Let’s say a customer walked up to store staff and asked for a black shirt; how many times do people ask for a black shirt? How many times did the customer say, “are there offers in the store?” How many times did the customer say “This size doesn’t fit me?”
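Once speech has been transcribed to text, counting these requests becomes an aggregation problem. The sketch below assumes transcription has already happened and uses plain keyword patterns. The intent names and patterns are hypothetical, and the cross-talk and multilingual issues Reddy raises next are out of scope here.

```python
import re
from collections import Counter

# Hypothetical intent patterns, English only; real in-store speech would need far more.
INTENT_PATTERNS = {
    "asks_for_black_shirt": re.compile(r"\bblack shirt\b", re.IGNORECASE),
    "asks_about_offers": re.compile(r"\boffers?\b", re.IGNORECASE),
    "size_does_not_fit": re.compile(r"\b(doesn't|does not) fit\b", re.IGNORECASE),
}


def count_intents(utterances):
    """Count how many utterances match each intent, keeping no speaker identity."""
    counts = Counter()
    for text in utterances:
        for intent, pattern in INTENT_PATTERNS.items():
            if pattern.search(text):
                counts[intent] += 1
    return counts


# Illustrative transcripts only.
utterances = [
    "Do you have a black shirt in medium?",
    "Are there offers in the store?",
    "This size doesn't fit me.",
    "Where can I find a black shirt?",
]
intent_counts = count_intents(utterances)
print(intent_counts)
```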

Some of those interactions are what we want to, again in a non-personally identifiable manner, aggregate more data the retailer can work on. So this one is a much more complex problem than the original one because there’s a lot of cross-talk if you have multiple customers in the store or if you have multiple store staff in the store. That’s a problem that is currently in our labs, and I must say we will take probably another three months to fully solve it.

The other problem that happens is, unlike in the US, if you look in India or any of the Asian markets, there’s also multiple languages at play. Some customers speak English, some customers speak Hindi, so it takes the level of complexity up that many more times. And use cases that we’ve been able to figure have been things like saying which product are customers not finding in which types of stores.

Retailers typically have a very strong standard operating procedure in which you want the staff to engage with the customer. You know, the customer walks in and you want them to be greeted, ask them which products they want, help them around with sizes; there’s a cycle that different brands make for themselves. Unfortunately, in all the mystery shopping stuff that we’ve seen, less than 20% or 25% of the time the SOP is being followed. That’s because there’s so many customers, so much staff churn, and you don’t know which staff is to be trained and which staff is not to be trained.
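One way to measure SOP adherence is to check whether the expected steps occur in order within each interaction. This sketch assumes each interaction has already been reduced to a list of step labels. Those labels, and the idea of detecting them from speech, are assumptions for illustration rather than a description of Capillary's system.

```python
# Hypothetical step labels for the cycle described: greet, ask about products, help with sizes.
SOP_STEPS = ["greet", "ask_product", "help_with_size"]


def follows_sop(observed_steps, sop=SOP_STEPS):
    """Return True if every SOP step appears in order (other steps may be interleaved)."""
    remaining = iter(observed_steps)
    # `step in remaining` consumes the iterator up to the match, which enforces ordering.
    return all(step in remaining for step in sop)


def sop_compliance_rate(interactions):
    """Share of interactions in which the full SOP was followed in order."""
    if not interactions:
        return 0.0
    return sum(follows_sop(steps) for steps in interactions) / len(interactions)


# Illustrative interactions only.
interactions = [
    ["greet", "ask_product", "help_with_size"],
    ["ask_product", "greet"],
    ["greet", "small_talk", "ask_product", "help_with_size"],
    ["help_with_size"],
]
compliance = sop_compliance_rate(interactions)
print(f"SOP followed in {compliance:.0%} of interactions")
```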

Again with the speech type stuff, you will be able to identify that these two folks need to be trained. It’s almost like a personal trainer for them. It’s saying, “look great, you did well today, these are the two things you can do better for tomorrow.” Some of those use cases are what we’re trying to address with the whole speech thing in a three-month time frame. Like I said we run a lot of eCommerce sites for Pizza Hut and others, and what we’ve realized is that you’re able to personalize the experience with eCommerce sites.

Your promotions go up about 20-25%. Unfortunately, in the offline world it’s just impossible to personalize these kinds of interactions. So in the six-month time frame, what we want to try and do is have a small bluetooth headset for the store staff, where we quickly look at who is walking in the door and say, “these are the three products that you should pitch to that customer.” It’s kind of making personalization in the offline store. With the engine in the background for the collaborative training model, we can say “people walking in looking like this and wearing this kind of stuff typically tend to buy this kind of product.” And so we pass that info very quickly to the store staff and they can act on it and hopefully give a better experience to the customer. And hence lead to better conversions in the store.
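The "people looking like this tend to buy this" idea can be sketched with a frequency count per visual segment. This is a deliberately simple stand-in, not the model Reddy refers to: the segment keys, product names, and purchase log are hypothetical.

```python
from collections import Counter, defaultdict


def build_segment_recommender(purchase_log, top_n=3):
    """Build a lookup from segment to its most frequently purchased products.

    `purchase_log` is an iterable of (segment, product) pairs, where a segment
    is a broad, non-identifying description such as ("20-30", "jeans").
    """
    by_segment = defaultdict(Counter)
    for segment, product in purchase_log:
        by_segment[segment][product] += 1

    def recommend(segment):
        # Unknown segments return an empty list rather than a guess.
        counts = by_segment.get(segment, Counter())
        return [product for product, _ in counts.most_common(top_n)]

    return recommend


# Illustrative purchase history only.
segment = ("20-30", "jeans")
log = (
    [(segment, "denim jacket")] * 4
    + [(segment, "sneakers")] * 3
    + [(segment, "graphic tee")] * 2
    + [(segment, "belt")] * 1
)
recommend = build_segment_recommender(log)
suggestions = recommend(segment)
print(suggestions)
```

The three suggestions are what would be relayed to the store staff headset in the scenario described above.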

(16:18) Where do you see the combination of vision and audio in the near future?

AR: If you’re actually looking at offline store staff, unfortunately, retailers don’t invest in enough training because they know it’s a temp staff and they keep rotating. Hence solving that problem through continuous feedback I think is the right way to go.

It will start with some of these metrics like, “you interacted with 25 customers today,” or “how many this week?” Or just, “follow the SOP.” If you follow the SOP, chances are your conversions would be better than if you were doing something random. So it would definitely start with that, we want to do all of this stuff.

I don’t know if we would want to go all the way and link it to bonuses. I don’t think we would go that far. I think today we’re looking at it more as a personal trainer for the store staff than as an incentive measurement. That’s not really the intent right now. It’s also very easy to get opt-ins from the store staff. With a consumer, where someone is coming in, getting them to sign up for opt-ins would be harder. On the store staff side, because it’s more of a personal trainer for them, it will be easier to get opt-ins.

So that should hopefully take care of privacy concerns with regards to the store staff training pieces. In terms of the consumer, I don’t think we are going to do anything which is personally identifiable. It’s going to be very broad demographics, it’s going to be very broad recommendations. But it’s never going to get to the point of saying “hey, this is Dan.”

(19:33) How will the core dynamic of retail change and what will executives need to understand as that change comes?

AR: If you look at most retailers, especially the large ones, their backgrounds. Eventually, it’s just traders setting up shops and selling products. So the focus on merchandising and putting the right product in the store has been so high for so many years that I think eCommerce (???) Unfortunately I still do see that behavior a lot with a lot of the senior management where they still believe that stocking the store right is more important than customer experience or personalization and some of that in the offline stores. I really think, as a consumer, you will eventually gravitate towards whatever gives you the best experience. I must say China is definitely leading from the front on new retail or the way retail is changing. It’s going to be a lot about personalization and greeting that customer as an individual. I do see that happening a lot.

I think more and more you will start seeing, like we see in China a little bit now, every customer’s journey, very good recommendations on products, a lot of power back to the customer in terms of they themselves being able to search for products in the store on a console. I think a lot of that is what we see on eCommerce sites today. If you have a personalized site, your conversions, your stickiness, it is like probably 30-40% more than someone who is just doing this, put an entry on an online site and that’s it. So I think that same trend is definitely going to happen in the offline world.


Header Image Credit: Staples
