Artificial intelligence and machine learning have fueled technological innovations in marketing, eCommerce, and several other industries. Many people benefit from AI and ML systems without knowing it every time they search on Google or play a song on Spotify.
If AI can help recommend movies and music, what keeps these same systems from developing a discerning taste in wine or in fine art (such as oil painting or sculpture)?
The challenges of developing artistic or culinary “taste” in machines are in some ways very different from the challenges of recommending movies or music.
The Challenges of Building Artistic or Culinary “Taste” into an AI System
It may seem as though search and recommendation engines have lately become rather good at understanding a user’s intent. In particular, recommendation engines can often suggest relevant products within minutes of a user first arriving on a site and clicking through a product listing. It’s as if the recommendation engine can figure out a user’s taste in music or apparel.
The way Netflix recommends movies or Amazon recommends products is very different from the way humans recommend things to one another. Machines are not yet capable of “taste” and preference as humans understand them; they can only approximate taste by finding patterns in large volumes of data. In some domains (such as movie or music recommendations), these proxies can be remarkably effective, and they scale far beyond what human recommenders could manage.
One could argue that a machine might eventually be capable of appreciating a good Bordeaux, but it is nowhere near capable of doing so today. Machines can’t “appreciate” anything in the human sense, so they cannot determine “good” or “bad” wine, or “good” or “bad” taste, without feedback from human users. They can only infer qualities of taste from the data that human input provides.
Furthermore, machines cannot yet determine these qualities with any reasonable accuracy without access to large volumes of quantified, repeatable data. This type of data is not always available. Whenever a situation calls for a subjective assessment of a particular stimulus that is difficult to quantify (in other words, taste), machines will have no basis for making that assessment.
However, AI models can hypothetically be trained to predict taste preferences accurately under specific conditions. In some domains, proxy signals for taste are readily available. In others, humans must put in a great deal of effort to produce the data on which a machine learning model might be trained to approximate taste.
Below are some examples of how AI and machine learning models can and cannot currently determine a user’s taste:
Music – How Spotify Determines User Preferences
One concrete example of AI accurately predicting human tastes is in music. The US music industry grossed over $7 billion in 2016, 68% of it attributable to subscription streaming platforms such as Spotify. These platforms use machine learning to power their recommendation engines, which give listeners a reason to subscribe by suggesting music relevant to their tastes. In turn, listeners might be more willing to purchase the music itself.
The reason platforms such as Spotify are so successful in selling music is that they are predicated on personalization. They are able to pinpoint which types of music are likely to appeal to a particular user based on the user’s own history and the engagement of thousands of other users with similar profiles and demographics.
Spotify draws on volumes of data points from its millions of users to make relevant recommendations to any particular user. It has quantifiable data on which songs tend to be played by the same listeners, and the AI interprets those patterns as “liking” or “taste.” The recommendation engine notes how long a certain type of user listens to a song or piece of music, how often they pause or replay it, and how often they add it to a playlist.
In a 2017 presentation at DataEngConf in Barcelona, Spotify Data/Backend Engineering Manager Gandalf Hernandez shared some of the process Spotify uses to recommend music and to learn from audio tracks.
Spotify uses this data, through various means, as a proxy for “liking.” When it accumulates enough of it (song plays, song ratings, playlist additions, etc.), it can accurately predict what a user who listens to a particular song will want to hear next and make recommendations accordingly. Spotify also notes when a user rejects a recommendation in order to refine future recommendations.
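To make the idea of a behavioral proxy concrete, here is a minimal Python sketch that folds several listening signals into one “liking” score. The field names and weights are hypothetical assumptions for illustration only; they are not Spotify’s actual features or values, and a real system would likely learn such weightings from data rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class ListenEvent:
    """One user's interaction with one track (hypothetical fields)."""
    seconds_played: float
    track_length: float
    replays: int
    added_to_playlist: bool
    skipped: bool


# Hypothetical weights chosen for illustration; not Spotify's real values.
WEIGHTS = {"completion": 1.0, "replay": 0.5, "playlist": 2.0, "skip": -1.5}


def liking_score(event: ListenEvent) -> float:
    """Combine implicit signals into a single proxy for 'liking' a track."""
    # Fraction of the track that was played, capped at 1.0 so long sessions don't dominate.
    completion = min(event.seconds_played / event.track_length, 1.0)
    score = WEIGHTS["completion"] * completion
    score += WEIGHTS["replay"] * event.replays
    if event.added_to_playlist:
        score += WEIGHTS["playlist"]
    if event.skipped:
        # A skip is treated as negative feedback, like a rejected recommendation.
        score += WEIGHTS["skip"]
    return score


print(liking_score(ListenEvent(200, 200, 2, True, False)))  # 4.0
print(liking_score(ListenEvent(30, 200, 0, False, True)))   # negative: skipped early
```

A full listen with replays and a playlist add scores high, while an early skip scores below zero, which is the kind of contrast a recommender could learn from.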
For example, a first-time user of Spotify might choose to listen to just two songs, both of which are pieces by classical music composer Sergei Rachmaninoff. This does not give the AI enough information about the user to make good recommendations. However, the AI behind the recommendation engine does have a significant amount of data from other users who also listen to these two Rachmaninoff pieces, so it looks at what those users clicked on next to make recommendations for the new user.
In effect: “Other users from [country] who downloaded Spotify and immediately listened to Rachmaninoff typically ended up liking [X] other songs. Let’s recommend [X] other songs as low-hanging fruit to engage this user.”
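The cold-start reasoning in the quote can be sketched as a simple count: find other users whose sessions began with the same tracks, and tally what they played next. This is a minimal illustration under simplifying assumptions; the track IDs and listening histories are hypothetical, and a production recommender would likely weigh far more signals than a raw count.

```python
from collections import Counter

# Hypothetical ordered listening histories (the first tracks each user played after signing up).
HISTORIES = [
    ["rach_pc2", "rach_vocalise", "tchaik_pc1", "pop_hit"],
    ["rach_vocalise", "rach_pc2", "tchaik_pc1"],
    ["rach_pc2", "rach_vocalise", "debussy_clair"],
    ["rach_pc2", "pop_hit", "rach_vocalise"],
    ["rach_pc2", "rach_vocalise"],
]


def recommend_next(seed_tracks, histories, k=3):
    """Recommend what users who opened with the same tracks played next."""
    seed = set(seed_tracks)
    n = len(seed)
    next_plays = Counter()
    for history in histories:
        # Only count users whose first n plays were exactly the seed tracks (in any order)
        # and who went on to play something else.
        if len(history) > n and set(history[:n]) == seed:
            next_plays[history[n]] += 1
    return [track for track, _ in next_plays.most_common(k)]


print(recommend_next(["rach_pc2", "rach_vocalise"], HISTORIES, k=2))
# ['tchaik_pc1', 'debussy_clair']
```

Each new play the user makes can be appended to their own history, so the same counting idea naturally sharpens as more data arrives.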
When the new user clicks on the next song, the AI will use this additional data point to refine future recommendations for this particular user. Over time, the AI behind the recommendation engine will become better at “understanding” this user’s taste in music or at least better at recommending them music.
The key to this so-called “understanding” is the availability of large volumes of data that can be fed into the machine learning model behind the recommendation engine. The AI is not truly inferring the type of music likely to suit a user’s taste; it relies on the data users generate through their reactions and behaviors. Its method of “understanding” is a matter of probabilities and of assessing various proxies for “liking” music across millions of users.
The ability of the algorithms that make up the machine learning model to process this data plays a major role in how well the model makes predictions and recommendations. These algorithms change constantly, because dozens of data scientists are often working on them at any given time.
However, even the best algorithms are useless without the data through which they can run. Fortunately for the music industry, that data is readily available by way of streaming music services. Our sense of hearing translates relatively well into the digital world. That is not the case for a few of our other senses and for the aspects of taste related to them.
Wine – Sensory Taste and AI
One could argue that if a machine can be trained to “understand” or at least recommend products based on a user’s taste when it comes to music, it should be possible to train it to do the same when it comes to wine. Hypothetically, yes, the same process by which a machine learns how to identify a user’s taste in music applies to a user’s taste in wine. However, sampling music is not the same as sampling wine. They involve two different senses, and physical taste does not lend itself to digitization.
When a user clicks on an audio file, he or she can experience it in a purely digital environment and log a reaction by either listening to the music in its entirety, replaying it, choosing a similar piece of music next, or abandoning it mid-play and possibly skipping to something entirely different. All of these data points are captured digitally, and these are immediately available for processing by a machine learning model.
Quantifying how people decide if they like a particular wine or not is not as easy as predicting the type of music a specific user will like. The tongue doesn’t lend itself to digitization in the same way that the ears do. The engagement required to gather data for wine happens in the physical world, which is not immediately available to machine learning models that exist entirely in a digital space.
If one wanted to determine proxies for a user’s taste in wine on which to train a machine learning model, one would need to gather large volumes of data in the physical world in which the model could find patterns.
Also, a taste for wine is notoriously subjective. A machine learning model does not have taste buds, so it can never understand sensory taste information the way humans can. As such, it is much more difficult for it to determine a user’s preference, or taste, in wine. It can only assign values to particular attributes of a wine based on its chemical composition and on the perceived value, as defined by human input, of each characteristic that makes it a “good” wine. For a machine learning model to make accurate predictions about what type of wine is likely to appeal to people, the data scientists building the model would need to gather quantifiable data on wine preferences. Because this data would need to be collected in the physical world, gathering it is not impossible, but it could be exceedingly difficult.
Quantifiable factors might include:
- Chemical compounds found in wine
- The relative concentration of various compounds in the wine (for example, in milligrams per liter)
- The color of the wine
- The viscosity of the wine
- The type of wine (Bordeaux, Zinfandel, etc.)
Even more challenging would be distilling the labels humans actually use to describe taste, such as:
- “Tart, but with a smooth finish”
In order to actually distill these subjective experiences into qualities that could be used reliably to recommend wine, a tremendous amount of controlled taste testing would have to occur with people of various taste preferences, and the quantifiable factors within wine (its chemical makeup, its color, etc) would have to be reliably “mapped” on top of these subjective reported experiences from thousands (or tens of thousands) of humans.
There are other challenges. Having humans listen to 100 songs in a day is perfectly reasonable; having them drink 100 glasses of wine would mean a trip to the emergency room. Food complicates things further, because a reliable system might need to be trained on the same wines paired with dozens of different dishes (from cheese platters to fish fillets, and more).
It might require a highly instrumented, controlled environment in various locations, a large number of participants, and several wine tasting sessions. Participants would be asked to taste several wines chosen at random. Each wine would be analyzed beforehand to determine its chemical composition, and selected properties would be assigned a coded quality.
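Because participants can only taste a limited number of wines at a time, a study like this would need a schedule that randomizes wines while capping glasses per session. The sketch below shows one way to do that; the wine IDs, the cap of five glasses, and the three sessions are hypothetical parameters, not recommendations.

```python
import random


def tasting_schedule(participants, wines, glasses_per_session=5, sessions=3, seed=0):
    """Assign each participant random wines, capped per session, with no repeats."""
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    total = glasses_per_session * sessions
    if total > len(wines):
        raise ValueError("not enough distinct wines for this many glasses")
    schedule = {}
    for person in participants:
        # Draw every wine this person will taste up front so none repeats across sessions.
        picks = rng.sample(wines, total)
        schedule[person] = [
            picks[i:i + glasses_per_session] for i in range(0, total, glasses_per_session)
        ]
    return schedule


WINES = [f"wine_{i:02d}" for i in range(20)]  # hypothetical wine IDs
plan = tasting_schedule(["p1", "p2"], WINES)
print(plan["p1"][0])  # the five wines p1 tastes in session 1
```

Randomizing the order per participant helps keep any one wine from always being tasted first, when palates are freshest.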
Each participant would then be asked to rate each wine on these qualities, such as acidity, bouquet, and sweetness, and then to give the wine an overall judgment on a 1-to-10 Likert scale. The scale would quantify the participants’ judgments, and, theoretically, a machine learning model could use these judgments to determine what might constitute “good” and “bad” wine.
This kind of data collection would require substantial time, money, and effort. Gathering enough data to train a machine to make reasonably accurate wine recommendations would take months, whereas Spotify could collect the same amount of data in just 30 minutes.
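To make the “mapping” idea concrete, the sketch below averages each wine’s 1-to-10 ratings across participants and fits a straight line (ordinary least squares) from one measured property to the average rating. The property (residual sugar), the wines, and the ratings are invented for illustration; a real study would need many properties, many more participants, and most likely a richer model than a single line.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical lab measurement per wine: residual sugar in grams per liter.
SUGAR_G_PER_L = {"wine_a": 2.0, "wine_b": 6.0, "wine_c": 10.0}
# Hypothetical 1-10 Likert ratings from three tasting participants per wine.
RATINGS = {"wine_a": [4, 5, 6], "wine_b": [6, 6, 6], "wine_c": [7, 8, 9]}

wines = sorted(SUGAR_G_PER_L)
slope, intercept = fit_line(
    [SUGAR_G_PER_L[w] for w in wines],
    [mean(RATINGS[w]) for w in wines],  # average rating per wine
)


def predict_rating(sugar_g_per_l):
    """Predicted average Likert rating for an untasted wine with this sugar level."""
    return slope * sugar_g_per_l + intercept


print(round(predict_rating(8.0), 2))  # 7.08
```

Averaging across participants smooths out some individual noise, but with real data the fit would be far less clean than in this toy example.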
That said, Likert scale judgments are subjective and often inconsistent: the same person might rate a wine a “5” one day and a “7” the next, depending on their mood. Another approach might involve more objective measures. Amazon, for example, factors purchase history heavily into its recommendation engine. If a participant rated a wine a “7” but then bought 3 bottles of it at the tasting, the machine learning model could treat the purchase as a stronger proxy for “liking” than the rating. Someone who buys 3 bottles of a particular wine likely thinks that wine is in some way “good.”
Because of these challenges, wine recommendations today are more likely to be derived from purchase data than from any robust assessment of a wine’s chemical makeup, or from the “mapping” of that chemical makeup onto subjective human terms like “smooth” or “fruity.” Purchase data is easy. Taste bud data is hard.
When all is said and done, an online shopper might view the product page for a bottle of wine and be recommended other wines. The machine learning model behind the page’s recommendation engine might determine that tasting participants who rated the wine the shopper is viewing highly also tended to rate another wine highly, and the engine would then show that other wine to the shopper.
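One simple way to weight a purchase above a stated rating is a weighted blend of the two signals. In the sketch below, `purchase_weight` is a hypothetical tuning parameter (not an Amazon setting), and treating any purchase as a single yes/no signal is a simplifying assumption.

```python
def preference_signal(likert_rating, bottles_bought, purchase_weight=0.7):
    """Blend a 1-10 Likert rating with a purchase signal into a 0-1 score.

    purchase_weight is hypothetical: values above 0.5 make buying the wine
    count for more than the stated rating.
    """
    if not 1 <= likert_rating <= 10:
        raise ValueError("Likert rating must be between 1 and 10")
    normalized_rating = (likert_rating - 1) / 9  # map 1..10 onto 0..1
    purchased = 1.0 if bottles_bought > 0 else 0.0
    return purchase_weight * purchased + (1 - purchase_weight) * normalized_rating


# A "7" plus three bottles bought outranks a "9" with no purchase.
print(round(preference_signal(7, 3), 2), round(preference_signal(9, 0), 2))  # 0.9 0.27
```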
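One simple way a model could find the “participants who liked this wine also liked that one” pattern is item-to-item similarity. The sketch below swaps in Pearson correlation over the participants who rated both wines; the wines, participants, and ratings are hypothetical, and a real engine might use a different similarity measure or a learned model.

```python
from math import sqrt

# Hypothetical tasting data: wine -> {participant: 1-10 rating}.
RATINGS = {
    "wine_x": {"p1": 9, "p2": 8, "p3": 3, "p4": 2},
    "wine_y": {"p1": 8, "p2": 9, "p3": 2, "p4": 3},
    "wine_z": {"p1": 2, "p2": 3, "p3": 9, "p4": 8},
}


def pearson(a, b):
    """Correlation of two wines' ratings over the participants who rated both."""
    shared = sorted(a.keys() & b.keys())
    if len(shared) < 2:
        return 0.0
    mean_a = sum(a[p] for p in shared) / len(shared)
    mean_b = sum(b[p] for p in shared) / len(shared)
    dev_a = [a[p] - mean_a for p in shared]
    dev_b = [b[p] - mean_b for p in shared]
    denom = sqrt(sum(d * d for d in dev_a)) * sqrt(sum(d * d for d in dev_b))
    if denom == 0:
        return 0.0  # someone rated every wine the same; no usable signal
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


def similar_wines(viewed, ratings, k=1):
    """Top-k other wines whose ratings rise and fall with the viewed wine's ratings."""
    scored = [(pearson(ratings[viewed], r), wine) for wine, r in ratings.items() if wine != viewed]
    scored.sort(reverse=True)
    return [wine for score, wine in scored[:k] if score > 0]


print(similar_wines("wine_x", RATINGS, k=2))  # ['wine_y']
```

Correlation is used here rather than raw rating overlap because 1-to-10 ratings are all positive, so unadjusted measures tend to make every wine look similar.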
Visual Art – How AI Models Understand Images
Recently, Google announced experiments with AI that take an image and edit it to make it more pleasing to viewers. According to Google AI, the machine learning system “mimics the workflow of a professional photographer, roaming landscape panoramas from Google Street View and searching for the best composition, then carrying out various post-processing operations to create an aesthetically pleasing image.”
However, “aesthetically pleasing” is a highly subjective concept: what pleases some people may not please others, and much depends on context. An AI may be able to mimic the quality and work of a professional photographer for a particular use, such as advertising a ski resort, but people would not want to see that advertisement hanging in a museum. The context in which an image appears affects whether people like it.
From an artistic perspective, visual media is as difficult to judge as wine. To teach an AI to acquire a taste for the visual arts, it would be necessary to break that taste down into quantifiable proxies the machine can “understand.” This could be done by assigning values to combinations of colors, shapes, gestures, and other visual elements, and again asking people to rate the images containing those combinations on a Likert scale.
Fortunately, gathering visual data is much easier than gathering data for sensory taste because sight translates well into the digital world. Google Images, Facebook, and other visual media platforms receive millions of data points on images down to the specific colors in specific pixels in those images.
Machine learning models can consume these data points and use them to predict how a particular user might respond to this pixel data, which the human brain interprets as an image. A machine learning model could, for example, determine that people in a particular county in California respond more strongly to images dominated by the color blue than people in other parts of the US. This determination could inform a recommendation engine that recommends products to people in that county. It could also inform marketing campaigns that target people in that county with visual advertisements.
The AI doesn’t truly understand that people from that county prefer blue images; it predicts their likely response to those images based on data and statistics.
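A toy version of the blue-image example might measure how blue each image is from its pixels, then compare click rates on blue-dominant versus other images within each region. This is a sketch under stated assumptions: “blue-dominant” here simply means the blue channel is the strongest in at least half of the pixels, and the regions, images, and clicks are all hypothetical.

```python
from collections import defaultdict


def blue_share(pixels):
    """Fraction of (r, g, b) pixels whose blue channel is the strongest."""
    if not pixels:
        return 0.0
    blue = sum(1 for r, g, b in pixels if b > r and b > g)
    return blue / len(pixels)


def blue_lift_by_region(impressions, threshold=0.5):
    """Per region: click rate on blue-dominant images minus click rate on other images.

    impressions: iterable of (region, pixels, clicked) tuples.
    """
    # For each region and image kind, track [clicks, times shown].
    stats = defaultdict(lambda: {"blue": [0, 0], "other": [0, 0]})
    for region, pixels, clicked in impressions:
        kind = "blue" if blue_share(pixels) >= threshold else "other"
        stats[region][kind][0] += int(clicked)
        stats[region][kind][1] += 1
    lifts = {}
    for region, s in stats.items():
        # Only compare regions that were shown both kinds of image.
        if s["blue"][1] and s["other"][1]:
            lifts[region] = s["blue"][0] / s["blue"][1] - s["other"][0] / s["other"][1]
    return lifts


# Hypothetical 2x2 "images" and ad impressions.
BLUE_IMAGE = [(10, 20, 200)] * 4
RED_IMAGE = [(200, 20, 10)] * 4
IMPRESSIONS = [
    ("county_a", BLUE_IMAGE, True),
    ("county_a", BLUE_IMAGE, True),
    ("county_a", RED_IMAGE, False),
    ("county_a", RED_IMAGE, True),
    ("county_b", BLUE_IMAGE, False),
    ("county_b", RED_IMAGE, True),
]
print(blue_lift_by_region(IMPRESSIONS))  # {'county_a': 0.5, 'county_b': -1.0}
```

A positive lift only says that one region clicked blue images more often in this data; it is a statistical pattern, not an understanding of why.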
The Current State of AI for Understanding Taste
Machines do not have sentience. As such, they are as of now both incapable of having their own preferences and tastes and incapable of truly understanding the preferences and tastes of the humans that use them. They very well might be able to do this in the future, but that is a relatively far-off prospect.
For now, they rely on data to make predictions and determine likelihoods, imitating the way people “understand” each other’s preferences without grasping the full breadth of understanding of which a human brain is capable. In time, these statistical machine learning methods may extend to the chemical senses (taste and smell), but doing so poses inherent challenges, which we’ve tried to highlight in this article.
Header Image Credit: Republic, Washington