Episode Summary: Natural language processing (NLP) has become popular in the past two years as more business processes implement this technology in different niches. In inviting our guest today, we want to know specifically in which industries, businesses or processes NLP could be leveraged to learn from activity logs.
For instance, we aim to understand how car companies can extract insights from the incident reports they receive from individual users or dealerships, whether it is a report related to manufacturing, service or weather.
In the same manner, how can insights be gleaned from the banking or insurance industries based on activity logs? We speak with the University of Texas’s Dr. Bruce Porter to discover the current and future use-cases of NLP in customer feedback.
Subscribe to our AI in Industry Podcast with your favorite podcast service:
Guest: Dr. Bruce Porter, Professor of Computer Science at the University of Texas at Austin; Chief Science Officer at SparkCognition
Expertise: Machine reading, natural language processing
Brief Recognition: As SparkCognition’s Chief Scientist, Dr. Bruce Porter leads the company’s research and development initiatives. He was a two-time chair of the Computer Science department at the University of Texas at Austin, and recently returned to his role as a professor at the university to focus on teaching and research. In 2017, the Austin Chamber of Commerce recognized Dr. Porter with the Economic Development Volunteer of the Year award for his work in recruiting technology companies to build and innovate throughout the economy across the Austin region.
Big Idea
Many companies and government agencies are deluged with incident reports, customer logs, and information coming in the form of text. To gather insight from these logs, Dr. Porter coined the term “macroreading”, which refers to the detection of patterns in huge masses of unstructured text.
Example 1: Improving Customer Experience in the Auto Industry
In the automobile industry, imagine that a car company receives incident reports on a daily basis from car owners or car dealerships. These reports consist of a paragraph or two of text describing a problem that the customer has experienced with a particular car. These incident logs contain unstructured information in the form of text, diagrams or pictures. They can also include structured metadata such as the occasion and other incident details, Dr. Porter clarifies.
The question is: Can the car company mine the text reports to find patterns at the macrolevel and discover what is happening with the cars in a particular model and year? Can this information help diagnose important problems, or detect trends that might help the car company improve its products?
Data entry might vary widely across different parts of the automotive industry. Car incident reports could be typed if they are received by someone in the office. Technicians in the field could be using audio devices to report a problem. A business with the ability to find patterns across all of these different data types would be better prepared to find and address problems and opportunities quickly.
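To make the idea of finding patterns at the macrolevel concrete, here is a minimal sketch in Python. It is not Dr. Porter's method or SparkCognition's product; it simply counts, for each model and year, the terms that recur across several incident reports. The sample reports, the stopword list, and the function names are all hypothetical, and a real system would need far more robust language processing.

```python
import re
from collections import Counter, defaultdict

# Hypothetical incident reports: structured metadata plus free text.
REPORTS = [
    {"model": "Sedan A", "year": 2016, "text": "Brake pedal feels soft and brakes squeal."},
    {"model": "Sedan A", "year": 2016, "text": "Squeal from front brakes when stopping."},
    {"model": "Sedan A", "year": 2016, "text": "Radio resets itself at random."},
    {"model": "Truck B", "year": 2017, "text": "Radio display flickers in cold weather."},
]

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"and", "the", "from", "when", "at", "in", "itself", "feels"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def term_counts_by_group(reports):
    """For each (model, year), count how many reports mention each term."""
    counts = defaultdict(Counter)
    for report in reports:
        group = (report["model"], report["year"])
        # Use a set so a term is counted once per report.
        counts[group].update(set(tokenize(report["text"])))
    return counts


def recurring_terms(reports, min_reports=2):
    """Return terms that appear in at least min_reports reports of a group."""
    result = {}
    for group, counter in term_counts_by_group(reports).items():
        terms = sorted(t for t, n in counter.items() if n >= min_reports)
        if terms:
            result[group] = terms
    return result


if __name__ == "__main__":
    for group, terms in recurring_terms(REPORTS).items():
        print(group, terms)
```

With the sample data, only the 2016 "Sedan A" group has terms repeated across reports, which hints at a brake-related issue worth a closer look by a human analyst.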
Example 2: Preventing Financial Fraud and Money Laundering
Dr. Porter brings us to the financial sector for his second example. He explains that wire transfer reports come with metadata such as the amount of money being moved, as well as the source and destination of the wire transfer. The report also contains text information about the nature of the wire transfer and the relationship of the money sender to the bank. Here, a macroreading application has the potential to uncover fraud and money laundering activities.
Dr. Porter further explains that developing a banking application for the government can be challenging, primarily because such an application requires a large quantity of data. Because the application could potentially be used as an investigative tool, it must be a deep, robust system. The application must have the capability to show interrelated actions and participants over time, which when taken together reveal a pattern of suspicious behavior.
Citing money laundering as an example, Dr. Porter explains that one pattern shows a small business such as a car wash or a laundromat collecting cash, and then moving large sums of money by wire through accounts owned by these small businesses. The transfers are kept relatively small and spread out over time, with the intention of “flying under the radar” of authorities.
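The pattern described above can be illustrated with a short Python sketch. This is a simplified assumption of how such a check might look, not a description of any real bank or government system: it flags accounts whose individually small transfers add up to a large total within a rolling time window. The sample transfers, the thresholds, and the function name are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical wire-transfer records: (account, date, amount in dollars).
TRANSFERS = [
    ("carwash-01", date(2018, 3, 1), 9000),
    ("carwash-01", date(2018, 3, 8), 9500),
    ("carwash-01", date(2018, 3, 15), 8800),
    ("carwash-01", date(2018, 3, 22), 9200),
    ("bakery-07", date(2018, 3, 5), 1200),
    ("bakery-07", date(2018, 3, 19), 900),
]


def flag_accounts(transfers, per_transfer_limit=10_000,
                  window_days=30, total_limit=30_000):
    """Flag accounts whose small transfers sum to a large total in a window.

    All three thresholds are illustrative assumptions.
    """
    # Keep only transfers that are individually below the per-transfer limit.
    by_account = defaultdict(list)
    for account, day, amount in transfers:
        if amount < per_transfer_limit:
            by_account[account].append((day, amount))

    flagged = []
    window = timedelta(days=window_days)
    for account, items in by_account.items():
        items.sort()  # chronological order
        start = 0
        running = 0
        for day, amount in items:
            running += amount
            # Drop transfers that have fallen out of the rolling window.
            while day - items[start][0] > window:
                running -= items[start][1]
                start += 1
            if running >= total_limit:
                flagged.append(account)
                break
    return sorted(flagged)


if __name__ == "__main__":
    print(flag_accounts(TRANSFERS))
```

A rule like this only looks at the structured metadata. The macroreading approach Dr. Porter describes would also draw on the free-text portion of each report and on the relationships between participants over time.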
Dr. Porter forecasts that industries that would need this kind of insight in the next five to 10 years would be those with significant investments in equipment that is distributed globally. The company would be receiving reports on a regular basis, either hourly or daily, of how that equipment is performing. The challenge for the company is detecting failures early before they get out of hand and meeting the regulatory obligations for large industries.
A Look to the Future of NLP
Dr. Porter reveals that over the next 10 years, his research at the University of Texas at Austin will focus on microreading, which aims to extract data from newswire feeds, internet postings, and social media. He explains that microreading does not involve finding patterns across thousands of logs, but rather understanding one particular post or document about, for instance, the failure of an automobile. He is optimistic that microreading can be implemented in AI within five to 10 years, thanks to machine learning-driven improvements in fundamental NLP technologies.
One other focus of Dr. Porter’s research is long-lived AI systems, which he envisions as having a 10-year lifespan, with their machine learning capabilities improving over time.
“Imagine a microreading AI system that comes across a passage of text that it can’t decipher. It could leave in its knowledge base a reminder, revert to it a year from now when it has more background knowledge and the technologies have gotten better, and try again to parse and understand this difficult text,” Dr. Porter says, adding that long-lived systems can be more ambitious because of the time factor.
Long-lived systems will need to acquire an enormous amount of background knowledge to perform microreading successfully, he relates, adding that in his 35 years of experience, much of the work entailed manually constructing the massive knowledge bases that support the research.
“I still believe that we need to prime these systems with some level of general knowledge of the world for them to be automatic.” That is an awfully big challenge in and of itself, and it’s a problem that most NLP vendors haven’t focused on. It’s unclear how “general knowledge” will make its way into niche NLP applications in banking, customer service, eCommerce, etc.
Interview Insights from Dr. Bruce Porter from SparkCognition
The main questions Dr. Porter answered on this topic are listed below. Listeners can use the embedded podcast player at the top of this post to jump ahead to sections they might be interested in:
- (3:32) What are the possibilities of using natural language processing? What are some of the important applications on your mind?
- (5:55) How do the data or reports processed by the machine learning applications enter the systems?
- (6:45) At government agencies, where would the data be entered? Would they have equipment logs or some kind of log?
- (11:50) What would technologies like these look like in the future? Would macroreading become the norm? Are there spaces that would need this kind of insight in the next half-decade?
- (13:50) There is still a lot of human intervention needed to find insight from macroreading. How do you think it will become easier? How do you think that transition will happen?
- (17:45) If we want a system to get human-level insight from individual instances, do we need infrastructure where humans would interact enough with the instances in a structured fashion to put them in context?
Header image credit: Adobe Stock