Episode Summary: Emerj has had a number of past guests who have talked about neural networks and machine learning, but Dr. Pieter Mosterman speaks in depth about the pendulum swing in this approach to AI from the 1960s to today. What we call neural networks as a general approach to developing AI has come in and out of favor two or three times in the last 50+ years. In this episode, Dr. Mosterman explains why neural networks have gone in and out of favor, as well as where the pendulum may take us in the not-too-distant future.
Guest: Pieter Mosterman
Expertise: Computer Science and Engineering
Recognition in Brief: Pieter Mosterman is a Senior Research Scientist at MathWorks in Natick, MA, where he works on computational modeling, simulation, and code generation technologies. He is an Adjunct Professor at the School of Computer Science of McGill University. Before that, he was a Research Associate at the German Aerospace Center (DLR) in Oberpfaffenhofen. Pieter holds a Ph.D. in Electrical and Computer Engineering from Vanderbilt University in Nashville, TN, and an MS in Electrical Engineering from the University of Twente in the Netherlands. His primary research interests are in Computer Automated Multiparadigm Modeling (CAMPaM), with principal applications in design automation, training systems, and fault detection, isolation, and reconfiguration.
Dr. Mosterman designed the Electronics Laboratory Simulator that was nominated for The Computerworld Smithsonian Award by Microsoft in 1994. In 2003, he was awarded the IMechE Donald Julius Groen Prize for his paper on the hybrid bond graph modeling and simulation environment HYBRSIM.
Current Affiliations: McGill University; Editorial Advisory Board member of SIMULATION; CRC Press Series Editor for books on Computational Analysis, Synthesis, and Design of Dynamic Systems
Neural Networks, In and Out of Style
Neural nets were a big deal in artificial intelligence research in the 1960s. In the 1970s, they went out of fashion, only to emerge on the scene once again in the 1980s before slowing to a crawl. Today, neural networks have made a comeback and are on the rise. Why the most recent upward swing? Two words, one concept: big data. “We’re very much trying to find correlated patterns, trying to draw inferences, there’s some exciting research going on in the field,” said Dr. Pieter Mosterman, whose research at MathWorks centers on the use of software and algorithms for data analysis, signal processing, and other areas. Those who keep up with AI trends and developments are likely to agree that there seems to be a new breakthrough related to neural networks published every week, with Google’s AlphaGo software being one of the most recent in a line of increasingly powerful systems.
Back in the 1980s, as an undergraduate, Mosterman studied neural networks out of a text authored by Geoffrey Hinton, now a researcher at Google and professor at the University of Toronto. At the time, the prevailing approach to neural networks was known as “back propagation”, an algorithmic approach that grew out of the original idea of the “perceptron” in the 1960s. Pieter describes the perceptron model as relatively straightforward and based on how scientists thought neurons operated in the brain, receiving inputs above and below a certain threshold and firing a “message” when triggered. “It was exciting at the time, but Minsky published a paper that showed you cannot learn exclusively with one perceptron (in the late 60s) and this killed interest in the field for the time being,” said Pieter.
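To make that description concrete, here is an illustrative sketch (not from the episode; the weights and the brute-force search are invented for demonstration) of a single perceptron: a threshold unit that fires when its weighted inputs cross a bias, and the XOR limitation Minsky highlighted.

```python
# Hypothetical sketch of a single perceptron and its XOR limitation (illustration only).
import numpy as np

def perceptron(x, w, b):
    """Fire (output 1) when the weighted input crosses the threshold, else stay silent (0)."""
    return 1 if np.dot(w, x) + b > 0 else 0

# A single perceptron handles linearly separable functions such as AND...
w_and, b_and = np.array([1.0, 1.0]), -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print("AND", x, "->", perceptron(np.array(x), w_and, b_and))

# ...but no choice of weights lets one perceptron compute XOR, because the positive
# cases (0,1) and (1,0) cannot be separated from (0,0) and (1,1) by a single line.
# A brute-force search over a coarse grid of weights and biases never finds one:
xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
found = any(
    all(perceptron(np.array(x), np.array([w1, w2]), b) == y for x, y in xor_table.items())
    for w1 in np.linspace(-2, 2, 21)
    for w2 in np.linspace(-2, 2, 21)
    for b in np.linspace(-2, 2, 21)
)
print("single perceptron that computes XOR found:", found)  # False
```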
The concept of a multilayered neural network took hold in the 1980s. Training such a network required researchers to compare the machine’s output with the desired output and propagate the error back through the multiple network layers, adjusting the parameters so that the machine actually learned (hence the name “back propagation networks”; a minimal sketch of this training loop follows below). This approach overcame a fundamental limitation of the perceptron and gave a temporary boost to neural nets, which withered a second time when people found that they were good in certain applications but not so great in others.
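The following sketch, assuming a tiny network with one hidden layer of four sigmoid units and squared-error loss (details chosen for illustration, not drawn from the episode), shows that training loop learning XOR, the very function a single perceptron cannot represent.

```python
# Illustrative two-layer network trained with back propagation on XOR (assumptions:
# 4 hidden sigmoid units, squared-error loss, plain gradient descent).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))  # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))  # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(20000):
    # Forward pass: propagate inputs through both layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the output error back and adjust every parameter.
    err = out - y                               # gradient of squared error (up to a constant)
    grad_out = err * out * (1 - out)            # through the output sigmoid
    grad_h = (grad_out @ W2.T) * h * (1 - h)    # through the hidden sigmoid

    W2 -= lr * h.T @ grad_out
    b2 -= lr * grad_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ grad_h
    b1 -= lr * grad_h.sum(axis=0, keepdims=True)

# Outputs should approach [0, 1, 1, 0] after training.
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```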
In practice, back propagation algorithms proved particularly good when it came to maintenance of machines like helicopters: scientists could train a neural net to predict whether a helicopter rotor was about to fail just by having the machine listen to its sound patterns (a hypothetical sketch of this kind of acoustic fault detector follows below). By contrast, a more challenging application, and an active area of study, was computing the inverse dynamics of a robot arm, a task that neural nets alone were not equipped to do well. When the techniques turned out not to be as all-encompassing as many researchers had first hoped, the technology was once again put on a back burner.
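The sketch below illustrates the rotor idea in miniature. Everything here is invented for demonstration: the synthetic vibration signals, the choice of band-averaged FFT magnitudes as features, and the simple logistic classifier are assumptions, not the actual systems the episode refers to.

```python
# Hypothetical acoustic fault detector: learn to flag a failing rotor from sound patterns.
# Signals, features, and model are synthetic and chosen purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
fs, n = 1000, 1024  # sample rate (Hz) and samples per recording

def recording(faulty):
    """Simulate one vibration recording; a 'faulty' rotor adds a harmonic and extra noise."""
    t = np.arange(n) / fs
    signal = np.sin(2 * np.pi * 30 * t)  # healthy 30 Hz rotation tone
    if faulty:
        signal = signal + 0.6 * np.sin(2 * np.pi * 90 * t) + 0.3 * rng.normal(size=n)
    return signal + 0.1 * rng.normal(size=n)

def features(sig):
    """Low-dimensional spectral features: band-averaged FFT magnitudes."""
    mag = np.abs(np.fft.rfft(sig))
    return np.array([mag[b:b + 64].mean() for b in range(0, 512, 64)])

# Build a small labeled data set and fit a logistic-regression fault detector.
X = np.array([features(recording(i % 2 == 1)) for i in range(200)])
y = np.array([i % 2 for i in range(200)], dtype=float)
mu, sd = X.mean(axis=0), X.std(axis=0)
Xn = (X - mu) / sd  # normalize features

w, b = np.zeros(X.shape[1]), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(Xn @ w + b)))   # predicted fault probability
    w -= 0.1 * Xn.T @ (p - y) / len(y)        # gradient step on the log loss
    b -= 0.1 * (p - y).mean()

test = (features(recording(True)) - mu) / sd
print("predicted fault probability:", round(float(1.0 / (1.0 + np.exp(-(test @ w + b)))), 2))
```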
This pattern illuminates the reality that different approaches to AI work better in certain circumstances or contexts, but it is also illustrative of technologies that are introduced before their time. The latter idea is linked to Pieter’s proposed reason behind neural nets’ recent and successful comeback.
“Neural networks can do very well in predictions based on either lots of data or a good understanding of principles; neural nets work really well with the first (lots of data)…and are now back again because we have all this data available.”
He mentions that after Minsky’s paper was published in the late 1960s, some researchers started thinking about how they might build an expert system, one that could formulate theories and infer from facts. Early examples emerged in the 1970s, including STRIPS, an automated planner developed at Stanford Research Institute, and MYCIN, an early medical diagnostic system developed at Stanford University. This was AI based on inferencing and theories, but one of its primary problems was increasing complexity: inference chains grew long, and the computational power they demanded was a challenge. In addition, an expert system could give the right diagnosis but was not good at explaining how it had arrived at that solution; a physician wants (and needs) to know why the machine has given a particular diagnosis, even if it’s correct.
So researchers began to lean in the direction of qualitative reasoning, intent on finding ways for machines to trace causal relations back from an observed effect in a specific problem to its underlying causes. All the while, they faced a new challenge: such models tended to produce spurious behavior, predicting outcomes the real system would never exhibit.
Today’s thriving big data scene is at the tail end of this latest push, in which machine learning is starting to prevail in analyzing and finding patterns amidst an overwhelming amount of data, in everything from predictive policing to testing for dementia (as relayed in this Huffington Post article by Adi Gaskell). Yet scientists still lack a clear understanding of how many of these algorithms actually work once they begin processing and making sense of information. Furthermore, the scope of many AI programs is still rather narrow and focused on specialized tasks. A computer that can explain its exact reasoning to human beings is still, at least for now, science fiction.
A Multidisciplinary Approach to Strong AI
Mosterman believes we need a layered architecture to build a truly intelligent machine, one with cognitive abilities to match our own as well as a sense of consciousness, a difficult concept to instill in another entity when we are unable to comprehend, on a scientific basis, what consciousness is. Selmer Bringsjord of Rensselaer Polytechnic Institute in New York ran an interesting experiment last July that challenged three robots to figure out (on their own) which one had not been given a “dumbing pill”; eventually, the robot that could still speak was able to identify itself as the unaffected robot, a feat that showed a level of self-awareness previously not attained by AI. Many scientists are trying to replicate this experiment, an incubation-like step toward increasing consciousness in machines.
What researchers do know is that people operate on a layered-function system. Humans first experience sensations; a sound, for example, triggers the inner ear and generates an electrical pulse. These sensations are then transformed into perceptions. Mosterman stated,
“Our minds are liberal in creating a relationship between that sensation and perception and will complete it if obstructed…and out of this comes cognition, and then knowledge; the correlation part is what’s used in sensation-perception layers, while knowledge is higher up.”
He poses an interesting thought experiment illustrating why layers are so important. If you look at a tree and an apple falls out because the wind is blowing, you form an association between the wind and the apple falling out of the tree. But if it’s a hot day with no breeze, you might try shaking the tree to create wind; this plays on the correlation between the tree moving and the force of the wind, even though there is no logical cause and effect in that direction. Being able to take associations and apply them in different contexts is difficult to capture, which is why layers (built from physical principles, theories, and rules of inference) are needed in a consolidated entity.
“What you see (today) is a lot of progress in moving one of the layers forward,” says Pieter, but what’s needed is a concerted effort to build a braided, multidisciplinary system of AI, an idea echoed by previous guest Dr. Pei Wang. AI is a broad field with many moving parts. “Those excited about neural nets and big data generally are not excited about reasoning systems, generally at the level of expertise required to move the field forward,” says Mosterman.
Large organizations like Google may be at an advantage in furthering this ultimate goal, having more of the resources and capital necessary to bring together researchers from different areas so that they can start to combine methods. In reality, it may take many concerted efforts between companies and universities, opening their doors to one another and combining forces, to achieve the creation of an entity with human-level intelligence.
Image credit: McGill University