Episode Summary: As it turns out, survival of the fittest applies as much to algorithms as it does to amoebas, at least when we’re talking about genetic algorithms. While we’ve explored other types of machine learning algorithms in business in past articles, genetic algorithms are newer territory. We recently interviewed Dr. Jay Perret, CTO of Aria Networks, a company that uses genetic algorithm-based technology to solve some of industry’s toughest problems, from optimizing business networks to pinpointing genetic patterns correlated with specific diseases.
Dr. Perret has worked in this domain for years, testing genetic algorithms that vary parameters to gradually arrive at a best solution when there’s no simple way to program one. In this episode, Dr. Perret discusses how genetic algorithms (GAs) work, and walks through two case studies that show how they can be (and have been) tested and applied in business and healthcare contexts.
Expertise: Physics, Computer Software, AI, Developing Start Ups, International Sales and Business Development, Data Analysis, Bio/Chem Informatics, Telecommunication Technologies
Brief Recognition: Dr. Perret co-founded Aria Networks in 2006 to apply the DANI AI engine to network planning and optimization. He joined Aria Networks full time in 2007 as Chief Science Officer and VP of R&D, and in June 2008 also joined the board as an Executive Director and CTO. Previously, Dr. Perret co-founded Applied InSilico in 2003 to apply AI and evolutionary programming techniques to improve the efficiency and automation of the preclinical drug discovery process, including Bio and Chemoinformatics. He was active for 10 years in academic research in the UK at Leeds University and the Bartol Research Institute at the Franklin Institute in the U.S., culminating in several field trips to the South Pole to conduct his research. He has spent the last 15 years in R&D, hardware and software development, sales and marketing in Europe, Asia, and the U.S. Dr. Perret has a BS in Applied Physics and a PhD in High Energy Astrophysics from Leeds University.
Current Affiliations: Executive Board Director, CTO, VP R&D at Aria Networks
Big Ideas:
1 – Genetic Algorithms (GAs) Can Optimize Solutions in Minutes.
Genes are the variables of a problem used to optimize a solution, and by feeding these defined variables or “genes” to a learning engine, a GA can arrive at optimized solutions for a diverse array of industry problems. For example, if your objective is to keep equipment costs low when designing a network, the GA can integrate the provided variables of design, time, etc., account for past and future variances, and simultaneously simulate thousands of ways a solution could be designed. What might take humans hundreds or thousands of hours can be done in minutes by a GA.
2 – Iteration Is Key—a GA Learns and Improves Over Time.
A GA learns in the same way that a human might—by integrating multiple examples into a schema over time. Each time a GA is given a new example or fed important variables (by humans), it can home in further on better solutions. This is reinforcement learning in action (though GAs can also be used with supervised or unsupervised learning approaches); the code sketch after this list illustrates the iterative loop.
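To make both big ideas concrete, below is a minimal sketch of a genetic algorithm in Python. The problem, cost function, and parameter names are invented for illustration (this is not Aria Networks’ DANI engine): the genes are a handful of numeric design variables, and fitness is simply a cost to be driven down over generations.

```python
import random

# Hypothetical problem: pick 5 numeric design variables ("genes") that
# minimize an equipment-cost function. The cost surface below is a toy
# stand-in, cheapest when every gene is near 3.0.
NUM_GENES = 5

def cost(genome):
    return sum((g - 3.0) ** 2 for g in genome)

def random_genome():
    return [random.uniform(0.0, 10.0) for _ in range(NUM_GENES)]

def mutate(genome, rate=0.3, scale=0.5):
    # The "parameter tweaker": nudge each gene with some probability.
    return [g + random.gauss(0.0, scale) if random.random() < rate else g
            for g in genome]

def crossover(a, b):
    # Combine two parents' genes at a random split point.
    point = random.randrange(1, NUM_GENES)
    return a[:point] + b[point:]

def evolve(generations=100, pop_size=50, elite=10):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fittest (lowest-cost) genomes...
        population.sort(key=cost)
        parents = population[:elite]
        # ...then breed mutated children from them to refill the population.
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - elite)]
        population = parents + children
    return min(population, key=cost)

best = evolve()
print("best genes:", [round(g, 2) for g in best], "| cost:", round(cost(best), 4))
```

Each generation keeps the fittest genomes and breeds mutated variants of them, which is why the best solution keeps improving with iteration rather than being programmed directly.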
Interview Highlights:
The following is a condensed version of the full audio interview, which is available in the above links on Emerj’s SoundCloud and iTunes stations.
(1:45) Give us an overview of what a genetic algorithm (GA) is…and where do they need to be used?
Jay Perret: They come about from trying to replicate what actually happens in nature; evolution takes things that are happening in the environment and tries to optimize them…we do exactly the same thing with genetic algorithms, so it’s essentially a parameter tweaker; given any parameters or controls that can be used to tweak a process, a GA essentially keeps trying different versions or different combinations of these parameters to get a good answer…
…imagine you have thousands of variables, how do you combine those together? You can’t do it randomly, you can’t try every combination, so you have to have an intelligent way of selecting the right set of parameters or values to actually solve the problem, and that’s where GAs are very good…it’s called an intelligent random search…
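To put rough numbers on why brute force fails here (the figures below are illustrative, not from the interview), even a modest problem dwarfs exhaustive search, while a GA samples only a tiny, fitness-guided slice of the space:

```python
# Even a modest problem with 50 on/off parameters has a huge search space.
combinations = 2 ** 50
print(f"{combinations:,} combinations")            # 1,125,899,906,842,624

# At a generous billion evaluations per second, brute force needs:
print(f"{combinations / 1e9 / 86400:.1f} days")    # about 13 days, for
                                                   # just 50 binary genes

# A GA with 100 candidates over 1,000 generations evaluates only:
print(f"{100 * 1000:,} candidates")                # 100,000, a vanishing fraction
```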
(4:37) What’s an example of a closed system where we can optimize and which ones are too open so that we can never get a perfect answer?
JP: So games are a good place to look for examples…tic-tac-toe is a very simple game, you can actually sit down and write all the combinations for every single start point to guarantee a probability, given where you start a particular move, as to what’s the next (best) move….you can actually work out every single combination of moves so you don’t need a GA or AI to solve that particular problem. When you look at chess, there are still a finite number (of moves), but it’s absolutely massive and you can’t look at every single possibility in your lifetime, but it’s quite well formed—so you can use a GA to help search that huge space…
…the next problem was recognizing patterns, that took a little bit more effort, and that’s where AI really came into its own, by showing that where the problem is just too big to be solved in one go, we can attack it in different ways by learning from experience, like GAs do, just like in nature…
…in nature, the things we’re tweaking are the genes; with genetic algorithms, you’re still tweaking genes, but what are your genes? What are the things that describe the problem you’re working on?…It might be the location of routers in a network, or whether you smoke…these are the things that you can play with (in a GA)…you make those genes and then feed them to the GA, and then it works without knowing anything about the problem; you just have to wait and see, given a particular version of these tweaked parameters, how fit your solution is…
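The point that the GA “works without knowing anything about the problem” is worth pausing on: the engine only ever sees genomes and fitness scores, and all domain knowledge lives in how you encode the genes and score a solution. A sketch of that separation follows; both encodings are invented examples, and either could be handed to the evolve loop sketched earlier.

```python
# The engine only ever calls fitness(genome); everything domain-specific
# lives in how the genes are encoded.

# Encoding 1: genes are (x, y) router locations in a network.
def network_fitness(genome):
    # Toy score: negative total cable length between consecutive routers.
    return -sum(abs(x2 - x1) + abs(y2 - y1)
                for (x1, y1), (x2, y2) in zip(genome, genome[1:]))

# Encoding 2: genes are yes/no lifestyle and genetic factors.
def risk_fitness(genome):
    # Toy score: weighted sum with made-up relevance weights.
    weights = [0.8, 0.1, 0.5, 0.2]
    return sum(w * g for w, g in zip(weights, genome))

print(network_fitness([(0, 0), (3, 4), (6, 1)]))   # -13
print(risk_fitness([1, 0, 1, 1]))                  # 1.5
```

Swapping domains means swapping the encoding and the fitness function; the search machinery itself stays unchanged.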
(9:25) Let’s say we’re trying to predict the prices of coastal homes in Europe…how is a GA doing a better job of that initial scrambling, of where we can start on this problem parameter-wise, in order to better predict housing prices than a human ever could?
JP: It’s really down to speed and the way that you learn from experience…a salesman who is just starting in the industry won’t know the relevance of square footage, the location of the building, things like that, but an experienced sales guy would, because over time they have learned what things are good and what things are bad (in relation to real estate prices)…that might have taken him 20 years to do…we can do the same thing with a GA in seconds…
(11:50) I know you recently published a paper with Facebook, and I take it that might be a useful example for the audience to understand in terms of how you’re using this in a real-world, business context.
JP: Yes…it’s a very complex problem, you’ve got large data centers all around the U.S., which are transferring lots of data in different sizes at different times of day across a network, and there are two layers to the network – a transport and a service layer—how do you design and build that network? Where do you put the transport nodes? Where do you put the IP nodes? Where should you protect data, where should you let the data go unprotected?
There’s lots of parameters…in this instance, we made genes of a lot of those different properties, like what routers should we use, how they should be connected up…and then we provided that to our learning engine, and over a period of seconds to minutes it came up with a network design that wasn’t just good for the data today, it also worked out if the network was to fail…the great thing was that when a human did that optimization, we got a certain value; when the AI system did its optimization using GAs, we got 25% less costs in the IP cost of that network; that’s a reduction of the quarter of the cost…that’s the kind of thing you can affect (with a GA)…
(18:12) So the success metrics, I imagine, are already in there for the algorithm to bounce itself off of; is the term reinforcement learning proper here?
JP: You’re right, and someone who knows a little about AI would probably come up with supervised learning, unsupervised learning and reinforcement learning….reinforcement learning is what we did with FB…it’s really having penalties and rewards for getting closer or further away from the solution, so that’s exactly the right terminology.
(19:17) On the last case study, which we talked about off mic: you mentioned an example using the same technology you’ve leveraged at Aria, applied in another domain, yet still able to yield some exciting results. Talk about that diagnostics example.
JP: It was a company called Solara Diagnostics…the commercial company that has mapped the human genome…they gave us people’s whole DNA (we had about 250 patients and a large number of genes for each of those patients), but also things like how much they drank, whether they were male or female, things like that, and there’s a reason for doing this…the disease was called deep vein thrombosis (DVT), it’s what people can get from taking very long air flights…believe it or not, there’s actually a genetic reason for getting it, but your lifestyle can also make you more susceptible.
So in a blind test, Solara gave us 30,000 genes per patient and a whole lot of phenotype information…and we blindly used our system to discover what is crucial in whether someone did or didn’t get DVT, and we published a paper with them…Solara identified another set of genes that they could use as another clinical test, but you could also make lifestyle choices, if necessary, to reduce your chances…and there, the objective (this is a supervised learning example) was to map these many thousands of parameters to a single parameter: did you get the disease or not? And we could translate the probability…again, the AI doesn’t know what it’s working on, it’s just tweaking its own parameters and trying to get a result that matches…
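One standard way to cast this kind of supervised problem for a GA is feature selection: each genome is a yes/no mask over the genetic and phenotype features, and fitness is how well the selected features predict the disease label. The sketch below runs on synthetic data with invented dimensions and is in no way Solara’s actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in: 250 "patients", 200 candidate features; only
# features 0-4 actually influence the binary disease label.
X = rng.normal(size=(250, 200))
y = (X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=250) > 0).astype(int)

def fitness(mask):
    if not mask.any():
        return 0.0
    # Fitness = cross-validated accuracy using only the selected features.
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, mask], y, cv=3).mean()

population = [rng.random(200) < 0.05 for _ in range(20)]   # sparse random masks
for _ in range(30):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]
    children = []
    for _ in range(15):
        a, b = rng.choice(5, size=2)                # pick two parent indices
        child = np.where(rng.random(200) < 0.5,     # uniform crossover
                         parents[a], parents[b])
        child ^= rng.random(200) < 0.01             # mutation: flip a few bits
        children.append(child)
    population = parents + children

best = max(population, key=fitness)
print("selected features:", np.flatnonzero(best))
print("cv accuracy:", round(fitness(best), 3))
```

The GA never sees any biology; it simply keeps the masks whose selected features predict the label better, exactly the “tweaking its own parameters to get a result that matches” behavior described above.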