We spoke with Jonathan Ross, CEO and founder of Groq, an AI hardware company, about software defined compute. This interview is part of a series we did in collaboration with Kisaco Research for the AI Hardware Summit happening in Mountain View, California on September 17 and 18.
Software-defined compute is a way of thinking about how compute can be optimized for machine learning functions. Ross talks about some of the pros and cons of GPUs and where software-defined compute might make its way into future machine learning applications.
Subscribe to our AI in Industry Podcast with your favorite podcast service:
Guest: Jonathan Ross, CEO and co-founder – Groq
Expertise: AI hardware
Brief Recognition: Prior to founding Groq, Ross was a hardware engineer and co-founder of the Tensor Processing Unit at Google, where he worked for four and a half years.
(02:30) Can you define software-defined compute?
Jonathan Ross: Sure. And thanks for asking. The word “software-defined” has been used a bit in hardware and in particular networking. Recently it’s been used by several different companies for describing what they’re doing with accelerators. And the reason they talk about this is there’s a conception that when you’re building custom ASICs for machine learning, that they may not be configurable or they may not be programmable.
What it really means is for machine learning in particular, since it takes quite a long time to build a chip, two to three years, and the machine learning models are changing so rapidly that oftentimes you’re unable to build a chip as quickly as the researchers are coming up with new techniques and new machine learning models.
So to be able to build something that people want to use, you have to make it very flexible. So software-defined really just means that you’re making a device that will be adaptable to what’s coming in the future.
Stepping up a level, one of the things that often happens when new technologies become available is people start to take advantage of what’s available. An example is people used to use very sparse machine learning models. They would deploy them across a large number of servers. That was true of Amazon, and true of a lot of other companies.
What happened was, as they started getting devices that were capable of working with denser compute, which I’m not even going to try and define, people started using that denser compute. So when it comes to software-defined, there’s this element of flexibility where you can do what you want: you can take these new techniques and you can apply them. But it also means that researchers can experiment and see what they can do with these devices and come up with things themselves.
For example, we’ve seen that GPUs have been used a lot for machine learning. The reason is they have a lot of compute density. But their memory bandwidth is very slow, and this has been a bit of a problem. People thought that this would prevent very expensive machine learning models from continuing to get performance gains as you tweak them.
But what happened was that researchers started to take advantage of that extra compute power. What they would do is they would do a lot more compute per memory access. So in terms of the flexibility, it’s not just that you’re able to support things that the ML researchers have been doing in the past. It’s also that they can explore and make better use of the hardware that you give them.
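The "more compute per memory access" idea is often expressed as arithmetic intensity, the number of floating-point operations per byte moved. The sketch below is an illustration only, not something from the interview: the function name, the 4-byte element size, and the assumption that each matrix element crosses the memory bus exactly once are all simplifications chosen for the example.

```python
def matmul_arithmetic_intensity(m, k, n, bytes_per_element=4):
    """FLOPs per byte for an (m x k) @ (k x n) matrix multiply.

    Simplifying assumption: every input and output element is moved
    to or from memory exactly once.
    """
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


# Matrix-vector product: each weight is fetched for a single multiply-add.
low = matmul_arithmetic_intensity(1024, 1024, 1)
# Matrix-matrix product: each fetched weight is reused across 1024 columns.
high = matmul_arithmetic_intensity(1024, 1024, 1024)
print(f"matrix-vector: {low:.2f} FLOPs/byte, matrix-matrix: {high:.2f} FLOPs/byte")
```

Under these assumptions, the matrix-matrix case does a few hundred times more arithmetic per byte than the matrix-vector case, which is the kind of restructuring that lets a model keep gaining performance on hardware with limited memory bandwidth.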
(06:00) Are there any discrete cases in the business world where this software-defined concept is better than GPUs?
JR: Sure. I’ve got a great example. One of the unique features of the hardware that we’re developing is that it takes advantage of something called batch size one. What that basically means is… Do you remember playing the game 20 questions growing up?
Yeah. So the way the game works is you have 20 questions. Someone has an item in mind, a person in mind, and you ask questions where you get yes or no answers until you figure out what that item is.
Now, I was talking about the density of the compute. One of the things that’s limiting for the hardware today is that to get good use of it, oftentimes you have to run the same program at the same time on many different inputs. Imagine you’re driving down the street and there are three stop signs. But to get really good performance, you really have to run on 64 stop signs in order to identify them and get that good performance.
So if you have only three, it costs you the same as if you had 64. Well, now imagine you’re playing the game 20 questions and you have 64 inputs that you’re trying to guess. Those questions have to be very complicated because you’re not guessing what this one item is, you’re guessing what these 64 items are.
One of the things that’s unique about the hardware that we’re building and the software-defined hardware aspect of it is that it’s not built for any particular model. You can change the kinds of models that run on it and you can take advantage of the smaller batch size by breaking your models apart.
And instead of playing the game of 20 questions on 64 different items at the same time, you can do it on one item at a time, which makes it much less expensive. So now you ask, is it animal, vegetable, or mineral? And the answer isn’t always yes, the way it is with 64 items, where you always have an animal, a vegetable, and a mineral somewhere in the batch.
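The stop-sign example above can be put in numbers. The following is a minimal sketch, assuming hardware that always processes full batches of a fixed size and pads any unused slots; the function name and the batch sizes are illustrative and do not describe any specific device.

```python
import math


def batch_utilization(num_inputs, hardware_batch_size):
    """Fraction of batch slots doing useful work when inputs are padded
    up to a whole number of fixed-size batches."""
    if num_inputs <= 0 or hardware_batch_size <= 0:
        raise ValueError("both arguments must be positive")
    batches = math.ceil(num_inputs / hardware_batch_size)
    return num_inputs / (batches * hardware_batch_size)


# Three stop signs on hardware that needs a batch of 64 to run efficiently.
print(batch_utilization(3, 64))
# The same three stop signs processed one at a time (batch size one).
print(batch_utilization(3, 1))
```

With a batch of 64, fewer than 5% of the slots carry real inputs, so three signs cost the same as 64. At batch size one, every slot is used.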
Let’s say that you’re trying to make a determination for a potential insurance client. You’re going to have a bit of information about the client, but if you run a model that has to take into account every possible bit of information about that client, then it’s going to be a very large model. But if you can look at a little bit of information you have, like what information you actually do have about the client and then pick a model that’s sort of right sized for that problem, it gets less expensive and it also gets more accurate.
Another example would be if you’re trying to build an autonomous car and you’re driving down the road, you might identify that something is a tree, or you might identify that something is a sign of some sort. You may not know that it’s a stop sign.
You are then able to run a very particular model on that object, right? So you’ve got maybe 200 objects in the scene, but you have three signs. What it means is you can run a sign classifier on those three signs and it’s very specifically trained just to identify what those signs are.
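One way to picture this two-stage approach is a dispatcher that sends each detected object to a small specialist model based on its coarse label. This is a hypothetical sketch: `classify_sign`, `SPECIALISTS`, and the dictionary-based object format are invented for illustration, and the placeholder rule inside `classify_sign` merely stands in for a trained sign classifier.

```python
def classify_sign(obj):
    # Placeholder rule standing in for a small, sign-specific model.
    return "stop sign" if obj.get("shape") == "octagon" else "other sign"


# Hypothetical registry mapping a coarse label to its specialist model.
SPECIALISTS = {"sign": classify_sign}


def run_specialists(objects):
    """Run a specialist only on objects whose coarse label has one."""
    results = []
    for index, obj in enumerate(objects):
        specialist = SPECIALISTS.get(obj["coarse_label"])
        if specialist is not None:
            results.append((index, specialist(obj)))
    return results


# A scene with 200 detected objects, only three of which are signs.
scene = [{"coarse_label": "tree"} for _ in range(197)] + [
    {"coarse_label": "sign", "shape": "octagon"},
    {"coarse_label": "sign", "shape": "triangle"},
    {"coarse_label": "sign", "shape": "octagon"},
]
print(run_specialists(scene))
```

Only the three signs reach the sign classifier; the other 197 objects are skipped. This pays off only if the hardware can run those three inputs efficiently without padding them into a larger batch.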
You can imagine also in strategy when you’re trying to make predictions. For example, the way that the AlphaGo model works. As the game evolves, you can actually use different models, or if you believe that there are several different ways that the game could evolve, you could use several different models, some with a more aggressive playstyle, some with a less aggressive playstyle. What it really does is it allows you to just try a whole bunch of very different things on the same hardware without having to have custom hardware for each of those different things.
(11:30) Where do you see this seeping into industry? Are there some spaces where this concept is getting held onto more tightly, getting adopted a little bit more quickly?
JR: Yeah. We’ve had CPUs and GPUs for a long time, and these were programmable devices. We’ve also had other types of devices like FPGAs, which are programmable but not really while the thing’s running. That’s much more difficult. Where it’s getting interesting is machine learning. In particular, the reason is because machine learning is this open target where people don’t know exactly where it’s going to land. It’s changing so quickly.
And then in particular where we’re seeing interest is in places where models are going to change quickly. So if you build a small consumer device, some sort of audio trigger, like a device that opens your door when you say something, that doesn’t have to keep getting smarter.
But when you deploy something in cloud, or at a hyperscaler, or in an autonomous vehicle because no one has a final formula for what those machine learning models are going to look like, they need to be able to update them and make changes.
Additionally, people are still finding new uses for machine learning because it’s such a new field. Basically we’re seeing use cases all over the place in part because even in those places where eventually this stuff will harden and it’ll be fixed, right now no one knows what that’s going to look like.
If you want to look at where machine learning is useful, just look at any place where probabilistic decisions are made or difficult decisions. This is very different from how software has worked historically.
Just ask yourself, where are you making difficult decisions? Then the other thing that machine learning does is it gives you this ability to have these two paradoxical things at the same time. It allows you to have both repeatable performance and it allows you to have creativity. If you have a machine learning model and you apply it to a problem, it will very often solve that problem, but it might solve it in ways that no one thought that it would solve.
(14:00) Do you see software-defined compute as an extension of that same creative capability?
JR: That’s exactly right. Machine learning researchers today are coming up with all sorts of ideas. Oftentimes you’ll see that the ideas are constrained by the hardware that they have. GPUs weren’t designed for machine learning. They’re okay at it because they’re very parallel and they can do a lot of operations, but it’s not how you would design such a device today.
So if you look at a lot of the machine learning models that have become popular, they’ve been optimized for the designs of today’s GPUs. So when you have a flexible device, you can handle those, but you can also handle the other stuff that the researchers are looking into and experimenting with, and it gives them the freedom to come up with new model architectures.
This is particularly important in the early days of machine learning because these things do become very fixed in industry early on. So by flexibility, it allows them to experiment before everything gets hardened and fixed.
Then the other side of this that’s interesting is, historically, there are two types of industries or products. Those that, if you get them to a certain capability, they’re good enough. And then those where there’s just this insatiable thirst for more. Compute is definitely an insatiable thirst. No one has ever in the history of compute said, “I have enough compute. I’m good.”
One of the interesting things is that we’re so early on and people are just now unlocking the ability to do tasks with machine learning that weren’t possible before that we’ve noticed every time there’s an improvement, every time the compute gets cheaper, every time the compute gets faster, every time the compute gets more energy efficient, people actually spend more, not less.
I believe about seven percent of the world’s power actually goes to powering data centers. That’s a really meteoric rise in terms of how much compute has grown in the world and how important it’s become.
So it’s a sort of inflection point where we’re at the beginning of machine learning and every time it improves, people want more of it.
(18:00) Do you think companies like yours will eventually begin specializing in specific sectors and spaces?
JR: One of the reasons that we focused on making such a flexible design was that it also makes it easier to program. In the long term, our hope is that people won’t actually understand that it’s software-defined compute underneath. The idea is that if you can adapt the hardware, you can actually make it easier to program it. In fact, we started with the compiler.
I think what’s going to happen in industry in terms of adoption is the first places that are going to use it are the places where they need the absolute greatest performance and the existing architectures, the GPUs, just can’t give it to them. And they’re going to be able to eke out every last little bit of gain by tweaking the hardware to fit what they’re trying to do. Over time it’s going to start to percolate through a whole bunch of other industries.
But it’s also interesting because people are actually using this thing called AutoML, which is using machine learning models to design machine learning models. A recent result, EfficientNet, which was a model that was designed by models, outperforms all human-designed models.
What that means is if you have a very flexible piece of hardware, the way that people are training these machine learning models or the way that they’re generating them, the structure of them and the shapes, can actually take better advantage of that hardware.
AutoML is very quickly becoming the norm in machine learning. It’s just that it has a pretty significant advantage when it has the ability to reconfigure the hardware. Then the other interesting part about this is we can actually reconfigure the hardware very rapidly while it’s running. So that allows training of models and it allows the running of models where the hardware actually changes from one portion of the execution to another portion of it. At the beginning it might look one way and at the end it might look another way.
This article was sponsored by Kisaco Research and was written, edited and published in alignment with our transparent Emerj sponsored content guidelines. Learn more about reaching our AI-focused executive audience on our Emerj advertising page.
Header Image Credit: Dqindia