Facebook’s Jason Sundram on How to Build Effective Data Science Teams

How to Build Data Science Teams for AI Projects in the Enterprise

This week we interview a leader at Facebook. Jason Sundram is the lead of World.ai at Facebook, which is one of their efforts to work with public data around roads and population and other projects of that kind. But Sundram is also highly involved in the Boston office here, where Facebook will soon have around 650 employees. Many of them focus on data science and artificial intelligence.

Last time we talked about personalization in AI with Hussein Mehanna, who was Director of Engineering at Facebook at the time. This time, we’ll talk about two topics that all established sectors need to be focusing on:

How does one build ML and data science teams?
How does one pick an AI project?

For business leaders who are considering hiring data science talent or thinking about how to start with AI in terms of making a difference in their bottom line, this should be a useful episode.

(Note: Readers with an interest in the Massachusetts AI scene should read our full article on the Artificial Intelligence Ecosystem in Boston.)

Subscribe to our AI in Industry Podcast with your favorite podcast service:

Guest: Jason Sundram, Lead – World.ai, Facebook

Expertise: Data science, machine learning

Brief Recognition: Prior to Facebook, Sundram was a staff data scientist at PayPal and a data science advisor at UC Berkeley.

Interview Highlights

Our CEO Dan Faggella (left) at Facebook’s new Boston office with Jason Sundram, Lead of World.ai at Facebook

(04:30) When you think about building an AI team for an express business purpose, what goes through the minds of the folks who do that well?

JS: I think there’s a lot of ingredients. I mean, I think as you point out, there’s a lot of focus given to the people who have the PhDs, and for sure they’re a crucial part of the equation. But really having somebody who represents the stakeholders, so that you can really figure out how you are going to deliver something that’s really going to be valuable, is important. And there’s a bunch of other skills that become super important, especially at the scale that we’re at.

I think it was said probably five or ten years ago about data science that 80% of the time is cleaning the data, 20% of the time is the cool science stuff. I think that equation doesn’t really change much with AI. You still need data to train on, and making sure that data is clean, or clean-ish, is an issue. Figuring out what data you should be using is also an issue. Getting labels is also an issue.

And at the scale that you need to do that, there’s a bunch of infrastructure you have to build. So when you’re thinking about making a team, you need to make sure that there’s some really strong engineers who understand how to move data around at the scale that you need to move it, and people who can be really comfortable at all levels of the stack, so that whether you’re having a low-level issue with the networking card in your machine or you’re having issues figuring out, “Well, is my model overfitting?” you can kind of visualize things and kind of work at an incredibly low level. Those sets of expertise don’t always live inside the same people. So team composition is definitely something you think about.

(06:30) What are the kind of thoughts that go through to make the decisions as to who fits where?

JS: I always look at the product manager as the person who speaks for the stakeholders. So making sure that you have a strong product manager who has spent time with the folks who are going to be the ultimate users of these tools to figure out, you know, what is it we should actually be delivering, what’s the business problem we’re actually trying to solve, and making a really crisp articulation of what that is. You want the engineers to spend a lot of time with that person. And to the extent possible, also with the customer so they can really internalize those needs.

So they understand that what they’re doing is really solving a problem that’s a business problem, not necessarily just a modeling problem. Once you have that product manager in place, you really need to make sure that you have, as I said, this connection to infrastructure where necessary. And also resources from the business to sort of spend the time getting the labeled data that you need to make really great models.

I think you definitely need that strong product and customer voice in the room. Whether that’s an official product manager role, or just maybe one of your engineers who kind of ha[s] a product manager hat they can put on, you definitely need to make sure that voice is in the room.

When you think about delivering really great engineering products, you think about your iteration time. So you want to be able to move as fast as possible by refining what you’re doing. And the reason I think about a product manager is that you then have the customer in the room, so you’re able to make decisions much more quickly and move faster.

(09:00) Do we find a regular way to loop people in online? What does that look like?

JS: I think depending on what you’re building and the scale of what you’re doing, that could look very different. So it could be, you know, doing a bunch of research to understand things where you talk to individuals. It could be looking at a bunch of data for how people are using an existing product and how you want to sort of streamline some of those use cases. You know, if you’re working in a new area, tapping into an existing community, there’s often strong voices that you can kind of be in touch with.

(11:00) What does that look like to assess the technical ecosystem that we exist in and then fit folks that can integrate it in some way?

JS: I think a lot of it has to do with just the inner workings of your company’s infrastructure, so that, you know, when you are running your machine learning models, there’s a bunch of maybe cluster management software you might have to be familiar with. With the libraries that you’re using, there may be some real minutiae in there that you need to know about, that kind of expertise. But, you know, fundamentally, when the models that you build hit production, you have to be familiar with your production infrastructure or you have to have somebody who understands enough about your models to sort of put them in production for you.

I think once you’ve proven there’s some business value from the models that you’re building, you have a real lever to kind of get production folks interested in shipping those models. Right? So it’s sort of one of these “chicken or the egg” situations. Where it’s like, “Well, I’ve got this amazing pipeline where I could plug in the model, but I don’t have a model,” or “I have this amazing model. I have no way to connect to users.” I think once you’ve got this sort of model that performs super well there becomes a lot of incentive and excitement to sort of connect this to production.

There’s obviously a lot of excitement in just doing modeling work. Right? You know, solving interesting technical problems is, you know, one of the reasons people become good at that is because they’re excited by it. However, I think one of the reasons that people might leave a place would be that they are not able to connect the things they’re excited about to real business value. And so they’re basically seen as sort of peripheral to the business.

(14:30) How would a smart executive team go about [deciding where to implement AI] with limited resources and time?

JS: I think the way we try to solve these kinds of problems here, which seems to work fairly well, is sort of solving this problem maybe in two ways. So from an executive perspective, you know, communicating what are the real pain points to the teams who actually have the skills is a great way to get some suggestions about what might be effective. I think putting a lot of autonomy and trust in the teams who are building functionality, or have the expertise to shop around the company and say like “We think we can solve these problems,” I think can be both really empowering for those people and help you focus on what’s really shippable.

I do think that it’s important though to do a good job of communicating what would be impactful for the company. So really think about, you know, if we could do anything, here’s what we would want to achieve.

I think there are different levels of skill of doing this. So you can imagine if you have a good problem, figuring out how to embed a product manager in the team is maybe not so bad. But if you’re really just trying to, you know, go from zero to one, and get a new initiative off the ground, you know, you don’t necessarily know what the problem is. So you’re not necessarily sure who the product manager even is.

And you may have hired some research engineers who have the chops to do some amazing work, but you’re not exactly sure, you know, how to motivate them, where to point them. And so I think, you know, getting off the ground is certainly a challenge. I think my solution to this is always just talking about the pain and talking about the business value and sort of seeing if you can find some kind of resonance there.

I also think the role of managers on these teams is super important because those are the people who kind of understand both the business and the technology. And so they’re able to draw some of these lines and help their team kind of refine their understanding of what’s technically possible. So, if you have a strong manager of an AI team, they should be able to sort of be in tune with what the business needs and sort of what is technically possible.

(19:30) Could you give us a high-level overview on World.ai and we’ll talk about some of the neat stuff it’s allowing?

JS: Maps and Facebook are not necessarily the connection that most people draw when they think about Facebook. But maps do actually show up in a bunch of our products and services. So whether you’re sharing your live location with a friend over Messenger, or you’re checking into a place and see a map in our products, or you’re asking for recommendations and a map pops up and, you know, your friends are able to say, “Go to this restaurant,” or, you know, “Here’s a plumber I recommend,” all those things will pop up on a map.

Those maps on Facebook are powered by basically the Wikipedia of maps, which is OpenStreetMap. We started our projects in this office to say, “Okay, OpenStreetMap is great, but there are places where we wish its coverage was better. Are there ways we can contribute to this amazing open data source that will benefit both us and the rest of the world?”

We started by looking at incredibly high-resolution satellite imagery. So these are like 30-centimeter pixels, which are basically the size of your laptop seen from space. And what we were able to do was actually make road predictions. So look for entire road networks and then post-process those predictions into contributions to OpenStreetMap.

I think one of the things that’s been really amazing for me is the impact of this work not internally, but externally. So when there have been humanitarian efforts that needed to be mobilized because of a natural disaster, using our AI mapping technology, we’re able to map out road networks in places so that aid agencies can actually deliver relief to the people who really need it. Without road networks, you don’t really know how to get to people: where they live, where their houses are, what the routes are. Logistics are just a real nightmare. And these are problems that need to be solved really fast.


Header Image Credit: Medium
