How AI and Data Science Could Better Inform Public Policy Decisions

Episode Summary: One of the promises of artificial intelligence is helping humans make smarter decisions. Whether in pharma, retail, or eCommerce, the idea is to pool streams of data and coax out the insights that help an organization make the best call to reach its goals. As it turns out, the same dynamic is beginning to play out in the public sector, where AI is now being used to inform policy.

This week we interview Professor Joan Peckham of the University of Rhode Island. She holds a PhD in computer science and runs the Data Science Initiatives at URI. Previously, she was a Program Director at the National Science Foundation.

The University of Rhode Island is home to DataSpark, an organization that helps policymakers ground their decisions about the economy, the environment, the opioid crisis, and a variety of social issues in deeper assessments of the data.

The ability to find objective insights might help policymakers make better decisions about where they allocate budget and what decisions are made. Right now, policymakers are beginning to tune into artificial intelligence as a source of informing their decisions. The same dynamic will likely play out in the C-suite, particularly when the data is actually there.

Subscribe to our AI in Industry Podcast with your favorite podcast service:

Guest: Joan Peckham, Professor – University of Rhode Island

Expertise: Data science

Brief Recognition: Peckham earned her PhD in Computer Science from the University of Connecticut. She was the Program Director at the National Science Foundation between 2008 and 2011. Peckham is also currently the Coordinator of Data Science Initiatives and Data Science Programs at the University of Rhode Island.

Interview Highlights

(03:30) How would you describe where data science fits into policy?

Joan Peckham: It’s very important to policymakers today to try to make decisions that are based on data and not just a sense of what people believe should be done. I think that many are feeling frustrated that they don’t have access to this information. So data science can help them to compile data and link data in ways that we weren’t able to do before and make reasonable decisions about these policies. It doesn’t mean that data science tells them what the policy should be. It means that they are being provided with sources of information that will help them go to the experts in education, in healthcare and so on, and given the data, make the most reasonable policy changes or recommendations.

I think being human is the key. Psychologists tell us that we frequently make decisions with our guts. We decide before we look at the evidence sometimes. If left unfettered to make these decisions, we might go out and glean the environment to find the data that supports our suppositions.

So what we really need to do is to put into place procedures and strategies for making use of data. I’m on the data analysis side as well as the [side for] interfac[ing] with a human being, where the human beings are drawing conclusions from the data. It’s one of the key factors in good data science education, for example, to really consider the ethical implications. One of my favorite books is Cathy O’Neil’s book, Weapons of Math Destruction.

She talks about many of these things. I don’t think the conclusion is we shouldn’t be using big data or artificial intelligence, but we have to understand our limitations. Artificial intelligence was developed in order to have the machine make decisions as well as human beings do, but we’re not there yet.

I don’t know if we ever will get there, where the machine will replace the human being, but we have to learn how to synergistically work with the algorithms and the machine. The human in the loop is going to be extremely important, but we, left to our own devices, have to be careful too. We need to consider the ethical implications of what we are doing and understand our own psychology.

(08:00) How do we shape the incentives from raw biases or business or political interest as much as possible? Where does that barrier happen? What do you think could be improved there?

JP: I think it’s education. When I was at NSF and working on computational thinking, we finally decided that everyone needs to know a little bit of computer science. Well, I think everyone is going to need to know also a little bit of data science and artificial intelligence. We all need to know enough so that we know when to bring an expert to the table and how to bring people from different disciplines that might have knowledge that could help us to make decisions based on data.

I can give you an example from computer science where we just had one of our industrial partners who’s doing quite a bit of software engineering and data science in their company, and one of the things that they do is that when you are designing an experiment or you’re designing software, there are certain forms that you fill out to capture what it is that you’re going to be doing, whether it’s data analysis or developing software.

There’s one question they always have on these forms, which is, “What could be the possible ethical implications?” Let’s have a discussion before or as we’re developing the software, instead of what we frequently do today, which is to make use of the software or the analysis techniques, begin to see the harm we’re doing, and then back off. We have to develop, create laws and policies and so on. We need to educate all of our technologists to begin to think about that as we’re developing.

The other aspect is the knowledge about the tools being used. If someone is drawing conclusions based on what a machine learning algorithm or a statistical analysis technique is telling them, they should know enough about those techniques to understand the appropriate use.

For what kinds of data sets will this give you a viable, reliable answer? And what sorts of data sets should we not be using these techniques on? What are the weaknesses and strengths of each of these algorithms? People program algorithms, which means the algorithms have been shown to have certain biases, which are a reflection of the people who [program them] and the data about people that we’re using. I think that we have to be careful about what we’re doing. You have probably read about the algorithms that are being used to sort through applications for jobs in the technical field.

[Employers were] using artificial intelligence algorithms to do that, but what they found is that some of these algorithms were biased against underrepresented populations and minorities. And that’s because the data has those prejudices as well, because humans have been behaving in that way for a long time. So we’ve imbued the data as well as the algorithms with these prejudices, and we have to be very careful about that. People need to be aware that the algorithm is not an oracle.
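One simple way to check a screening tool for the kind of bias described above is to compare how often applicants from each group pass the screen. The sketch below is only an illustration under stated assumptions: the group labels and outcomes are made up, and the function names `selection_rates` and `impact_ratio` are hypothetical rather than anything mentioned in the interview. A low ratio does not prove discrimination on its own, but it flags a result that a human should question.

```python
from collections import defaultdict

def selection_rates(records):
    """Return the fraction of applicants selected within each group."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in records:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}

def impact_ratio(rates):
    """Ratio of the lowest group selection rate to the highest one."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so there is no gap to measure
    return min(rates.values()) / highest

# Made-up screening outcomes: (group label, passed the screen?)
records = (
    [("A", True)] * 40 + [("A", False)] * 60
    + [("B", True)] * 20 + [("B", False)] * 80
)

rates = selection_rates(records)
ratio = impact_ratio(rates)
print(rates, ratio)
```

In this made-up sample, group B passes the screen half as often as group A. That gap is a prompt for further investigation of the data and the algorithm, not a verdict.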

This is the foundation of the scientific method, that you develop a hypothesis looking at the trends that you see and then you test it out using a well-designed study. And then you publish the results and the rest of the community is supposed to be questioning those results.

(15:00) What are a few snippets that’ll kind of give us a taste of where data science is beginning to wiggle its way into policy?

JP: Well, in our state the agencies are very interested in education and workforce, for example. How what you do when students are in third or fourth grade may impact their outcomes in post-secondary school. And then what is the trajectory for these students in the workforce? And that also brings in data from the Health Department. In other words, if we notice that students are exposed to certain things in their environment or if they have certain health records, how does this affect what happens as they go through school and then in the workforce as well, becoming working citizens of the state?

One that I noticed, and I think this is a really good example from Massachusetts, is a group that looked at opiate addiction. This was not just looking at the data with regard to incarceration and law enforcement. It also had to do with health and other factors. I just read in the newspaper [that they reduced] the overdose rate after incarceration by 60% by providing treatment for people when they’re released from prison.

Now, that’s a very simple analysis, but there are other things that we might be able to do. The cost of healthcare. I have heard from some of the healthcare providers in the state that the cost doesn’t necessarily correlate with the quality of healthcare. So what is it that the hospitals are doing that is providing good quality healthcare with good outcomes that’s different from the other hospitals? It’s not necessarily just spending a lot of money, hiring more people or having better facilities and so on. There may be some procedures that we need to look at. So how do we tease that out and see what’s going on? That’s something on the top of everyone’s mind right now, right?

(18:00) After someone makes a decision and we need to ground it in truth, what is the sequence of events that happen after that to get a group like yours to actually start working on a project?

JP: Okay, well in Rhode Island and about 22 other states in the Union, the federal government has funded these linked datasets that come from the various agencies in the state. And so we have, in Rhode Island, DataSpark, which now resides at the University of Rhode Island. They were funded by these federal awards in order to procure data sets from the agencies with agreements. Security is a big issue as well when you’re linking data sets and have information about individuals. But this group is funded by the various agencies with the questions that they have.

There’s another thing that we’re trying to move forward on, and that is, again while securing the data so that individuals aren’t identified or people have permissions, to make some of these data sets, or data sets similar to them, accessible to scholars and students. As the agencies have told us, scholars and students sometimes have questions that we never even thought of, and we would benefit from having some of these observations that could help us to drive policy.

So we’re working on trying to develop synthetic data sets, which are data sets that look very much like the actual data sets but will not reveal the identities of individuals, and that allow this sort of interaction. It’s kind of a citizen scientist thing, except it’s scientists and students, where you have so much data. The astronomers have done this for a long time.

How can you get the data out there for people to look at, to make discoveries that we might not have the workforce to make? Or maybe because we’re looking at it a little differently, we can make discoveries that we wouldn’t have made otherwise. So it’s tricky with healthcare data and with agency data, because you do have to protect the privacy of the individuals for whom the data has been collected.
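To make the idea of a synthetic data set concrete, here is a minimal sketch of one very simple approach: building new rows by sampling each column independently from the values seen in the real data. This is an assumption for illustration only, not the method DataSpark uses; the records, the field names, and the `synthesize` function are all made up. Independent sampling keeps each column's overall distribution roughly intact but discards relationships between columns, and it offers no formal privacy guarantee, since rare values still appear and a synthetic row can coincide with a real one.

```python
import random

def synthesize(rows, n, seed=0):
    """Build n synthetic rows by sampling every column independently
    from the values observed in that column of the real rows."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    columns = list(rows[0].keys())
    pools = {col: [row[col] for row in rows] for col in columns}
    return [{col: rng.choice(pools[col]) for col in columns} for _ in range(n)]

# Made-up "real" records with hypothetical field names.
real = [
    {"grade": 3, "reading_level": "below", "county": "Kent"},
    {"grade": 4, "reading_level": "at", "county": "Providence"},
    {"grade": 3, "reading_level": "above", "county": "Newport"},
    {"grade": 4, "reading_level": "at", "county": "Kent"},
]

synthetic = synthesize(real, 5)
for row in synthetic:
    print(row)
```

A production approach would need to preserve the relationships that analysts care about and would need a real disclosure-risk review; this sketch only shows the basic shape of "data that looks like the original without being the original."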

(20:30) What are the likely end products here from these projects?

JP: So usually in our case, the data is collected. It has to be cleaned. It has to be sampled in reasonable ways so that the analysis tools that you use are going to give you the kind of results that you need. In this particular group that we have in the state, it’s linked data sets across the agencies, if they want to ask questions in that way.

They have the privileges and ability to look at these data sets and then aggregate the data to answer the questions that are being asked. So they present the results in reports and in visualizations and so on, based on the questions that have been asked by the agencies. That is what the agencies are then able to use to have their discussions: Is there something of concern here? Is this something that would drive policy? Are we going in the right direction with this particular policy? With the incarcerated opiate addicts, for example, is this decision to provide treatment post-incarceration really working? Can we continue in this direction? Is there something else that we need to be doing? Are the numbers down in the ways that we want them to be?
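The workflow described here (link records across agencies, then aggregate before reporting) can be sketched in a few lines. Everything in this example is an assumption for illustration: the records, the `pid` pseudonymous identifier, the field names, and the `MIN_CELL` threshold are hypothetical. Hiding small counts is one common way to reduce the chance that an aggregate table identifies an individual.

```python
from collections import Counter

MIN_CELL = 3  # hypothetical threshold: counts below this are suppressed

def link(education, health):
    """Inner-join two lists of records on a shared pseudonymous id."""
    health_by_id = {record["pid"]: record for record in health}
    return [
        {**record, **health_by_id[record["pid"]]}
        for record in education
        if record["pid"] in health_by_id
    ]

def aggregate(linked, key_a, key_b, min_cell=MIN_CELL):
    """Count records per (key_a, key_b) cell, replacing small counts with None."""
    counts = Counter((record[key_a], record[key_b]) for record in linked)
    return {cell: (n if n >= min_cell else None) for cell, n in counts.items()}

# Made-up records from two hypothetical agencies.
education = [
    {"pid": 1, "graduated": True},
    {"pid": 2, "graduated": True},
    {"pid": 3, "graduated": True},
    {"pid": 4, "graduated": False},
    {"pid": 5, "graduated": False},
    {"pid": 6, "graduated": True},  # no matching health record, dropped by the join
]
health = [
    {"pid": 1, "exposed": False},
    {"pid": 2, "exposed": False},
    {"pid": 3, "exposed": False},
    {"pid": 4, "exposed": True},
    {"pid": 5, "exposed": False},
]

linked = link(education, health)
table = aggregate(linked, "exposed", "graduated")
print(table)
```

Only the aggregate table, with small cells hidden, would go into a report or visualization; the linked individual-level records stay with the group that holds the access privileges.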

So far, the questions have usually originated from the agencies. And I am so excited that I have heard, at least through the grapevine, that the legislature in our state has been telling us that they would like to make evidence-based decisions. And they want to have information.

Especially with what’s happening in our state: bringing in companies that are hiring our technology workers, working very hard at my institution on training more people in these areas, and looking at where the trends are, where we need to train, and what we need to do to provide the companies with well-educated students in this area of artificial intelligence, data science, cybersecurity, computer science.

Any of these areas. Data is used everywhere. Higher education uses it. Are we doing what we should be doing in the classroom? As I mentioned, healthcare and education. Transportation is another area, and of course, the environment.
