Impact Insights

Machine Learning for Social Impact

By: Greg Lipstein 21 Dec 2017
Pump image courtesy of flickr user christophercjensen

Editor’s Note: The post below is part of our Alumni for Impact series, which features alumni who are making a difference in the social sector, specifically in K-12 education, impact investing, nonprofit supportive services and social entrepreneurship. In this post, Greg Lipstein (MBA 2015), co-founder of DrivenData, explains how machine learning can advance social missions. DrivenData is a social enterprise that brings the power of data science to organizations tackling the world’s biggest challenges.

“Finding ways to make big data useful to humanitarian decision makers is one of the great challenges and opportunities of the network age.” - UN Office for the Coordination of Humanitarian Affairs (OCHA)

“The best minds of my generation are thinking about how to make people click ads… That sucks.” - Jeff Hammerbacher, Former Data Manager, Facebook

When social sector organizations think about data, the conversation often begins and ends with measuring impact. Measuring impact is important and it does require data, but there are many more ways to use data to drive impact.

For example:

  • Where can our organization’s resources have the most impact?
  • What are our best opportunities for developing better services or programs?
  • Which of our team’s work processes could be more automated?

Across sectors there is a lot of talk about the promise of machine learning, big data, predictive analytics, and artificial intelligence. What is it about? Why is it happening now? And how might I think about making it useful for the purposes I care about?

This is a quick introduction intended for the curious manager in an impact organization, who is increasingly hearing these terms and is looking for a basic explanation of what’s going on and how to think about making use of it.

The idea: A shift in how computers deliver value for humans

From the dawn of computers, people programmed rules for computers to follow. With machine learning, computers program themselves by learning from data.

What does this mean? Imagine photos of cats and dogs. As humans we are very good at looking at a picture and telling which animal it is.

Machines have traditionally had a hard time with this. It’s difficult for humans to write a computer program that includes all the rules for what makes a cat different from a dog in a picture, even though we know it by sight. Not only do we need to give the program the rules (something about ears or noses, maybe), but we also have to tell it how to recognize a nose in an image, in any position. Writing all of this out would be incredibly tedious, and in practice it is nearly impossible.

But computers are now very good at looking at an image and telling if it’s a cat or dog. How?

If you give a computer enough examples of cat and dog images, each labeled with the animal it shows, the computer can learn the rules on its own. In other words, the computer finds statistical associations: which properties of an image make it cat-like, and which make it dog-like. This takes a lot of memory and a lot of processing, but we now have enough of both at a low enough cost to make this possible.
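
For readers who like to see the idea in code, here is a minimal sketch of learning from labeled examples. It is a toy, not how real image classifiers work: it assumes each photo has already been boiled down to two made-up numbers (ear pointiness and snout length), and it uses a simple nearest-average rule. The feature names, the values, and the function names are all hypothetical.

```python
# Toy, made-up measurements: (ear_pointiness, snout_length), each on a 0-1 scale.
# These features and values are hypothetical and exist only for illustration.
labeled_examples = [
    ((0.9, 0.2), "cat"),
    ((0.8, 0.3), "cat"),
    ((0.7, 0.1), "cat"),
    ((0.3, 0.8), "dog"),
    ((0.2, 0.9), "dog"),
    ((0.4, 0.7), "dog"),
]


def learn_centroids(examples):
    """Average the features for each label. These averages are the learned 'rules'."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in total)
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Label a new example with whichever learned average it sits closest to."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


centroids = learn_centroids(labeled_examples)
print(predict(centroids, (0.85, 0.25)))  # cat
print(predict(centroids, (0.25, 0.85)))  # dog
```

Notice that nobody wrote a rule saying "pointy ears mean cat." The program worked that out from the labeled examples, and more (or better) examples would change what it learns.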

This simple process has been used to do things that are changing the way we live.

Putting it into practice: From cats to cancer

Here’s the paradigm we just saw: use examples to work out a set of rules, then use those rules to make inferences when encountering something new. If that sounds familiar, that’s because humans do that all the time. We have all sorts of patterns in our heads for how the world works, often learned from our experiences. We don’t always work out the rules consciously (cats vs dogs), though sometimes we do (alligators vs crocodiles).

Compared with humans, computers have the distinct advantage of doing this at a much larger scale and much more quickly than we can, and the distinct disadvantage of only being able to use information that has been captured and provided in a machine-readable way. As more and more data is created and captured, the advantage outweighs the disadvantage in many instances that matter for organizations. Let’s consider a few that we encounter in our work.

Use 1: Free up human attention through automation
Case: Lung cancer detection
Cancer-fighting engineers use thousands of examples of CT scans previously labeled by clinical teams to programmatically flag concerning nodules from early screens, prioritize follow-up for those who need it, and streamline reporting for radiologists.

Use 2: Illuminate strategic insights for planning and product design
Case: Financial inclusion
Designers use millions of examples of mobile money transactions in Tanzania to learn how people behave and to inform new interventions that foster trust and promote access to critical financial tools that have not been available to large portions of the population.

Use 3: Target services for greater impact
Case: Restaurant safety
Public agencies use examples of restaurant health inspections and the Yelp reviews left in the weeks leading up to them, in order to predict the incidence and severity of new health risks from recent reviews (and detect 25% more health violations with the same number of inspections).
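
The targeting idea can be sketched in a few lines. This is an illustration under loud assumptions, not the method any agency actually used: the reviews, restaurant names, and word-counting score below are all invented, and a real model would be trained on far more data and would handle common words like "the" more carefully.

```python
from collections import Counter

# Hypothetical history: review text left before an inspection,
# paired with whether that inspection found a violation.
past_reviews = [
    ("food was cold and the floor was dirty", True),
    ("saw a roach near the kitchen, dirty tables", True),
    ("friendly staff and fresh food", False),
    ("great service, clean tables, fresh salad", False),
]


def tokenize(text):
    """Split a review into a set of distinct words, ignoring commas."""
    return set(text.replace(",", " ").split())


def learn_word_weights(examples):
    """Weight each word by how much more often it preceded a violation."""
    bad, good = Counter(), Counter()
    for text, had_violation in examples:
        (bad if had_violation else good).update(tokenize(text))
    return {word: bad[word] - good[word] for word in set(bad) | set(good)}


def risk_score(weights, text):
    """Add up the learned weights of the words in a new review."""
    return sum(weights.get(word, 0) for word in tokenize(text))


# Hypothetical recent reviews for restaurants that have not been inspected yet.
recent_reviews = {
    "Restaurant A": "fresh food and clean tables",
    "Restaurant B": "dirty floor and a roach",
    "Restaurant C": "friendly staff, food was cold",
}

weights = learn_word_weights(past_reviews)
ranked = sorted(
    recent_reviews,
    key=lambda name: risk_score(weights, recent_reviews[name]),
    reverse=True,
)
print(ranked)  # highest predicted risk first
```

With a fixed number of inspectors, visiting restaurants from the top of this list first is what lets the same number of inspections turn up more problems.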

These are just a few areas where machine learning has the potential to help humans better understand the inner workings of big challenges, and apply these learnings at scale to improve people’s lives.

For more clarity on this topic with lots of real-world examples, check out the full post at DrivenData.