The challenge: Create a tool predicting where crime will happen

Picture a police precinct in a large U.S. city. Taped to the wall is a map covered with pushpins—red for burglaries, blue for auto theft, and so on—each indicating the spot of a previous crime. Detectives stare at the map, seeking out a pattern to determine problem areas and, maybe, where these offenses might strike next.

Now imagine the data behind those pushpins digitized and fed into an algorithm designed to forecast future crimes.

That idea, that machine learning can aid in the enforcement of the law, inspired a competition held by the National Institute of Justice this summer. Using five years of data from the city of Portland, Ore., groups were tasked with creating a tool to predict where 9-1-1 calls for burglary, car theft and street crime would occur for periods ranging from one week to three months out. A team led by Penn criminologist Charles Loeffler tied for first place in the Large Business Division.  

“We’ve been doing a version of these calculations for decades,” Loeffler says. “The introduction of computing has made it possible to do it with higher accuracy and higher spatial and temporal resolution. But it’s not fundamentally different.”

Loeffler’s “Team Kernel Glitches” included Pau Pereira and Michael Chirico, both May graduates of Penn’s economics doctoral program, as well as Seth Flaxman, then a postdoc at Oxford University and now teaching at Imperial College London. The researchers opted for a flexible algorithm that relied only on calls for service to the Portland Police Bureau, rather than incorporating additional data sources such as social media or weather.

“We used a set of statistical tools that are very good for forecasting, the same methods and theory behind what movies Netflix recommends to you or how cars are learning to drive,” says Pereira, now a research economist at Amazon. “It was a purely statistical model. We did not have any theory about what causes crime; we just looked at the data and extracted the strongest signal.”

The method learns from historical events to make predictions about forthcoming events, he says. In other words, “whatever happened in the past is the best predictor of what will happen in the near future. If you’re trying to predict an event in space, what happened nearby is the best predictor for what will happen there. It would probably do well predicting bee stings.”

“Or where and when there’s going to be a rash of tick bites, assuming you had good historical data on these types of events,” Chirico says.
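The idea that nearby, recent events are the best predictors can be illustrated with a small sketch. This is not the team's model, which the article does not specify in detail. It is a generic space-time kernel score, and the function name, the Gaussian distance weighting, the half-life decay, the parameter values and the sample events are all assumptions chosen for illustration.

```python
import math

def kernel_score(x, y, t, events, bandwidth_ft=500.0, half_life_days=30.0):
    """Score location (x, y) on day t from past events.

    Each event is (x_ft, y_ft, day). Closer and more recent events
    contribute more. All parameters are hypothetical defaults.
    """
    score = 0.0
    for ex, ey, et in events:
        age = t - et
        if age <= 0:
            continue  # ignore events on or after the forecast day
        dist_sq = (x - ex) ** 2 + (y - ey) ** 2
        spatial = math.exp(-dist_sq / (2 * bandwidth_ft ** 2))  # Gaussian in distance
        temporal = 0.5 ** (age / half_life_days)                # halves every half-life
        score += spatial * temporal
    return score

# Made-up events: two clustered near the origin, one far away.
events = [(0.0, 0.0, 90.0), (100.0, 50.0, 95.0), (5000.0, 5000.0, 99.0)]
near = kernel_score(50.0, 25.0, 100.0, events)      # close to the cluster
far = kernel_score(2500.0, 2500.0, 100.0, events)   # far from every event
```

In this toy data, the point near the two clustered events scores higher than the point far from all of them, which is the behavior the quotes describe.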

This approach was particularly successful at what Loeffler describes as likely the most difficult part of the challenge: forecasting infrequent crimes at short intervals. To explain, he describes metaphorically dividing the Oregon city into a grid of cells, each one measuring 250 feet per side.

“Let’s say in the span of one week, there might only be a dozen reported burglaries in a city like Portland. Despite being few and far between, you have to figure out how to find the right cells,” he says. “Finding the place with the most burglaries during a 10-year period would be much easier. But predicting the next week, there’s a lot more noise.”
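The grid Loeffler describes can be sketched in a few lines. The example below is an assumption-laden illustration rather than the competition entry: it assigns each call to a 250-foot cell and ranks cells by raw historical count, which is the "easier" long-run version of the problem. The coordinates, function names and sample calls are hypothetical.

```python
from collections import Counter

CELL_FT = 250  # cell side length mentioned in the article

def cell_of(x_ft, y_ft, cell_ft=CELL_FT):
    """Map a location in feet to its (column, row) grid cell."""
    return (int(x_ft // cell_ft), int(y_ft // cell_ft))

def top_cells(points, k=2, cell_ft=CELL_FT):
    """Return the k cells with the most past calls, as (cell, count) pairs."""
    counts = Counter(cell_of(x, y, cell_ft) for x, y in points)
    return counts.most_common(k)

# Made-up call locations in feet from an arbitrary origin.
past_calls = [(10, 20), (240, 249), (260, 10), (255, 30), (270, 40), (1000, 1000)]
hotspots = top_cells(past_calls, k=2)
```

With only a dozen events in a week, most cells would have a count of zero, which is why a short-horizon forecast needs more than a raw tally like this one.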

Law enforcement departments in some cities already employ these algorithms alongside crime analysts who make computerized versions of those older pushpin maps. Even so, opponents of this technology often question its usefulness and accuracy, as well as its transparency and fairness.

“Does it perpetuate existing inequalities in policing? Which data do you use when you do this?” Loeffler says. “Others say it’s not about which data you use but about how you respond. Is the response to the prediction more policing? More social services? Are you going to use these to affect how you deploy police in only poor neighborhoods or all neighborhoods? There are a lot of important public-policy questions that this technology brings to the surface.”

Versions of these questions will likely arise in a range of domains as machine learning becomes more pervasive.

“Your fridge, your car, everything will do machine learning,” Chirico says. “These ways of predicting what will happen and trying to make them happen more smoothly will become more ubiquitous.”

Chirico himself has expanded the algorithm the team created to predict fires and calls for medical help in Seattle using logs from the city’s 9-1-1 calls.

“As long as you have data to train on from the past or recent past,” Loeffler says, “you can use these methods to make educated predictions.”
