Date: May 3rd 2016
There is lead in Flint’s water. And we know that this raises more questions than answers: Where is it? Which homes are most at risk? When will the lead levels decrease?
We want to shed light on these questions with data, drawing on diverse sources of information and applying cutting-edge methods in data science and statistics.
The crisis is also one of transparency of information. We’d like to bring the key information to the citizens of Flint as clearly as possible.
In this short writeup we give some early results that help explain the lead level readings being continuously collected in Flint. We will keep updating this document as results develop.
For questions about health and about obtaining lead test kits for your home, visit the Michigan.gov website.
The figures and data below are based on several datasets.
Elevated lead readings are occurring throughout the city. They appear to be quite geographically diverse. A location is determined to have elevated lead if the DEQ recorded more than 15 parts per billion (ppb) of lead in a water sample (using EPA standards).
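As a minimal sketch of that rule, the snippet below flags readings as elevated. It assumes "elevated" means strictly above 15 ppb, which is how the threshold is written later in this writeup. The function name and the sample readings are made up for illustration and are not real Flint data.

```python
# Action level from the text, in parts per billion (ppb).
ACTION_LEVEL_PPB = 15.0


def is_elevated(lead_ppb: float) -> bool:
    """Return True when a reading is strictly above the action level (assumed rule)."""
    return lead_ppb > ACTION_LEVEL_PPB


# Hypothetical readings in ppb, for illustration only.
sample_readings = [0.0, 3.2, 15.0, 15.1, 104.0]
flags = [is_elevated(r) for r in sample_readings]
print(flags)
```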
The map below shows all the parcels in the Residential Testing data, displaying low (blue) and elevated (red) lead levels.
Much attention focuses on a dataset of fewer than 700 houses sampled repeatedly (Sentinel Site data). But we are using data from more than 8,000 unique houses contributing over 15,000 total samples (Residential Testing data). There’s more value in that data than is currently recognized. That’s what we will explore here. Which homes are most at risk? Thanks to the wide range of property types, geographic areas, and lead levels, we can start to answer these key questions about what helps predict lead.
The lead readings are known to be highly variable and depend on a number of factors, including the way the test was conducted, the time of day, and the number of hours during which water sat idle in the pipes. The factors that we focus on are the attributes of the property, including the age of construction, the condition of the property, when in 2015-16 the sample was taken, and the material of the service line connecting the house plumbing to the street pipes.
Many factors seem to be relevant. Property age seems to be especially important.
We observed that one attribute of the parcel that is strongly correlated with lead levels is the year in which the property was built. There is a sharp decline for buildings built after 1950: for those built in 1950 or before, 10% of readings are above 15 ppb, compared to only 6% for the newer properties.
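A comparison like this one can be sketched in a few lines. The snippet below splits samples at 1950 and computes the share of readings above 15 ppb in each group. The function name and the rows are hypothetical; the numbers are invented and will not reproduce the 10% and 6% figures.

```python
from collections import defaultdict

ACTION_LEVEL_PPB = 15.0
CUTOFF_YEAR = 1950  # split used in the text: "1950 or before" vs. "after 1950"


def elevated_share_by_era(samples):
    """samples: iterable of (year_built, lead_ppb). Returns {era: fraction above 15 ppb}."""
    counts = defaultdict(lambda: [0, 0])  # era -> [elevated count, total count]
    for year_built, lead_ppb in samples:
        era = "1950 or before" if year_built <= CUTOFF_YEAR else "after 1950"
        counts[era][0] += int(lead_ppb > ACTION_LEVEL_PPB)
        counts[era][1] += 1
    return {era: elevated / total for era, (elevated, total) in counts.items()}


# Made-up (year_built, lead_ppb) rows, not real Flint data.
samples = [
    (1925, 22.0), (1940, 3.0), (1948, 0.0), (1950, 1.0),
    (1962, 18.0), (1975, 2.0), (1988, 0.0), (1999, 4.0), (2003, 1.0),
]
shares = elevated_share_by_era(samples)
print(shares)
```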
The points in the plot reveal when most of Flint’s construction occurred (note the decrease during the Great Depression in the 1930s). The line shows the estimated 90th percentile lead level.
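For readers who want to compute a similar summary, here is a rough sketch of a 90th percentile lead level grouped by decade of construction. It uses a simple nearest-rank percentile per decade, which is an assumption made for illustration; the line in the plot may have been estimated with a different, smoother method. All names and values are hypothetical.

```python
import math
from collections import defaultdict


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p90_by_decade(samples):
    """samples: iterable of (year_built, lead_ppb). Returns {decade: 90th percentile in ppb}."""
    by_decade = defaultdict(list)
    for year_built, lead_ppb in samples:
        by_decade[year_built // 10 * 10].append(lead_ppb)
    return {d: percentile_nearest_rank(v, 90) for d, v in sorted(by_decade.items())}


# Made-up (year_built, lead_ppb) rows, not real Flint data.
samples = [
    (1921, 2.0), (1923, 30.0), (1925, 5.0), (1927, 1.0), (1929, 12.0),
    (1960, 1.0), (1964, 0.0), (1966, 3.0), (1969, 8.0),
]
p90 = p90_by_decade(samples)
print(p90)
```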
The lead service lines play a role, but not as much as you would think. We still see high lead readings even when a property's service lines are made of copper, zinc, and other materials.
We have data for over 8,000 properties, but there are over 50,000 parcels in Flint. Which of the not-yet-tested properties are at risk?
We apply various learning algorithms to the data to predict where elevated lead levels might be found. Here are the locations of the properties where our model suggests elevated lead (> 15 parts per billion) is most likely.
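To give a feel for this kind of prediction, here is a toy sketch. This writeup does not say which algorithms were used, so the example swaps in a one-feature logistic regression fitted by gradient descent, with year built as the only predictor. The training rows, function names, and settings are all assumptions made for illustration, not the actual model or data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def scale_year(year_built):
    """Center and shrink the year so gradient descent behaves well (arbitrary choice)."""
    return (year_built - 1950) / 50.0


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(elevated) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Made-up tested homes: (year_built, 1 if a reading was above 15 ppb else 0).
tested = [
    (1920, 1), (1928, 1), (1935, 0), (1941, 1), (1949, 0),
    (1955, 0), (1963, 1), (1972, 0), (1985, 0), (1998, 0),
]
w, b = fit_logistic([scale_year(y) for y, _ in tested], [e for _, e in tested])


def predict_risk(year_built):
    """Estimated probability of an elevated reading for a not-yet-tested home."""
    return sigmoid(w * scale_year(year_built) + b)


risks = {year: predict_risk(year) for year in (1930, 1990)}
print(risks)
```

In this toy data, older homes are more often elevated, so the fitted model assigns a higher risk to the 1930 home than to the 1990 home. A realistic model would use many more property attributes than year built alone.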