Sunday, December 6, 2015

Steph Curry was Less Remarkable before this Season

Stephen Curry and the Warriors are playing historically good basketball this year. Twenty games in is a bit early to know about best player or team of all time, but this has made me wonder how Curry's career up to this point compares to other recent greats.

The chart below uses data from Basketball Reference to compare the first seven years of Curry's career to the six MVP winners before him. The graph uses player efficiency rating (PER), an all-in-one metric of player performance. The first five years of his career were about average for an MVP winner, but last year he improved to a level similar to Kevin Durant's sixth-year PER, and only noticeably below that of LeBron James.

As has been widely noted, Curry's PER this year would be far higher than that of any other player (any player- not just those graphed). I expect his PER to come down a bit throughout the year, as small-sample outliers tend to do. But it seems likely that his PER will remain higher than LeBron's seventh-season PER.

The graph above is actually a screenshot from a Tableau dashboard. The interactive below has hover-over information, as well as the options to highlight a different player, switch to Win Score per 48, select different players (including Jordan), and change the seasons listed.

The interactive has several interesting stories. For example, Kevin Durant and LeBron James have had very similar careers according to Win Score per 48:

A few closing points: for those unfamiliar with PER, it is limited in measuring contributions on defense and various other off-ball aspects (like screens). But Curry is crushing pretty much every advanced metric this year. The PER and Win Score data in the graphs only include regular season performance.

Also, I haven't done a comprehensive search for similar visuals. If you've seen something similar to this, please let me know in the comments.

Sunday, October 4, 2015

Urban Density Maps

All maps use the same color scale, but not size. Scroll to the bottom for an interactive version with hover-over. The maps were originally built for this post on Atlanta public transit.

Interactive version:

MARTA Expansion: What does the data say?

There have been several recent discussions in Atlanta around major MARTA expansions. News agencies this summer reported MARTA has a not yet funded plan for an $8 billion expansion to Alpharetta, Emory, and East along I-20. A spring poll of Gwinnett voters shows their interest in MARTA, and the City of Atlanta has early plans for a streetcar expansion.

Would these expansions be good for the city? How do we think about this beyond, "I'm a liberal who likes public transit" or, "I'm a conservative who likes low taxes and my awesome SUV"?

Current Usage

For starters, how are residents currently utilizing MARTA? The graph below shows usage per station by cities with heavy rail. Atlanta ridership is not a total disaster like Cleveland, but is well behind the top five. If Atlanta residents aren't utilizing our current stations, we probably don't need more.

Riders per station is calculated by dividing total unlinked trips by total number of stations, and dividing by two to account for one entry and exit per trip.
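
As a rough sketch of the calculation described above (the function name and the sample numbers are made up for illustration, not taken from the National Transit Database):

```python
def riders_per_station(unlinked_trips: float, stations: int) -> float:
    """Approximate riders per station for one reporting period."""
    if stations <= 0:
        raise ValueError("stations must be positive")
    # Each trip touches two stations (one entry, one exit), so halve the ratio.
    return unlinked_trips / stations / 2


# Hypothetical system: 1,000,000 unlinked trips across 40 stations.
example = riders_per_station(1_000_000, 40)
print(example)  # 12500.0
```

The same function can be applied to each city's annual totals to reproduce a comparison like the one in the graph.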

Cost is another important component to current usage. Is it sustainable? Current MARTA CEO Keith Parker has been praised for his fiscal leadership (and several other accomplishments) but all public transit systems are publicly subsidized. The graph below shows how MARTA compares.

Source: National Transit Database

MARTA rail rides averaged a $2.05 cost to the city in 2014. Atlanta, again, is performing much better than Cleveland, but MARTA is much more expensive to run than leading cities.


Density is also a useful data point for understanding public transit. Guerra and Cervero of UC Berkeley argue that cities should set goals of at least 45 people per gross acre in the half mile surrounding heavy rail stations. (This is equivalent to about 29,000 people per square mile.)
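
The unit conversion is simple enough to check directly. A minimal sketch, using the fact that a square mile contains 640 acres (the function names are just for illustration):

```python
ACRES_PER_SQUARE_MILE = 640


def per_acre_to_per_square_mile(people_per_acre: float) -> float:
    """Convert a density in people per acre to people per square mile."""
    return people_per_acre * ACRES_PER_SQUARE_MILE


def per_square_mile_to_per_acre(people_per_square_mile: float) -> float:
    """Convert a density in people per square mile to people per acre."""
    return people_per_square_mile / ACRES_PER_SQUARE_MILE


print(per_acre_to_per_square_mile(45))  # 28800, i.e. about 29,000
```

Running the conversion the other way, a tract with 19,000 people per square mile works out to just under 30 people per acre.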

The map below shows that metro Atlanta has no neighborhoods that are even close to this metric. Two census tracts in Midtown have densities of about 19,000 per square mile.

Forty-five people per acre is only an estimate. It is also useful to compare Atlanta with other cities that have large heavy rail investments. In the visuals below, Atlanta's density is mapped along with eight other cities that have at least 50 million annual heavy rail passengers. A different color scale (0-100,000) is used from above to accommodate more dense cities. The transition from orange to blue is set at 30,000- to demonstrate Guerra and Cervero's suggested cut-off.

See this post for flat files of the same images.

Atlanta is clearly far less dense than other heavy-rail cities, and is far less dense than experts recommend for heavy rail. There are two possible takes on this: "we need more public transit to get more dense!" or, "additional public transit infrastructure is a bad investment." I tend to fall in the second camp- Atlanta already has a significant public transit investment, yet we're one of the least dense cities in the US. (Related- Atlanta ridership is high relative to our low density.) More public transit investment alone won't change that.

Instead, metro Atlanta needs to increase density around its current stations. Midtown has an impressive number of current and projected projects, which will increase that neighborhood's density and serve as a draw for riders boarding other stops. MARTA's transit-oriented development is also a good step in that direction, but the plots ranging from two to ten acres owned by MARTA are not large enough to have a major impact on density. (Helpful math fact: there are 640 acres in a square mile.) Increasing density in a meaningful way will require partnerships and investments by local agencies beyond MARTA.

Sunday, September 13, 2015

A 16% Increase in Murders would be Unprecedented

In response to recent news articles that cited a few cities' worth of evidence to show a surge in crime rates, FiveThirtyEight compiled a fantastic set of 2015 data for 60 cities to ask how the murder rate has actually changed. They found the rate has risen by 16%, but oddly used that to argue that reports have been exaggerated. But 16% would be an unprecedented increase. See the graph below.
The past thirty years have never seen a 16% increase. This would be an increase of almost 700 murders. (The cities used for this graph are slightly different from the FiveThirtyEight sample. This graph is limited to the 56 of the 80 largest cities that reported data every year in the sample.)

Of course, raw counts aren't ideal when population changes over time. The graph below shows a 16% increase next to historical US murder rates.
FiveThirtyEight's estimate was only for large cities, which likely follow different trends than suburban or rural neighborhoods. So 16% is likely high for the projected US change.

The graph shows a few other interesting points. Although a 16% increase is high, that would only take us to the 2008 murder rate; nowhere near the crime wave of the early 90's. 

I was also surprised to see the high murder rates in the 70's and 80's. I was aware of the high rates in the early 90's, but thought they were unique. The US murder rate more than doubled in the twelve years from 1963 to 1974.

Saturday, August 15, 2015

I Speak for the Champion Trees

Answers to the two most common questions about champion trees: 1. Yes, there are actually quite a few people who enjoy measuring big trees. 2. A champion tree is the largest of its species based on this formula:

Champion Tree Points = Height (ft) + Circumference (in) + 1/4 Canopy spread (ft)
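
For readers who want to score a tree themselves, here is a minimal sketch of the formula as a function. The measurements in the example are invented for illustration and do not describe a real champion.

```python
def champion_points(height_ft: float, circumference_in: float,
                    canopy_spread_ft: float) -> float:
    """Champion tree points: height (ft) + circumference (in) + 1/4 canopy spread (ft)."""
    return height_ft + circumference_in + canopy_spread_ft / 4


# Hypothetical tree: 100 ft tall, 276 in around, 80 ft canopy spread.
print(champion_points(100, 276, 80))  # 396.0
```

Note that circumference is in inches while the other two measurements are in feet, so trunk size carries a lot of weight in the total.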

The data for this map is curated by the nonprofit American Forests and is a compilation of data from state champion tree lists. I first got interested in champion trees when I came across the Atlanta list; now I appreciate trees more when walking through my local neighborhoods and parks. Some of the national champions on the American Forests list have pictures too- click on a tree in the map to get an image link. Two of my favorite images are the Giant Sequoia and the Western Red Cedar.

Champion Giant Sequoia
Champion Western Red Cedar
I've heard the phrase "in it for the right reasons" more than once when talking to people who are into measuring big trees. I'm still not sure what that means. 

I also noticed from the linked pictures that big tree measuring seems to be a very male-dominated activity.

The graph below gives some additional context around champion trees. California has the most champion points and largest tree, while Florida has the most individual champions. (The map above only shows champions with at least 270 points, but the graph below includes all champions, including species that are quite small.) Eight states don't have any champions- Arkansas, Delaware, Nevada, North Dakota, South Dakota, Rhode Island, Wisconsin, and Wyoming.

To learn how the champion tree map was built in Tableau, see this post.

Champion Tree Map: Dynamic Custom Shapes and Web Scraping

This blog post serves as a "how-to". For a more general post about the champion tree map see here.

This is the closest I've come to a "Pimp my Viz" submission. For a moment I thought two different graphical representations of tree shape were too crowded for one dashboard, but I like them both too much. Here's how I got there.

The Data: American Forests

The nonprofit American Forests has over 700 champion tree pages- one for each tree. I used a free web extraction tool to collect the American Forests data. This was a two-step process. First, I used the extractor tool on the tree search page to get a table of all of the champion trees and their URLs. I then trained the extractor tool on individual tree pages and fed it the URL table to get a data set of all champion trees and their dimensions.

Screenshot of training on a tree page.
One drawback of the resulting data set is that the location data is somewhat inconsistent. Some locations are a county, some a city, and some a national park. I didn't know an easy way to deal with this, so I looked up the latitude and longitude for the largest 240 trees by hand. This was relatively quick and took me about 90 minutes.

Dynamic Custom Shapes

I made my first tree map a few months ago using Atlanta data and circles for each tree. I avoided tree shapes the first time because I thought it would look chart-junky. But this time I figured out how to make the tree shape proportional to the actual height, canopy size, and trunk width of the tree, so each image adds unique information.

How can we display all these unique tree shapes?!
Tableau custom shapes are easy to implement, but are designed for dimension variables- a different shape for each discrete value. They can vary in size on one continuous variable, but I wanted my trees to vary in size with respect to three independent dimensions- height, trunk, and canopy. To add a second size dimension, I created a bin on the height to canopy ratio, and assigned different-sized canopy images to each bin. To add a third size dimension, I used a dual axis map and did the same thing for trunks that I did for canopies. By assigning images based on the height to trunk and height to canopy ratios, I was then able to make the overall image size proportional to height, and all three variables are in correct proportion to each other and other trees. (I actually used height^1.6 as the size variable to get the proportions right- because the image size scale is based on area, not height.)

Canopy image for .5x height to canopy ratio.
Canopy image for 3.5x height to canopy ratio.

The two images above are examples. The top was used for trees whose total height is half the width of the canopy, and the bottom was used for trees with a height 3-4 times greater than the canopy width. Note that both PNG images use the same size canvas so Tableau will size them correctly when scaling for height. Both images leave space for the trunk at the bottom too. This keeps the trunk and canopy from overlapping on the dual axis graph. (I used the same image size for the trunk and kept them on the bottom half.)
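
The binning logic can be sketched outside Tableau as well. This is only an illustration of the idea, not the calculated fields from the workbook: the bin width of 1.0, the midpoint labels (0.5, 1.5, ... 3.5), and the function names are all assumptions chosen to match the two example images above.

```python
import math

BIN_WIDTH = 1.0  # assumed bin width; the workbook's actual bins may differ


def ratio_bin(height_ft: float, other_ft: float, bin_width: float = BIN_WIDTH) -> float:
    """Return the midpoint of the bin containing height / other.

    'other' is the canopy width (or trunk width on the second axis).
    Each bin midpoint would map to one pre-drawn PNG shape.
    """
    ratio = height_ft / other_ft
    lower_edge = math.floor(ratio / bin_width) * bin_width
    return lower_edge + bin_width / 2


def size_value(height_ft: float, exponent: float = 1.6) -> float:
    """Size variable for the mark: height raised to the 1.6 power, as in the post."""
    return height_ft ** exponent


# Hypothetical trees: (height, canopy width) in feet.
print(ratio_bin(50, 100))   # 0.5 -> wide, squat canopy image
print(ratio_bin(105, 30))   # 3.5 -> tall, narrow canopy image
```

Because every image shares the same canvas size, only the bin picks the shape, and the size variable alone controls how large the whole tree is drawn.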

A sad clear cut and weird green clouds- the workbook before selecting dual axis.
Tree Graph Bar Graph

I like the tree map, but the trees' scattered locations and small size make it hard to really compare different trees. So I added the "tree graph bar graph" at the bottom. To make this graph, I created a dual-axis bar graph, with one series of graphs for the canopies, and one for the height. I then played with the axes settings to make the trunk stand out below the canopies.

Tree graph before selecting dual axis
Final Touches

Final touches included a highlight action from the graph to the map, a dynamic sort on the bar graph, and background color for both the map and the bar graph. I usually keep backgrounds white to keep focus on the data, but in this case I enjoyed using colors consistent with nature.

Monday, July 13, 2015

Building a Better Crime Map: Learning QGIS for Mapping in Tableau

Hover over the map above to view crime rates by census tract. Use the zoom tools in the upper left to better locate your neighborhood.

I don't usually post how-to Tableau blog posts because there are so many good ones out there, but I learned some new tools for this one and wanted to share.

A friend of mine was recently complaining about local hackathons, "Great, just what we need, another crime map." Although I'm not usually confused by sarcasm, I decided to build a crime map. The Atlanta neighborhood crime maps I found were based on counts instead of rates, which is misleading when neighborhoods have different populations. There were other weaknesses too; some maps don't reveal their methodology, and most only offer a map, without the type of actions to other data that are easy in Tableau.

The Atlanta Police Department shares a great crime dataset on their website that includes location coordinates, crime type, time, date, and neighborhood for the past five+ years. The problem is that Atlanta neighborhoods don't have good population data, making a crime rate per person difficult to calculate. So instead of using neighborhoods I used census tracts. To show crime rate per census tract in Tableau, I first had to merge the crime points to census shapes, something I had never done.


I understood that I'd have to use GIS software (which I had never used) to merge points to shapes. I downloaded QGIS and did a couple of beginner tutorials. QGIS is free, open source software, and I found it easy to use and very powerful.

I then tried a points in polygon analysis with my data. I merged the APD data with Georgia census tracts. I had to use a subset of the crime data because this merge was very slow. The result had the number of crimes per census tract, but I actually needed the census tract for each individual crime so I could take advantage of Tableau's aggregation and filtering features.

A spatial join allowed me to perform the merge I needed. The APD data was my target vector layer and the census data was my join vector layer. This technique was also much faster (not sure why) so the full data set was not a problem.
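
To make the difference between the two outputs concrete, here is a small pure-Python sketch of what the spatial join produces: one row per crime, with the containing tract attached. This is not how QGIS implements it. The ray-casting test, the field names, and the toy square "tracts" are all assumptions for illustration, and the coordinates are treated as flat x/y values.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tag_points(points, tracts):
    """Attach a tract id to each point (None if no tract contains it)."""
    tagged = []
    for point in points:
        tract_id = next(
            (tid for tid, poly in tracts.items()
             if point_in_polygon(point["x"], point["y"], poly)),
            None,
        )
        tagged.append({**point, "tract": tract_id})
    return tagged


# Toy data: two square tracts and three crime points.
tracts = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
crimes = [
    {"crime": "burglary", "x": 1, "y": 1},
    {"crime": "robbery", "x": 3, "y": 1},
    {"crime": "auto theft", "x": 9, "y": 9},
]
for row in tag_points(crimes, tracts):
    print(row["crime"], row["tract"])
```

Keeping one row per crime is what lets Tableau filter by crime type or date and re-aggregate on the fly. A per-tract count table throws that detail away.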


I then saved the crime data as an Excel file and merged in income and population data by census tract, using Excel because sometimes I'm lazy. I used census tract population and income data from the 2013 American Community Survey 5-year data set, available on American FactFinder.


Although I merged census tract to points, I still needed to load the census shapes in Tableau for the visual. To do this, I used the Tableau Shapefile to Polygon Converter from Alteryx. There are a few ways to do this, but I find the Alteryx solution very easy, as long as the shapefile isn't too huge.


I then loaded the census tract file in Tableau to make a polygon-shaded map and blended in the crime data on census tract geoID to make the color variable. The resulting visual, with some additional graphs as hover actions, is at the top of the post.

Counts vs. Rates

To see the value of switching from counts to rates, see the two maps below using counts on the right and rates on the left. The maps are noticeably different, with the highest crime area moving from downtown to a southeast tract.
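
A toy example shows why the ranking can flip. The tract names and numbers below are invented and are not values from the APD or census data.

```python
def rate_per_1000(crime_count: int, population: int) -> float:
    """Crimes per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * crime_count / population


# Made-up tracts: one has more crimes, the other has far fewer residents.
tracts = {
    "tract_a": {"crimes": 300, "population": 6000},
    "tract_b": {"crimes": 200, "population": 2000},
}

rates = {
    name: rate_per_1000(t["crimes"], t["population"])
    for name, t in tracts.items()
}
highest_by_count = max(tracts, key=lambda name: tracts[name]["crimes"])
highest_by_rate = max(rates, key=rates.get)

print(highest_by_count)  # tract_a
print(highest_by_rate)   # tract_b
```

The tract with the most crimes is not the one where an individual resident faces the highest rate, which is the same shift visible between the two maps.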

Understanding the Data

I also investigated the resulting product based on my curiosity about crime in my city. I eventually realized that the larceny variable had a big impact on the map. Larceny is the most common and mild crime in the dataset- theft without the use of force or trespassing. This includes shoplifting, so the larceny variable was highlighting commercial neighborhoods, which is misleading because my rate only controls for residential population. So dropping the larceny variable made the map much more accurate as a picture of residential crime rate. I would have missed this if I hadn't spent time investigating the resulting visual. See below. Including the larceny offenses really highlights downtown, which I don't think is accurate.

I also added a couple other visuals with the crime data that didn't map census tracts. I put the visual together in this post geared towards a less technical audience.

Sunday, July 12, 2015

Atlanta Crime Map

Crime maps often show only counts, but this map uses census data to show crime as a rate, giving a more accurate picture. 

Hover over the map above to view crime rates by census tract. Use the zoom tools in the upper left to better locate your neighborhood. 

The city has large variations in crime rates. The northern Atlanta neighborhood with the lowest crime rate in the city (between Northside Parkway and Peachtree Road) has a crime rate of 2.8 per 1000 residents, while some neighborhoods have crime rates of over 100 per 1000 residents.

To better understand the map consider the graph below with my own neighborhood highlighted.

In the "2014 Crime Rate by Type" graph, I'm glad to see that my neighborhood's crime rate is below the city average for each category. Burglary is the most common crime and closest to the city average as a percentage, which seems to mirror the concerns I see on my neighborhood Facebook page. However, that rate is about 10 per 1000 persons (1%), so the risk of being burglarized in a given year is not high.

The "Crimes per Year" graph also has promising trends for my neighborhood. Last year (2014) was a relatively low year for crime in my neighborhood, and there has been a large drop since 2009.

These maps include all crime recorded in the public Atlanta Police Department dataset, with the exception of larceny (theft without unlawful entry or threat of force). Larceny was excluded because it includes shoplifting; with it included, the maps show the highest crime rates in commercial areas, and differing levels of residential crime are harder to see.

The maps above use census tracts instead of city neighborhoods because census tracts have better population data. The visual below uses neighborhoods to show changes in crime rates and to map the location of each crime. Larceny is included in these maps. Again, my own neighborhood is highlighted, but the user can click on any neighborhood using the graph on the left.

The map shows exact locations of crimes and I'm able to see there were only three robberies (the category that includes muggings) in Kirkwood during the first half of 2015 (January-May 18). I also found it useful to pick a longer time range (slider on top right) and a single crime type (drop down on the top left), then see what times of day that crime occurs. In Kirkwood, robberies tend to happen in the evenings, but burglaries are more likely to happen in the morning.

Update (7/12): I had a couple of people ask about the relationship between income and crime. It's pretty strong; see the graph below. Census tract income data during non-census years is not very precise (I'm using the ACS 2013 five-year file); otherwise the relationship would be even stronger.

For more information on how to build these visuals, see this post.

Tuesday, June 9, 2015

Global Obesity: We're Getting Bigger

The visual above shows global obesity rates and their changes since 1990. The storyboard explains that the US has seen large obesity gains since 1990, but trends have reversed (slightly) in recent years. I suspect that the fitness industry is finally having a larger impact than the junk food industry on our waistlines. This could also explain why the child obesity rate continues to tick up: fitness marketing is rarely aimed at children.

The data also show that almost every country has seen an increase in obesity rates in the past 25 years. Although the US is on the high end, several countries in the Middle East and Oceania have higher obesity rates. Use the filters to explore the data on the last slide- switch between adult and childhood obesity, include overweight in the statistics, or find a specific country or region by clicking on the lists.

Source data is re-purposed from this visualization by the talented Ramon Martinez, and is originally from Institute for Health Metrics and Evaluation available here. 

Tuesday, May 12, 2015

Pell Grants, Test Scores, and Graduation Rates

The graph below shows the relationship between SAT scores and college graduation rates for US colleges and universities. As expected, schools with high test scores have high graduation rates. Also, a surprising number of US schools have six-year graduation rates below 50%.

The color of each point represents the percentage of students qualifying for Pell Grants- students from low income families. There is a clear relationship; schools with high test scores and graduation rates are much less likely to admit low income students.

There are notable outliers. For example, there are three light-blue (high poverty) schools with graduation rates above 80%. These schools have both high levels of low-income students and high graduation rates relative to their incoming SAT scores. Even more notable, all three of these schools are in the University of California system (San Diego, Davis, and Irvine). Other UC schools also have high Pell Grant rates relative to their nearby peers.

Further investigation reveals schools in the California State system also over-perform on this graph. UC and Cal State schools appear to be unique environments. Another common factor is relatively high levels of Asian and Hispanic students. I'm interested in learning how much of their high performance is due to effective system policies and how much is due to a local population high in Asian and Hispanic immigrants.

To learn more about these graphs, underlying data, and their interpretation, see this post.

To further explore this data, use the graph below. Notice the graph has filters on the right for state and enrollment, and can display SAT or ACT scores. Hover over the upper left for zoom controls. There is also a fact sheet on each school- hover over any school for a link.

Friday, April 17, 2015

Students Referred to Law Enforcement

This data was originally published as a series of bar graphs by The Center for Public Integrity. The graphs are supported by compelling individual stories, including an autistic sixth-grader charged with felony assault.

Thanks to Ben Wieder of CPI for sharing the data.

Friday, April 10, 2015

Moving from Rankings to Knowledge: Understanding colleges on multiple dimensions

(Updated on 5/11/2015 with new graphs and matching text.)

Choosing a college is a difficult decision. In addition to college visits, college website visits, and experiences of friends and relatives, eager students and parents can look at US News, The Princeton Review, Niche, Forbes, or many others to learn which schools have the best rank.

Rankings are great for a quick impression: Harvard University is #1, Texas A&M University at Corpus Christi is #604. But they aren’t great for giving context. The reader can then drill into each school and piece out the differences, but it’s difficult to mentally compare multiple schools, along multiple indicators.

This is where graphs provide value. See the graph below of Georgia colleges and universities. We can quickly see that Emory, Georgia Tech, and UGA (Georgia) are the top schools in the state in terms of test scores and graduation rates. By graphing an input (test scores) versus an output (graduation rate) we can also see which colleges are over-performing or under-performing. Savannah College of Art and Design (SCAD) has a graduation rate of 68%, which is much higher than other schools with similar test scores, such as Kennesaw State, with a graduation rate of 43%.

However, we shouldn't declare that all schools above the trend line are better than schools below. Consider UGA and Georgia Tech. UGA’s graduation rate is higher than Tech’s even though its average incoming SAT score is over 100 points lower. At first we might assume UGA provides more supports to get students to graduation. But as an Atlanta resident with GA Tech co-workers I often hear that Tech is very rigorous. Rigor could lower their graduation rate, but improve outcomes for the students who do graduate. This is backed by evidence. In 2011, 84% of Tech freshmen reported studying at least 11 hours per week, compared to 70% of UGA freshmen. The average Georgia Tech grad reportedly earns a starting salary of $60,700, while UGA grads earn $43,800.

Schools are color-coded by the percentage of students attending who receive Pell Grants. Schools in the upper right, with high test scores and graduation rates, admit far fewer low-income students. However, after controlling for test scores, there is little relationship between a school's poverty level and the trend line in Georgia. SCAD has a lower Pell Grant rate than Kennesaw State, consistent with its higher graduation rate, but Georgia State has a higher Pell Grant rate than either school and is slightly above the trend line.

To learn more about schools relevant to you, use the graph below. Notice the graph has filters on the right for state and enrollment, and can display SAT or ACT scores. Also, hover over a school for a link to drill-down information, or hover over the top left for zoom tools.

Source data is from IPEDS, a series of school-level surveys conducted annually by the U.S. Department of Education’s National Center for Education Statistics (NCES). Data used are the most recent available (as of March 2015) and reflect the 2012-2013 and 2013-2014 school years, depending on the data point. Graduation and test score data is from the 2012-2013 school year. More detail here. Source data available here. Schools either not submitting IPEDS data or not classified as "Degree-granting, primarily baccalaureate or above" are not included.

IPEDS graduation rates do not account for students who graduate after transferring. This report shows the national graduation rate rises by nine percentage points after accounting for graduation after transfer.

Wednesday, March 18, 2015

Price Changes in Consumer Products: Tech vs. Health Care and Education

The graphs were inspired by this tweet from Chris Dixon:

I made the graphs in Tableau using his linked table, and normalized all results by the overall price index to get real prices.
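
For anyone repeating the normalization outside Tableau, here is a minimal sketch. Dividing each category index by the overall index is the step described above. Rebasing the result so a chosen base year equals 100 is my own assumption to make the series easy to read, and the index values are invented.

```python
def real_index(nominal: dict, overall: dict, base_year: int) -> dict:
    """Deflate a category price index by the overall price index.

    The result is rebased so that base_year equals 100 (an assumed convention).
    """
    base = nominal[base_year] / overall[base_year]
    return {
        year: 100 * (nominal[year] / overall[year]) / base
        for year in nominal
    }


# Made-up indexes: the category price rose 150% while overall prices rose 25%.
category = {2000: 100, 2010: 250}
overall = {2000: 100, 2010: 125}

print(real_index(category, overall, base_year=2000))
```

In this toy case the category doubles in real terms even though its nominal index rose 2.5x, because a quarter of the nominal increase is just general inflation.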

Detractors might argue that the health care price index doesn't fully reflect quality improvements. But high US health care costs are hard to dispute.

There are other great narratives beyond those highlighted above. Notice musical instruments were eight times more expensive in 1929. That fact, combined with cheaper personal electronics, seems to nicely explain popular music in the second half of the 20th century.

Tuesday, March 10, 2015

Mapping Atlanta's Champion Trees

Atlanta's largest tree, center foreground.
The largest tree in Atlanta sits a block from Turner Field at Our Lady of Perpetual Help. The great cherrybark oak is 23 feet in circumference and over 100 feet tall, and shares its space with terminal cancer patients cared for by Catholic nuns. The staff will gladly show visitors their prized tree, old, gnarled, and massive.

Over the past few months, I visited several of Atlanta’s largest trees. This gave me the privilege of viewing incredible trees and tricked me into exploring new parts of the city. The tallest tree in the city is as tall as a sixteen-story building, and many of these trees were around before the Civil War. While tall, healthy trees may not be the first thing we associate with cities, urban trees are often larger than those in forests. Urban trees are few but can grow both wide and tall, while forest trees compete with each other for resources, resulting in tall trees with relatively narrow trunks and canopies. Further, most forests are either periodically logged or only recently protected (relative to the age of a great tree), while some trees in urban parks and on private residences have been able to grow to great size and age due to their protected locations.

The map below shows the champion trees of Atlanta. Champion tree points are calculated by summing the trunk circumference in inches, the height in feet, and one-quarter of the average crown spread (canopy) in feet. The Atlanta champion tree list contains the local champions of over 100 different species and their close runners-up and is maintained by Eli Dickerson, a volunteer with Trees Atlanta. Source data for the list is supplied by Eli and other tree enthusiasts who find and measure large trees across the city.

The map shows that many of Atlanta’s largest trees reside in old Intown neighborhoods on the near East side of Atlanta and in the corridor stretching from Piedmont Park to Atlanta Memorial Park.

Use the zoom controls in the upper left of the map to find your neighborhood, or use the bar graph on the left to click on large trees and show their location and an image for that species.

Let me know in the comments if you visit the champion trees in real life!