JD Power Dependability Study Boxplots Per Automaker

Wednesday, July 8, 2009

I'm currently searching for a new car, and one of the things that matters most to me is reliability. So I decided to write a little Python script to scrape all the statistics from JD Power's website for my own analysis.

I'm currently revamping the system to pull data from more sources and persist it in a SQLite database, and it will be available here shortly. One of the more interesting things I've done already is create boxplots (also known as box-and-whisker plots) from the Vehicle Dependability Study data for each automaker. I collected the "Overall Dependability" score for every car of every year for each automaker and fed the scores into matplotlib to make the boxplots.
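A minimal sketch of the plotting step might look like the following. The rows are made-up placeholders, not JD Power data, and the names `PLACEHOLDER_ROWS`, `group_scores`, and `plot_boxplots` are hypothetical; the scraping step is left out entirely.

```python
from collections import defaultdict

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt

# Placeholder rows of (automaker, model_year, score).
# These values are invented for illustration and are NOT JD Power data.
PLACEHOLDER_ROWS = [
    ("Maker A", 2004, 4.0), ("Maker A", 2005, 4.5), ("Maker A", 2006, 5.0),
    ("Maker B", 2004, 2.0), ("Maker B", 2005, 3.0), ("Maker B", 2006, 3.5),
    ("Maker C", 2004, 3.0), ("Maker C", 2005, 3.0), ("Maker C", 2006, 4.0),
]


def group_scores(rows):
    """Collect every score under its automaker, ignoring model year."""
    grouped = defaultdict(list)
    for automaker, _model_year, score in rows:
        grouped[automaker].append(score)
    return dict(grouped)


def plot_boxplots(grouped, path):
    """Draw one box per automaker and save the figure to `path`."""
    makers = sorted(grouped)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.boxplot([grouped[m] for m in makers])
    # Boxes are drawn at x positions 1..N by default, so label those ticks.
    ax.set_xticks(range(1, len(makers) + 1))
    ax.set_xticklabels(makers)
    ax.set_ylabel("Overall Dependability score")
    fig.savefig(path)
    plt.close(fig)
    return makers


if __name__ == "__main__":
    plot_boxplots(group_scores(PLACEHOLDER_ROWS), "boxplots.png")
```

Grouping first and plotting second keeps the plotting function independent of where the scores come from, so the same function would work whether the rows come from a scraper or a database query.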

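Essentially, these plots show the range of dependability scores for each automaker as well as their general distribution. The box spans the first to the third quartile, with the red line in the middle representing the median. With matplotlib's default settings, each whisker extends to the most extreme score within 1.5 times the interquartile range of the box, and anything beyond that is an outlier, marked by a plus sign. Higher scores are better.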
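To make the pieces of a boxplot concrete, here is a rough sketch of how the box, whiskers, and outliers can be computed by hand. It assumes the common 1.5 × IQR whisker rule (matplotlib's default) and linear interpolation for the quartiles; `boxplot_stats` and the sample scores are hypothetical and not taken from the study.

```python
from statistics import quantiles


def boxplot_stats(values, whis=1.5):
    """Return the quartiles, whisker ends, and outliers for a list of scores.

    Needs at least two values. Whiskers stop at the most extreme data
    points that still lie within `whis` * IQR of the box edges.
    """
    data = sorted(values)
    # method="inclusive" interpolates linearly between data points.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Invented scores for illustration only.
    sample = [2.0, 6.0, 6.5, 7.0, 7.0, 7.5, 8.0, 8.5, 9.0]
    print(boxplot_stats(sample))
```

For the invented sample above, the box runs from 6.5 to 8.0 with the median at 7.0, the whiskers reach 6.0 and 9.0, and the lone 2.0 falls below the lower fence, so it would be drawn as a plus sign.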

Because the study covers problems in the first three years of ownership, the data runs from 2001 models through 2006 models. The number of data points varies by automaker. I can't guarantee the data is accurate, though I have no reason to doubt it.

JD Power on the Vehicle Dependability Study:

"Overall Dependability: Taken from the Vehicle Dependability Study (VDS), which looks at owner-reported problems in the first 3 years of new-vehicle ownership, this score is based on problems that have caused a complete breakdown or malfunction of any component, feature, or item (i.e., components that stop working or trim pieces that break or come loose)."

I hope you find the plots as interesting as I did!

Posted by Craig Younkins at 10:30 PM