Severity of a car accident

Car accidents happen every year. Some people die, while others survive. We want to know why the outcomes differ so much, so we are going to examine the factors that influence the severity of car accidents.

First, I import the libraries up front because it is handy to have them available throughout the analysis.

After reading the dataset, I observed many missing values in the dataframe. Thus, I want to clean the dataset to obtain better results from the analysis.
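A minimal sketch of how the missing values could be counted per column. The tiny dataframe here is a made-up stand-in for the real dataset, and the column names other than PEDROWNOTGRNT are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real accident data.
df = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1, 2],
    "WEATHER": ["Overcast", None, "Clear", "Raining"],  # assumed column name
    "PEDROWNOTGRNT": [None, None, None, "Y"],
})

# Number of missing values in each column, largest first.
missing_counts = df.isna().sum().sort_values(ascending=False)

# Fraction of rows that are missing in each column (between 0 and 1).
missing_share = df.isna().mean()

print(missing_counts)
print(missing_share)
```

Looking at the share rather than the raw count makes it easier to compare columns when deciding which ones are too sparse to keep.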

Based on the data above, I drop variables with too many missing values, such as PEDROWNOTGRNT; I also delete columns that won’t directly influence the severity of a car accident.
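One way this step could look, as a sketch only. The 50% cutoff and the REPORTNO identifier column are assumptions chosen for illustration, not values taken from the actual analysis.

```python
import pandas as pd

# Hypothetical toy frame; only PEDROWNOTGRNT is a name from the text above.
df = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1, 2],
    "WEATHER": ["Overcast", "Clear", None, "Raining"],
    "PEDROWNOTGRNT": [None, None, None, "Y"],
    "REPORTNO": ["a1", "b2", "c3", "d4"],  # assumed identifier column
})

threshold = 0.5  # assumed cutoff: drop columns that are more than half empty

# Columns whose share of missing values exceeds the cutoff.
too_sparse = df.columns[df.isna().mean() > threshold].tolist()

# Columns judged unrelated to severity (picked by hand, hypothetical here).
irrelevant = ["REPORTNO"]

cleaned = df.drop(columns=too_sparse + irrelevant)
print(cleaned.columns.tolist())
```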

Now I have a clean dataset! However, I realize that many of the columns hold categorical values. These would stop me from further analysis, so I have to encode them as numbers.
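A small sketch of one possible way to turn categorical columns into numbers, using `pd.factorize`. This is an assumption about the approach, and the column names and values are invented for the example. Note that `pd.factorize` marks missing values with -1 by default, which is one reason to handle missing values first.

```python
import pandas as pd

# Hypothetical toy frame with two assumed categorical columns.
df = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1],
    "WEATHER": ["Overcast", "Clear", "Overcast"],
    "ROADCOND": ["Wet", "Dry", "Wet"],
})

encoded = df.copy()
mappings = {}  # remembers which integer stands for which label

for col in ["WEATHER", "ROADCOND"]:
    # factorize assigns integer codes in order of first appearance.
    codes, labels = pd.factorize(encoded[col])
    encoded[col] = codes
    mappings[col] = dict(enumerate(labels))

print(encoded)
print(mappings)
```

Keeping the `mappings` dictionary around makes it possible to translate the integer codes back into readable labels when plotting.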



Before diving into encoding, I have to make sure no NA values remain in the table, so that the encoding works properly.
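A minimal sketch of removing the remaining missing values, assuming that rows with any NA are simply dropped (filling them in would be another option). The data is invented for illustration.

```python
import pandas as pd

# Hypothetical toy frame with a few missing entries.
df = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1, 2],
    "WEATHER": ["Overcast", None, "Clear", "Raining"],
    "ROADCOND": ["Wet", "Dry", None, "Wet"],
})

# Keep only rows where every column has a value, then renumber the index.
complete = df.dropna().reset_index(drop=True)

# Sanity check: total count of missing cells that remain.
remaining_na = int(complete.isna().sum().sum())
print(len(complete), remaining_na)
```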

As you can see, we now have a beautiful table below. Let’s use a heatmap to see which variables influence the severity of a car accident.
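The numbers behind such a heatmap can be sketched as a correlation matrix. The encoded values below are invented, and the column names are assumptions. If seaborn is installed, `sns.heatmap(corr, annot=True)` would draw the matrix computed here.

```python
import pandas as pd

# Hypothetical, already-encoded toy data.
encoded = pd.DataFrame({
    "SEVERITYCODE": [1, 2, 1, 2, 1, 2],
    "WEATHER": [0, 1, 0, 2, 1, 0],
    "ROADCOND": [0, 1, 1, 0, 0, 1],
})

# Pairwise Pearson correlation between all numeric columns.
corr = encoded.corr()

# Correlation of each variable with severity, strongest (by absolute value) first.
severity_corr = (
    corr["SEVERITYCODE"]
    .drop("SEVERITYCODE")
    .sort_values(key=abs, ascending=False)
)
print(severity_corr)
```

One caveat: the integer codes given to unordered categories are arbitrary, so Pearson correlation on them can understate a relationship that really exists.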


It turns out that there is no significant relationship between those variables and the severity of a car accident.

You may be confused by these results; in fact, I was confused, too. I think three possible reasons lead to this situation.

There is no notable relationship between severitycode and the other variables. Nonetheless, I have observed some interesting patterns in the graphs. For instance, intersections are the most common accident locations. As for weather, overcast conditions see the most accidents. Also, many accidents take place while the road is wet.
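Frequency counts like these can be sketched with `value_counts`. The column names ADDRTYPE and ROADCOND are assumptions, and the rows are invented purely to show the mechanics, not to reproduce the actual results.

```python
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration.
df = pd.DataFrame({
    "ADDRTYPE": ["Intersection", "Block", "Intersection", "Alley"],
    "ROADCOND": ["Wet", "Wet", "Dry", "Wet"],
})

# How many accidents fall into each location type, most frequent first.
addr_counts = df["ADDRTYPE"].value_counts()
most_common_area = addr_counts.idxmax()

# The same kind of count for road condition.
road_counts = df["ROADCOND"].value_counts()

print(addr_counts)
print(most_common_area)
print(road_counts)
```

Calling `addr_counts.plot(kind="bar")` would turn such counts into a bar chart, assuming matplotlib is available.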

Project Summary

