Visualizations Using Python

Kelsey Lopez

In this article, we will discuss some factors that affect insurance charges through graphs created using Python.

For more information, I uploaded my code here.

Pearson’s Correlation Coefficient

.00-.19 “very weak”

.20-.39 “weak”

.40-.59 “moderate”

.60-.79 “strong”

.80-1.0 “very strong”
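
As a small illustration, the scale above can be written as a lookup function. This is only a sketch: the function name `correlation_strength` is made up for this example, and applying the scale to the absolute value of the coefficient (so that negative correlations get a label too) is an assumption.

```python
def correlation_strength(r):
    """Return the verbal label for a Pearson coefficient, using the scale above."""
    magnitude = abs(r)  # assumption: the scale describes strength, not direction
    if magnitude > 1:
        raise ValueError("a Pearson coefficient must lie between -1 and 1")
    if magnitude < 0.20:
        return "very weak"
    if magnitude < 0.40:
        return "weak"
    if magnitude < 0.60:
        return "moderate"
    if magnitude < 0.80:
        return "strong"
    return "very strong"


print(correlation_strength(0.30))   # weak
print(correlation_strength(-0.85))  # very strong
```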

Correlation Matrix of Numerical Variables

There is a weak positive correlation between age and charges, as well as between BMI and charges.
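
A correlation matrix like this one might be computed and drawn as in the sketch below. The column names (`age`, `bmi`, `charges`), the sample rows, and the output file name are assumptions made for illustration; they are not the actual dataset, so the printed numbers will not match the figure.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only; not the article's dataset.
df = pd.DataFrame({
    "age": [19, 28, 33, 45, 52, 61],
    "bmi": [27.9, 33.0, 22.7, 30.1, 25.8, 36.4],
    "charges": [1800.0, 4400.0, 5200.0, 8600.0, 11000.0, 14500.0],
})

# DataFrame.corr() uses Pearson's method by default.
corr = df[["age", "bmi", "charges"]].corr()
print(corr.round(2))

# Draw the matrix as a heatmap with a fixed -1 to 1 colour scale.
fig, ax = plt.subplots()
image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(image, ax=ax, label="Pearson's r")
ax.set_title("Correlation Matrix of Numerical Variables")
fig.savefig("correlation_matrix.png")
```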

Smoking

Smoking on Insurance Charges

The average insurance charges of smokers are more than three times those of non-smokers. The two graphs above use the same data and show the same information. However, the pie chart is harder to read and may even be misleading, because pie charts are normally used to compare parts of a whole, and the average charges of two groups do not add up to a meaningful whole. Bar charts, on the other hand, are well suited to comparing values across categories.
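
A bar chart of average charges per group could be built along these lines. This is a minimal sketch, assuming a `smoker` column with "yes"/"no" values and a `charges` column; the sample rows, the horizontal orientation, and the file name are all assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only; not the article's dataset.
df = pd.DataFrame({
    "smoker": ["yes", "no", "no", "yes", "no"],
    "charges": [30000.0, 8000.0, 10000.0, 34000.0, 9000.0],
})

# Average charges per group. barh draws the first entry at the bottom,
# so ascending order puts the larger bar on top.
avg_charges = df.groupby("smoker")["charges"].mean().sort_values()

fig, ax = plt.subplots()
ax.barh(list(avg_charges.index), avg_charges.values)
ax.set_xlabel("Average charges")
ax.set_ylabel("Smoker")
ax.set_title("Smoking on Insurance Charges")
fig.savefig("smoking_on_charges.png")
```

Because each bar starts from zero on a shared axis, the ratio between the two averages can be read directly from the bar lengths.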

Age

Age on Insurance Charges

We can clearly see that average insurance charges increase steadily with age. Line graphs are often used to show how values change over time. I created this line graph using the code below:
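
A minimal sketch of such a line graph follows. It is not the original code: the column names `age` and `charges`, the sample rows, and the file name are assumptions for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only; not the article's dataset.
df = pd.DataFrame({
    "age": [18, 18, 30, 30, 45, 60],
    "charges": [2000.0, 3000.0, 5000.0, 6000.0, 9000.0, 14000.0],
})

# One point per age: the mean charges of everyone of that age,
# ordered by age so the line runs left to right.
avg_by_age = df.groupby("age")["charges"].mean().sort_index()

fig, ax = plt.subplots()
ax.plot(avg_by_age.index, avg_by_age.values, marker="o")
ax.set_xlabel("Age")
ax.set_ylabel("Average charges")
ax.set_title("Age on Insurance Charges")
fig.savefig("age_on_charges.png")
```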

Sex

Sex on Insurance Charges

There is a slight difference between male and female average charges, and the next graph may explain why. This bar graph is the same as the one I used to present the difference in average charges for smokers and non-smokers, except that the higher value is placed at the bottom.

Smoking Vs Sex

There are clearly more male smokers than female smokers, which likely explains the slight difference in average charges.
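
Counting smokers and non-smokers within each sex can be sketched with a cross-tabulation and a grouped bar chart. The column names `sex` and `smoker`, the sample rows, and the file name are assumptions; the counts below are made up and only illustrate the technique.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only; not the article's dataset.
df = pd.DataFrame({
    "sex": ["male", "male", "male", "male",
            "female", "female", "female", "female"],
    "smoker": ["yes", "yes", "yes", "no",
               "yes", "no", "no", "no"],
})

# Rows are sexes, columns are smoker status, cells are head counts.
counts = pd.crosstab(df["sex"], df["smoker"])
print(counts)

# One cluster of bars per sex, one bar per smoker status.
ax = counts.plot(kind="bar", rot=0)
ax.set_xlabel("Sex")
ax.set_ylabel("Number of people")
ax.set_title("Smoking Vs Sex")
ax.legend(title="Smoker")
ax.figure.savefig("smoking_vs_sex.png")
```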

Age (Clustered)

Age on Insurance Charges (Clustered)

We can also cluster our data into several groups, which results in the graph on the right. This is useful when there are too many distinct values, which are better presented in clusters.
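
One common way to form such groups is to bin the ages into ranges with `pandas.cut`. The sketch below assumes that approach; the bin edges, group labels, sample rows, and file name are all assumptions, not the grouping used for the graph.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample rows for illustration only; not the article's dataset.
df = pd.DataFrame({
    "age": [18, 25, 34, 41, 49, 55, 63],
    "charges": [2000.0, 3000.0, 4000.0, 7000.0, 9000.0, 12000.0, 14000.0],
})

# Hypothetical bin edges. Each bin excludes its left edge and includes
# its right edge, so (17, 35] covers ages 18 to 35.
bins = [17, 35, 50, 65]
labels = ["18-35", "36-50", "51-65"]
df["age_group"] = pd.cut(df["age"], bins=bins, labels=labels)

# Average charges within each age group.
avg_by_group = df.groupby("age_group", observed=True)["charges"].mean()

fig, ax = plt.subplots()
ax.bar([str(group) for group in avg_by_group.index], avg_by_group.values)
ax.set_xlabel("Age group")
ax.set_ylabel("Average charges")
ax.set_title("Age on Insurance Charges (Clustered)")
fig.savefig("age_on_charges_clustered.png")
```

Wider bins give a smoother, simpler chart at the cost of hiding variation within each group.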
