Boxplot: The Most Useful Graph, a Visualization of the Five-Point Summary

Introduction to Boxplot:

When we have a numerical dataset and do not know where to start, a boxplot can be the answer. It is like a storyteller for numerical data. It shows five essential statistical measures at a single glance. For example, it shows the outliers, how the data is spread around the inter-quartile range, and how far the outliers lie from the rest of the data. Therefore, a boxplot can reveal a lot of insight about any given feature.

The figure below is an example of a boxplot of normally distributed data:

What can we infer from the boxplot?

It shows the distribution of the data (a feature/column) with the help of a box and whiskers (lines that extend from both ends of the box).

We can say it is another form of distribution graph.
A boxplot provides a five-point summary, which gives users critical information about the data.
The five-point summary consists of the minimum value, lower quartile (Q₁), median (Q₂), upper quartile (Q₃), and maximum value.
Whiskers are the horizontal lines that extend outward from Q1 (lower whisker) and Q3 (upper whisker) respectively.
It can be used to detect outliers.
Any data point outside of the whiskers is an outlier.
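The maximum length of each whisker is 1.5 × the inter-quartile range (IQR); a whisker stops at the most extreme data point within that distance.
The IQR is calculated as IQR = Q3 - Q1. Therefore, this is also the length of the box.
Data points marked in red are the outliers, as these data points lie beyond the whiskers.
If a data point is less than Q1 - 1.5 × IQR or more than Q3 + 1.5 × IQR, then it is an outlier; points more than 3 × IQR beyond the box are often called extreme outliers.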
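
The points above can also be checked numerically. The sketch below is a minimal illustration, assuming NumPy's default (linear interpolation) percentiles and the common 1.5 × IQR rule; the function name five_point_summary and the sample values are made up for this example.

```python
import numpy as np

def five_point_summary(values, whisker_factor=1.5):
    """Return the five-point summary, IQR, whisker ends and outliers (hypothetical helper)."""
    data = np.asarray(values, dtype=float)

    # Quartiles: Q1 (25%), median (50%), Q3 (75%)
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1  # length of the box

    # Fences: the furthest a whisker is allowed to reach
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr

    # Whiskers stop at the most extreme data points inside the fences
    inside = data[(data >= lower_fence) & (data <= upper_fence)]
    outliers = data[(data < lower_fence) | (data > upper_fence)]

    return {
        "min": data.min(), "q1": q1, "median": median, "q3": q3, "max": data.max(),
        "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": outliers,
    }

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = five_point_summary(sample)
for name, value in summary.items():
    print(name, value)
```

For this sample, Q1 = 3.25 and Q3 = 7.75, so IQR = 4.5 and the fences sit at -3.5 and 14.5. The value 30 is the only outlier, and the upper whisker ends at 9, the largest value inside the fence.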

Enough theory, let’s move to the code…

The code below generates a boxplot using random data. Therefore, each run generates a different boxplot.

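import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Small DataFrame of standard-normal random values: 5 rows, 3 columns
df = pd.DataFrame(np.random.randn(5, 3), index=["a", "c", "e", "f", "h"], columns=["x1", "x2", "x3"])

# Horizontal boxplot of column x1
sns.boxplot(x=df["x1"])
plt.show()  # needed in a script; notebooks render the figure automatically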

Output:

Boxplot with Outlier:
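
A boxplot with visible outliers can be produced by adding a few extreme values to otherwise normal data. The sketch below is only an illustration, not the code behind the original figure. It assumes matplotlib's Axes.boxplot instead of seaborn, so that the plotted outlier points can be read back from the result. The seed, the sample size and the injected values 5 and 110 are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Fixed seed (an assumption for reproducibility) so every run draws the same plot
rng = np.random.default_rng(seed=42)

# 200 normally distributed values plus two injected extreme values
values = np.concatenate([rng.normal(loc=50, scale=5, size=200), [5.0, 110.0]])

fig, ax = plt.subplots()
result = ax.boxplot(
    values,
    whis=1.5,  # whiskers reach at most 1.5 * IQR beyond the box
    flierprops={"marker": "o", "markerfacecolor": "red", "markeredgecolor": "red"},
)
ax.set_ylabel("value")
ax.set_title("Boxplot with outliers")

# The points drawn beyond the whiskers (the red markers)
flier_values = result["fliers"][0].get_ydata()
print(sorted(flier_values))
```

When running this as a script, call plt.show() or fig.savefig(...) at the end to view the figure. The injected values 5 and 110 always appear among the red points; a few of the random values may also fall beyond the whiskers.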
