## Introduction to Boxplots

When we have a numerical dataset and do not know where to start, a boxplot can be the answer. It is like a storyteller for numerical data: it shows five essential statistical measures at a single glance. For example, it shows how the data is spread across the interquartile range, which points are outliers, and how far those outliers lie from the rest of the data. Therefore, a boxplot can reveal a lot of insight into any given feature.

The figure below is an example of a boxplot of normally distributed data:
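One way to draw a plot like this is sketched below. It assumes seaborn's `sns.boxplot` (the same function used later in this article) and synthetic data of our own choosing: 1,000 standard-normal samples with a fixed seed of 0. For normally distributed data, the 1.5 × IQR whisker limits sit at roughly ±2.7 standard deviations from the mean, so only about 0.7% of points are expected to fall beyond the whiskers.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic example data (our assumption): 1,000 draws from a standard normal distribution
rng = np.random.default_rng(seed=0)
data = rng.standard_normal(1000)

# Horizontal boxplot; seaborn's default whis=1.5 lets whiskers reach at most 1.5 * IQR past the box
sns.boxplot(x=data)
plt.title("Boxplot of normally distributed data")
plt.show()

# Share of points beyond the 1.5 * IQR limits (about 0.7% in theory for normal data)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
beyond = (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)
share = beyond.mean()
print(f"share of points beyond the whiskers: {share:.2%}")
```

Because the sample is random, the printed share will vary slightly with the seed, but it should stay close to the theoretical value.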

## What can we infer from a boxplot?

A boxplot shows the distribution of the data in a feature (column) using a box and whiskers (lines that extend from both ends of the box).

- A boxplot is another way to visualize the distribution of a feature.
- It provides a five-number summary, which gives key information about the data at a glance.
- The five-number summary consists of the minimum value, the lower quartile ($Q_1$), the median ($Q_2$), the upper quartile ($Q_3$), and the maximum value.
- The whiskers are the lines that extend outward from $Q_1$ and $Q_3$, respectively.
- A boxplot can be used to detect outliers.
- Any data point beyond the ends of the whiskers is an outlier.
- Each whisker extends at most 1.5 × IQR (interquartile range) beyond its end of the box and stops at the most extreme data point within that limit. In a standard boxplot, the whisker ends therefore mark the minimum and maximum values that are not outliers.
- The interquartile range is $\text{IQR} = Q_3 - Q_1$, which is also the length of the box.
- Data points marked in red are outliers, as they lie beyond the reach of the whiskers.
- A data point more than 3 × IQR below $Q_1$ or above $Q_3$ is commonly called an extreme outlier.
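The rules in the list above can be sketched with NumPy alone. `box_summary` and its `k` parameter are hypothetical names for illustration, not a library API, and the small dataset is made up. The sketch computes the quartiles, the IQR, the whisker ends, and the points flagged by the usual 1.5 × IQR rule and by the stricter 3 × IQR rule.

```python
import numpy as np


def box_summary(values, k=1.5):
    """Boxplot statistics for 1-D data, flagging points beyond k * IQR as outliers.

    box_summary and k are illustrative (hypothetical) names, not a library API.
    """
    x = np.asarray(values, dtype=float)
    # Quartiles via NumPy's default linear interpolation
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1  # length of the box
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    is_outlier = (x < lower_limit) | (x > upper_limit)
    inside = x[~is_outlier]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme points that are not outliers
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": x[is_outlier],
    }


data = [2, 4, 4, 5, 6, 7, 8, 9, 16, 30]
stats = box_summary(data)                       # usual 1.5 * IQR rule
extreme = box_summary(data, k=3.0)["outliers"]  # stricter 3 * IQR rule

print(stats["q1"], stats["median"], stats["q3"], stats["iqr"])  # 4.25 6.5 8.75 4.5
print(stats["whisker_low"], stats["whisker_high"])              # 2.0 9.0
print(stats["outliers"].tolist())                               # [16.0, 30.0]
print(extreme.tolist())                                         # [30.0]
```

Here 16 is an outlier under the 1.5 × IQR rule but not under the 3 × IQR rule, while 30 is flagged by both. Different tools compute quartiles slightly differently, so the exact limits can vary a little between libraries.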

*Enough theory, let's move on to the code...*

The code below generates a boxplot from random data, so each run produces a different boxplot.

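```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Random data: 5 rows x 3 columns drawn from a standard normal distribution
df = pd.DataFrame(
    np.random.randn(5, 3),
    index=["a", "c", "e", "f", "h"],
    columns=["x1", "x2", "x3"],
)

# Draw a horizontal boxplot of column x1
sns.boxplot(x=df["x1"])
plt.show()
```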

**Output:**

**Boxplot with Outlier:**