### Preface:

A few essential steps should be followed when exploring a data set, especially if the data are numeric and the associated problem is a regression problem. One of them is computing the five-point summary, which comes from descriptive statistics. The five-point summary shows what the data look like and how they are distributed across the quartiles. Its near-identical visual counterpart is the boxplot. We can generate a boxplot for each feature to get an overall picture of that feature, which is handy for a quick overview of the data set. Python's matplotlib and seaborn can both generate a box-and-whisker plot, and R has a built-in function for the same purpose.

The figure below is an example of a boxplot of normally distributed data:

### What can we get from the boxplot?

It shows the distribution of the data (a feature or column) with the help of a box and whiskers (lines extending outward from both ends of the box).

- We can say it is another form of distribution graph.
- A boxplot displays the five-point summary, which helps users get the critical information quickly.
- The five-point summary includes the minimum value, lower quartile (Q1), median (Q2), upper quartile (Q3), and maximum value.
- Whiskers are the lines drawn outward from both ends of the box, perpendicular to its edges.
- It can be used to detect outliers.
- Any data point outside the whiskers is treated as an outlier.
- Each whisker reaches at most 1.5 × IQR (inter-quartile range) beyond the box: it ends at the most extreme data point that is no lower than Q1 - 1.5 × IQR (lower whisker) or no higher than Q3 + 1.5 × IQR (upper whisker).
- IQR = Q3 - Q1, which is also the length of the box.
- Data points marked in red are the outliers, as they lie beyond the whiskers.
- A data point more than 3 × IQR below Q1 or above Q3 is often called an extreme outlier.
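
The rules above can be sketched in a few lines of plain Python. This is only an illustration: the function names `five_point_summary` and `find_outliers`, the sample values, and the choice of the "inclusive" quartile method are assumptions made for this example, and other quartile conventions give slightly different Q1 and Q3.

```python
import statistics


def five_point_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sequence of numbers."""
    data = sorted(values)
    # "inclusive" interpolates linearly between the sorted data points;
    # this is one common convention, not the only one.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


def find_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the box."""
    _, q1, _, q3, _ = five_point_summary(values)
    iqr = q3 - q1                      # length of the box
    lower_fence = q1 - k * iqr         # limit for the lower whisker
    upper_fence = q3 + k * iqr         # limit for the upper whisker
    return [v for v in values if v < lower_fence or v > upper_fence]


# Made-up sample: nine ordinary values and one suspiciously large one.
sample = [12, 15, 17, 19, 20, 21, 22, 24, 26, 95]

summary = five_point_summary(sample)
outliers = find_outliers(sample)             # usual 1.5 * IQR rule
extreme_outliers = find_outliers(sample, 3)  # stricter 3 * IQR rule

print("five-point summary:", summary)
print("outliers:", outliers)
print("extreme outliers:", extreme_outliers)
```

For this sample, Q1 = 17.5 and Q3 = 23.5, so IQR = 6 and the fences sit at 8.5 and 32.5; the value 95 falls outside both the 1.5 × IQR and the 3 × IQR limits.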

*Enough theory, let's move on to the code...*

In the following section, we will see how it’s generated in Python:

**Data set:**

**Sample boxplot:**
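
A minimal sketch of how such a plot could be produced with matplotlib. The data here are synthetic (200 values drawn from a normal distribution with an arbitrary mean of 50 and standard deviation of 10), and the output file name `sample_boxplot.png` is likewise just an assumption for the example.

```python
import random

import matplotlib
matplotlib.use("Agg")  # render to a file; drop this line to use an interactive window
import matplotlib.pyplot as plt

# Synthetic, roughly normally distributed feature (assumed parameters).
random.seed(42)
values = [random.gauss(50, 10) for _ in range(200)]

fig, ax = plt.subplots(figsize=(4, 6))

# whis=1.5 is matplotlib's default: whiskers reach at most 1.5 * IQR past the box.
result = ax.boxplot(values, whis=1.5)

ax.set_title("Boxplot of normally distributed data")
ax.set_ylabel("value")
ax.set_xticks([1])
ax.set_xticklabels(["feature"])

fig.savefig("sample_boxplot.png")
```

The dictionary returned by `boxplot` holds the drawn artists (`"boxes"`, `"medians"`, `"whiskers"`, `"caps"`, `"fliers"`), which is useful for restyling them or reading off the plotted values.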

**Boxplot with Outlier**
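
As a rough illustration, assuming matplotlib and a small made-up sample in which one value (95) is far from the rest, the outlier can be drawn in red through the `flierprops` argument. The sample values and the file name `boxplot_with_outlier.png` are assumptions for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; drop this line to use an interactive window
import matplotlib.pyplot as plt

# Made-up sample: nine ordinary values and one value far above the rest.
values = [12, 15, 17, 19, 20, 21, 22, 24, 26, 95]

fig, ax = plt.subplots(figsize=(4, 6))

# Points beyond the whiskers ("fliers") are drawn as red circles.
red_markers = {"marker": "o", "markerfacecolor": "red", "markeredgecolor": "red"}
result = ax.boxplot(values, whis=1.5, flierprops=red_markers)

ax.set_title("Boxplot with an outlier")
ax.set_ylabel("value")

# The y-coordinates of the flier markers are the detected outliers.
flier_values = [float(v) for v in result["fliers"][0].get_ydata()]
print("outliers drawn on the plot:", flier_values)

fig.savefig("boxplot_with_outlier.png")
```

Here the upper whisker stops at 26, the largest value that is still within 1.5 × IQR of the upper quartile, and 95 is plotted as a separate red point.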
