In this explainer, we will learn how to identify outliers in a data set.
Sometimes in a data set there are data points whose values are much bigger or much smaller than the main group of data. Such data points are called “outliers” or “extreme values.” In the graph below, most of the data points have values between about 15 and 50. The data point with the value near 100 is an outlier, since its value is substantially larger than the rest of the data points.
An outlier can be a genuine data point; for example, there are people who are much taller than the average height of a human. But an outlier may also be a misrepresentation or an error.
It is important to consider outliers when analyzing a data set, since extreme values can lead us to false conclusions about our data set. For example, suppose you are an airplane seat designer. To design the passenger seats, you need to know the mean (average) height of an adult person.
If you were to use the heights of all the people in the picture above to calculate the mean height, the height of the very tall person would make the overall mean larger than it should be. This would give you a false impression of the mean height. Your seats would then be larger than necessary, and your boss would not be happy, as there would be fewer seats, meaning fewer passengers, meaning less profit!
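To see how much a single extreme value can shift the mean, here is a minimal Python sketch. The heights are hypothetical values chosen purely for illustration (they are not taken from the picture), and the variable names are our own.

```python
from statistics import mean

# Hypothetical adult heights in centimeters (made-up values for illustration).
typical_heights = [165, 170, 172, 168, 175, 171, 169]

# Hypothetical height of one very tall person (the outlier).
very_tall = 230

# Mean of the main group only.
mean_without = mean(typical_heights)

# Mean once the outlier is included in the data set.
mean_with = mean(typical_heights + [very_tall])

print(f"Mean without the outlier: {mean_without:.1f} cm")  # 170.0 cm
print(f"Mean with the outlier:    {mean_with:.1f} cm")     # 177.5 cm
```

With these made-up numbers, one outlier among eight people raises the mean by 7.5 cm, even though seven of the eight heights lie between 165 cm and 175 cm.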
The next two examples show how we might spot a potential outlier from different graphs of data.
Example 1: Spotting a Potential Outlier from a Graph
The table below shows the number of messages exchanged on the smartphones of 14 students over a single month. The data have also been plotted in a dot plot.