Descriptive Statistics


Exploring and describing data visually

Used alone, measures of central tendency and spread can oversimplify the data. A graphical representation can give a deeper insight into the nature of the data. There are many types of graph you can use to explore and describe your data, and different graphs are suited to exploring different things. Here we will cover histograms, box plots and probability plots.

Histograms

Histograms are a great way of exploring several aspects of continuous data. A histogram groups the values into intervals (bins) and draws a bar for each interval whose height shows how many values fall in it. Histograms let us visually inspect characteristics of our data such as the shape or distribution of the data (e.g. is it normally distributed or skewed?) and the presence of extreme values or outliers.

Take a look at some example histograms below. This article demonstrates how we can spot all these aspects of our data through a histogram, among other uses of histograms in statistics. If you're just getting started with statistics you likely don't need to go into as much depth as the article, but it's important that you know the purpose of a histogram and are able to interpret one for significant features, e.g. the shape of the distribution.
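To make the idea concrete, here is a minimal sketch of how a histogram is built: the values are sorted into equal-width bins and each bar shows the count in its bin. The scores, the bin width of 4 and the function name bin_counts are made up for illustration, and the bars are drawn as text rather than with a plotting package.

```python
# Hypothetical test scores, invented for this illustration.
scores = [2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 10, 12, 14, 21]


def bin_counts(values, bin_width, start=0):
    """Count the values in each bin [start + k*width, start + (k+1)*width).

    Assumes every value is greater than or equal to start.
    """
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return counts


bin_width = 4
counts = bin_counts(scores, bin_width)

# Draw one row of '#' per bin; the labels assume whole-number scores.
for k, count in enumerate(counts):
    low = k * bin_width
    print(f"{low:>2}-{low + bin_width - 1:<2} | {'#' * count}")
```

In this made-up example most scores sit in the lower bins, the counts thin out to the right, and one score sits on its own beyond an empty bin. That is the kind of pattern (a tail to the right and a possible outlier) that a histogram makes easy to spot.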

Box and Whiskers

A box plot provides much the same information as a histogram, but in a different way. An advantage of using a box plot is that it marks the exact value of the median. As can be seen from the example below, the whiskers run from 2 to 14, which is the range of the scores once the outlier is set aside. The line in the middle of the box is the median, which is 7. This tells us that half the scores fall below 7 and the other half above 7. The two ends of the box each show another median. The left end is the median of all the scores below 7 (which is 4) and the right end is the median of all the scores above 7 (which is 10). These two values are the lower and upper quartiles, and the distance between them (here from 4 to 10) is known as the interquartile range. Together, the box and whiskers split the scores into four quarters: the first ranging from 2-4, the second from 4-7, the third from 7-10 and the fourth from 10-14. You can also see a single data point at 21. This person falls outside the whiskers and could be classed as an outlier, or an extreme value.

Box plots can also give an indication of the distribution of data. If the 'box' part of the box plot is in the middle, with whiskers of similar length on either side, most values occur around the centre and the data is roughly symmetric, which is what you would expect if it is normally distributed. Similarly, if the 'box' is bunched up to the left or right with one tail that is particularly long, this suggests the data is probably skewed. Check out this blog on box plots for a more in-depth look at how to use and interpret them.
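The numbers in a box plot can also be worked out by hand. The sketch below uses a made-up set of scores chosen to reproduce the values quoted above (whiskers from 2 to 14, median 7, box ends at 4 and 10, a single point at 21). The function name box_plot_summary is hypothetical, and the rule used to flag outliers (more than 1.5 times the interquartile range beyond the box) is a common convention assumed here; statistical software may use a slightly different rule for quartiles or outliers.

```python
from statistics import median

# Hypothetical scores, invented to match the values described in the text.
scores = [2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 10, 12, 14, 21]


def box_plot_summary(values):
    """Return the numbers a box plot displays, using medians of each half."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                               # scores below the median
    upper = data[half + 1:] if n % 2 else data[half:]  # scores above the median
    q1, q2, q3 = median(lower), median(data), median(upper)
    iqr = q3 - q1

    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_high": max(inside),
        "iqr": iqr,
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


summary = box_plot_summary(scores)
print(summary)
```

For these scores the interquartile range is 10 - 4 = 6, so the upper cut-off is 10 + 1.5 x 6 = 19. The score of 21 lies beyond it and is reported as an outlier, while 14 is the largest score still inside and so marks the end of the right-hand whisker.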

Probability Plots

Probability plots (or p-p plots) are a good visualization for checking whether your data is normally distributed. As you have learned, you can already do this by looking at a histogram or, alternatively, a box plot. Probability plots, on the other hand, give a more detailed view of how your data is distributed. A probability plot plots your data against the z-scores you would expect if the data were normally distributed, which you can learn about in “Stats Bites: Distributions”. (Strictly speaking, a plot of data values against z-scores is usually called a normal quantile plot, or Q-Q plot, but it is used for the same purpose.) Using a probability plot to check the distribution makes it easier to see the exact data points that may be skewing the distribution. Data that is normally distributed will approximately follow a straight line, whilst data that is skewed will present a non-linear pattern.
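As a rough sketch of what sits behind such a plot, the code below pairs each sorted score with the z-score expected at its position in a normal distribution. The scores are made up, the function names are hypothetical, and the plotting position (i + 0.5) / n is one common choice assumed here; software packages differ in the exact formula. The correlation between the two columns is included only as a simple numerical stand-in for "how close to a straight line are the points?".

```python
from statistics import NormalDist, mean

# Hypothetical scores, invented for this illustration.
scores = [2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 10, 10, 12, 14, 21]


def normal_plot_points(values):
    """Pair each sorted value with its expected z-score under normality."""
    data = sorted(values)
    n = len(data)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting position: (i + 0.5) / n for the i-th sorted value.
    return [(standard_normal.inv_cdf((i + 0.5) / n), x)
            for i, x in enumerate(data)]


def straightness(points):
    """Pearson correlation between z-scores and values (1 = perfect line)."""
    zs = [z for z, _ in points]
    xs = [x for _, x in points]
    mz, mx = mean(zs), mean(xs)
    cov = sum((z - mz) * (x - mx) for z, x in points)
    return cov / (sum((z - mz) ** 2 for z in zs)
                  * sum((x - mx) ** 2 for x in xs)) ** 0.5


points = normal_plot_points(scores)
for z, x in points:
    print(f"z = {z:6.2f}   score = {x}")
print(f"straightness: {straightness(points):.3f}")
```

Plotting the scores against the z-scores gives the probability plot. Points that drift away from the straight line at one end, as the largest score is likely to here, are the ones pulling the distribution away from normal.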

