Video Transcript
In this video, we’re talking about graphing experimental data. When an experiment is performed, some amount of data are collected. Often, though, the meaning or the significance of those data points is not clear until they’re graphed. Graphing data can reveal trends and patterns that are otherwise very hard to see. In this lesson, we’ll see how to graph data collected in an experiment and how to make use of it to understand what took place. Before we have data to graph, though, we’ll need to run an experiment to collect it.
Say that we do an experiment on a certain nearby road during evening rush hour between 5:00 and 6:00 pm. If we divide this hour up into 10-minute intervals, then what we have is one, two, three, four, five, six 10-minute intervals in this hour. And say we make the prediction that of these 10-minute intervals, the most cars will pass by during the last one, from 5:50 to 6:00 o’clock. To test our prediction, we’ll stand by this road between 5:00 and 6:00 o’clock one weekday evening. And we’ll count cars as they pass by. Every time a car passes by, we’ll add that to our count for that particular 10-minute interval of time.
And so we do this. We count the number of cars that passed by between 5:00 and 5:10. Then we count the number of cars that passed by between 5:10 and 5:20, then between 5:20 and 5:30, and so on. And say that once we’ve collected all of our data, these are the counts for each interval. For our first time interval, six cars passed by, then 11 in the second time interval, then 14, 20, 25, and, finally, 19. To organise the data we’ve collected, we can make a few columns. First, there’s the column that indicates the time interval. We can call the interval from 5:00 to 5:10 number one, then from 5:10 to 5:20 number two, and so on, up to time interval number six from 5:50 to 6:00 o’clock.
And then corresponding to those intervals is the number of cars that passed by during each one. So we basically have two columns’ worth of information. There’s the time interval during which we collected the car count. And then there’s the corresponding count of cars. And by the way, we can already see that our prediction wasn’t correct. We didn’t get the most cars passing by in the last time interval, but rather in the second to last. That said, there’s more to discover about this experimental data. But written in a table form as it is now, it’s hard to see those things. So let’s do this. Let’s create a two-dimensional graph. And we’ll call the horizontal axis, for now, the 𝑥-axis and the vertical axis the 𝑦-axis.
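The two-column table described above can be sketched in a few lines of Python. This is only an illustration, not part of the video: the variable names (`intervals`, `car_counts`, `table`) are our own, and the counts are the ones read out in the transcript.

```python
# Counts from the experiment, one per 10-minute interval
# (1 = 5:00 to 5:10, ..., 6 = 5:50 to 6:00). Names are illustrative.
intervals = [1, 2, 3, 4, 5, 6]
car_counts = [6, 11, 14, 20, 25, 19]

# Pair each interval with its count, like the rows of the two-column table.
table = list(zip(intervals, car_counts))

# Find the row with the highest car count.
busiest_interval, busiest_count = max(table, key=lambda row: row[1])

# The prediction was that the last interval would be the busiest.
prediction_correct = busiest_interval == intervals[-1]

print("Interval\tCars")
for interval, count in table:
    print(f"{interval}\t\t{count}")
print("Busiest interval:", busiest_interval, "with", busiest_count, "cars")
print("Prediction correct?", prediction_correct)
```

Checking the prediction this way agrees with what we saw in the table: the busiest interval is the second to last, not the last.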
What we’ll do with this graph is plot the data points we’ve collected so we can investigate any trends that are there. So what we’re going to do is plot the two columns of information that we have about this experiment. The first column is the time interval during which we counted the cars, and the second column is the count of cars during those intervals. Now, we know that when we plot data points on an 𝑥𝑦 graph like we have here, we plot them in this order, first 𝑥, then 𝑦. That is, the 𝑥-variable corresponds to the horizontal axis. And the 𝑦-variable corresponds to the vertical. But now, we’re faced with a question: which should be 𝑥 and which should be 𝑦? Should the time interval be 𝑥 and the number of cars 𝑦, or vice versa?
To figure this out, it’s helpful to know a little experimental terminology. In any well-designed experiment, there’s a variable called the independent variable. This is something we change on purpose in a controlled way in order to see what effect it might have on another variable. With our graph set up the way it is, it’s the 𝑥-axis, the horizontal axis, that represents the values of the independent variable. We mentioned that in an experiment, when the independent variable is changed, that can affect another variable. And there’s a name for that one; it’s called the dependent variable. That variable is typically represented on the vertical or 𝑦-axis.
So then, in our car counting experiment, which variable is independent and which is dependent? Well, we can see that the time intervals that we picked (those particular times, and then calling them one, two, three, four, five, and so on) were part of the experimental design. They weren’t influenced or impacted by any other variable. Put another way, we could say that we chose to vary the time in this experiment and see what effect that would have on the number of cars passing by. This tells us that it’s the time interval information that is the independent variable in our experiment. These particular 10-minute increments were just something we decided on as we planned it out.
Knowing that, we can see what the dependent variable is. The number of cars passing by depended on the particular time interval we chose. So this tells us that when we plot our 𝑥𝑦 points on this graph, the 𝑥-values representing the independent variable will include the time intervals one through six. And the 𝑦-values representing the dependent variable will show the number of cars that passed by in those respective intervals. So as we plot 𝑥𝑦 data points, we can see that we’re plotting the interval with the corresponding number of cars passing by. Now, in order to do this, we’ll need to do a bit of work on our axes. As a first step, we can change over the 𝑥𝑦 labels to read interval and number of cars. Now, the labels of these axes tell us what they indicate.
The next thing we’ll want to do is draw in tick marks on both the horizontal and vertical axes, so we can write in and fit all the numbers on the horizontal axis as well as those that will appear on the vertical axis. Focusing first on the 𝑥-values, we’ll need to accommodate values between one and six. So if we put tick marks on that axis that look like this, then starting at zero, we have our first time interval, then our second, then third, all the way up to our last one, the sixth. Then we’ll do something similar for the vertical axis where we indicate the number of cars passing by. We can see that the maximum value we’ll need to indicate is 25. That’s the highest number in any of our intervals.
So what we can do is put tick marks on our vertical axis where each tick represents an additional three cars. So we start out at zero and then at that first tick mark that represents three cars passing by. The next one represents six, then nine, then 12, and so on up the axis until we get to a maximum value of 27. This maximum is greater than the maximum value we’ll need to plot from our data. So that means our axis is able to cover the range of values we have. Now that all this is in place, we can start to plot points from the data we collected using this format, interval and then the corresponding number of cars.
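The choice of tick marks above can be checked with a short Python sketch. The three-car spacing comes from the video; the rule used here to pick the top of the axis (the smallest multiple of the tick spacing that is at least the largest count) is our own assumption, chosen because it reproduces the maximum of 27.

```python
import math

car_counts = [6, 11, 14, 20, 25, 19]
tick_step = 3  # each tick represents three more cars, as in the video

# Assumed rule: the axis ends at the smallest multiple of tick_step
# that is at least as large as the biggest count we need to plot.
axis_max = math.ceil(max(car_counts) / tick_step) * tick_step

# Tick values on the vertical axis, starting at zero.
y_ticks = list(range(0, axis_max + 1, tick_step))

# Tick values on the horizontal axis: zero, then intervals one to six.
x_ticks = list(range(0, 7))

print("Vertical ticks:", y_ticks)
print("Horizontal ticks:", x_ticks)
print("Axis covers all data?", max(car_counts) <= axis_max)
```

Any spacing would work as long as the top tick is at least 25; a spacing of five cars, for instance, would end the axis exactly at 25.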
So to plot the point that happens at our first time interval, we’ll draw a vertical line up from that first interval on our horizontal axis. And then looking at our data table, we can see that six cars passed by in that 10-minute interval. What we do then is we find this number on our vertical axis — we find six right here — and then draw a horizontal line over from that until it meets the vertical line from the first interval. And where these two lines meet, we’ll draw in our first data point.
Once that point is in place, we do the same thing for time interval number two. We draw a vertical dashed line up from that interval marking. Then we look at our data table and we see that there were 11 cars that passed by during the second interval. We find that value on our vertical axis. 11 cars is right about here, and then we draw a horizontal dashed line over from there. And once again, where these two lines intersect, we draw our data point. Then we move on to our third data point, corresponding to time interval number three. Over these 10 minutes, we counted 14 cars that passed by. So finding that value on our vertical axis, it’s right about there. And once again, tracing a horizontal line over from that, where the vertical and horizontal lines cross, that’s where our data point goes. And then for data points four, five, and six, the same process is repeated.
And here’s how our graph looks once all the points are populated. Laid out this way, we can see that there’s an overall trend to the data. Generally speaking, the later our time interval is, the more cars were counted passing by on that road. Now, we say generally speaking because we can see there is an exception: at the last time interval, we didn’t count more cars than in any other interval. But overall, the trend of the data set as a whole is for more cars to be passing by the later our time interval is.
There’s a particular name for data that correspond this way. Given an independent variable and a dependent variable like we have in our experiment, if when we increase the independent variable, in this case the value on the horizontal axis, we see an overall increase in the dependent variable, then we call that a positive correlation. It means that the trend is for an increase in the independent variable to lead to a corresponding increase in the dependent. And we can see that that’s what we have here in the case of our experimental data.
Of course, it wasn’t necessary that it worked out this way, that by increasing our time interval overall, we increase the number of cars passing by. It could have been that we found the opposite trend going on. In other words, we could have seen that, by increasing our time interval, fewer cars went by. The term for that, when increasing the independent variable leads to a decrease in the dependent variable, is negative correlation. We say these variables are negatively correlated, meaning as one increases, the other decreases. Now just as a side note, these two terms, positive correlation and negative correlation, are helpful. But they don’t describe all the possible relationships that two variables might have. It could be, for example, that there is no correlation between two particular variables in an experiment.
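One way to put a number on these ideas is the Pearson correlation coefficient, which is positive for positively correlated data and negative for negatively correlated data. The video does not use this measure, so treat the sketch below as an optional illustration; the function names `pearson_r` and `describe` are our own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise we would divide by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(r):
    """Name the kind of correlation from the sign of r."""
    if r > 0:
        return "positive correlation"
    if r < 0:
        return "negative correlation"
    return "no correlation"

intervals = [1, 2, 3, 4, 5, 6]
car_counts = [6, 11, 14, 20, 25, 19]

r = pearson_r(intervals, car_counts)
print(round(r, 2), describe(r))
```

In practice, a value of r very close to zero is usually treated as no correlation even if it is not exactly zero; the sign test here is a simplification.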
So even though these two descriptions aren’t exhaustive (they don’t cover all possibilities), they are helpful for describing some of the trends we might find. All right, so now we have all the data points we collected, plotted on our graph. And let’s say we take this graph and we show it to a friend. And after studying it for some time, our friend says, I’m curious, how many cars passed by the street in the first 25 minutes of the experiment? In response, we could go back to our data table. And we would see that the first interval covers 5:00 to 5:10 and the second interval covers 5:10 to 5:20. So that’s the first 20 minutes total. But we didn’t stop at 25 minutes in to count the number of cars that had passed up to that point.
On our graph, 25 minutes would appear here, right between time intervals two and three. What our friend is asking is if we traced a vertical line from this point up, where would it intersect our data? But we can see that it doesn’t intersect the data, at least not directly. We’ve measured points every 10 minutes, but not every five minutes, like we would have needed in order to have this answer. At this point, there are a couple of things we could do. One is to redo the experiment and collect data more often. Say we collect it every five minutes instead of every 10 minutes. Or maybe we would even collect it every minute in 60-second time intervals.
We can see, though, that there’s a limit to how much data we can reasonably collect during an experiment. So another option for answering our friend’s question without going back and collecting more data is to develop what’s called a line of best fit for the data we have. Here’s the definition of a line of best fit. It’s a straight line that best exhibits the pattern of the set of data. It’s a line we draw in after the data are plotted to summarise that data. Now, considering our data set of our six points, one way to go about drawing a line of best fit for these six points is to draw in an area that encloses all of the points that we measured in our experiment. And then we’ll draw a line through this area that roughly cuts it in half. So approximately that line could look like this.
Okay, so now that we have our line of best fit, how does that help us? Well, notice that we now can give an answer to our friend’s question of how many cars passed by on the road in the first 25 minutes. Even though we don’t have a measured data point for that value, we do now have our line of best fit. So we trace a vertical line up from the 25-minute mark, and once we reach the line of best fit, we can then move over in a horizontal line and see what value that intersects with on the number of cars axis. Another way that a line of best fit is useful is that its slope or its gradient can tell us just how positively or negatively correlated our data are. The more positive the gradient or slope is, the more positive the correlation. And the more negative it is, the more negative the correlation.
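In the video, the line of best fit is drawn by eye. As a rough numerical counterpart, the sketch below uses a different technique, an ordinary least-squares fit, and then reads off the line’s value halfway between intervals two and three, where the 25-minute mark sits on the graph. The function name `fit_line` and the query value `x_query = 2.5` are our own choices, and a least-squares line will not match a hand-drawn line exactly.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

intervals = [1, 2, 3, 4, 5, 6]
car_counts = [6, 11, 14, 20, 25, 19]

slope, intercept = fit_line(intervals, car_counts)

# Position on the horizontal axis between intervals two and three.
x_query = 2.5
estimate = slope * x_query + intercept

print("slope:", round(slope, 2))
print("intercept:", round(intercept, 2))
print("value read from the line at x = 2.5:", round(estimate, 1))
```

The positive slope agrees with the positive correlation seen in the plotted points.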
There are a couple of important things to notice about lines of best fit in general and this one in particular. First, we can see that the line doesn’t pass through all of our data points. And it’s important to realise that a line of best fit can be perfectly correct without passing through any of the measured points. Along with this, notice that the line of best fit in this case doesn’t pass through the origin. In general, that’s perfectly fine, unless there’s a physical requirement that the line pass through the origin. It may go through that point, or it may not.
For our line of best fit, we see that it intersects the 𝑦 or the vertical axis right here, at three cars passing by. This is the 𝑦-intercept of this line of best fit. And then if we trace the line further along, we see where it would have intersected the horizontal axis. And that intersection point is called the 𝑥-intercept. The 𝑥- and 𝑦-intercepts of the line of best fit are often useful in giving information about the scenario. For example, in our case, the line of best fit’s 𝑦-intercept predicts the number of cars that have passed by at the start of the experiment.
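For a straight line written as 𝑦 = 𝑚𝑥 + 𝑐, the 𝑦-intercept is 𝑐 and the 𝑥-intercept is where 𝑦 equals zero, at 𝑥 = −𝑐/𝑚. The sketch below applies this to a line with the 𝑦-intercept of three cars mentioned above. The video does not state the slope of its line, so the value of 3.5 cars per interval used here is purely hypothetical, as is the function name `x_intercept`.

```python
def x_intercept(slope, y_intercept):
    """x-value where the line y = slope * x + y_intercept crosses y = 0.

    Returns None for a horizontal line, which never crosses the x-axis
    (unless it lies on it).
    """
    if slope == 0:
        return None
    return -y_intercept / slope

y_int = 3.0            # y-intercept read from the graph in the video
hypothetical_m = 3.5   # assumed slope, NOT given in the video

x_int = x_intercept(hypothetical_m, y_int)
print("y-intercept:", y_int)
print("x-intercept:", round(x_int, 2))
```

With a positive slope and a positive 𝑦-intercept, the 𝑥-intercept comes out negative, which is why we have to trace the line back past the vertical axis to find it.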
Let’s summarise now what we’ve learned about graphing experimental data. At the outset, we saw that graphing data has the advantage of helping to reveal trends and patterns in experimental data. In two-dimensional cases, we saw that data are plotted in this order. The 𝑥-variable is the independent variable, and the 𝑦-variable is the dependent variable. We saw further that data can be positively or negatively correlated, or neither may be the case. And lastly, we saw that a line of best fit is a straight line that best exhibits the pattern of a set of data.