Understanding the Margin of Error
Media outlets are describing the Indiana gubernatorial race as a “virtual dead heat” or “effectively a tie” based on results from a recent Bellwether Research Poll. The poll shows incumbent Governor Mike Pence holding a 4-point lead over Democratic challenger John Gregg, while also reporting a margin of error of 4 points. Many journalists take this to mean the race is a tossup.
There’s a lot wrong with this reading of the poll, so we’ve created a tool to help you understand both the poll results and margin of error. In short, based on this poll, the odds that Pence actually does lead Gregg are nearly 3-to-1. That’s a big takeaway, but seeing it requires digging in and setting a few things straight.
What’s a “margin of error”? Say you polled a random sample of 500 likely voters on their pick in an upcoming election. Overall, these respondents may signal that the Republican has the support of 55% of likely voters. But another survey using a different random sample of 500 likely voters may show support at 53%. This is because the sample you select affects those estimates – even when it’s chosen completely at random, and even when all other aspects of the polling design are held constant. The margin of error tells you the range within which you’d expect estimates to fall if you repeated the survey many times with randomly selected samples of the same size. If you have a margin of error of 6, for example, that means the vast majority of the time you’d expect different random samples of the same size to give you estimates within 6 points above or below the one from your poll.
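You can see this sampling variability directly in a quick simulation – a sketch, assuming a hypothetical population in which the Republican's true support is 55%:

```python
import random

random.seed(1)

TRUE_SUPPORT = 0.55  # assumed true share of likely voters backing the Republican
SAMPLE_SIZE = 500
N_POLLS = 2_000

def simulate_poll(p, n):
    """Draw n random voters; return the share who support the candidate."""
    return sum(random.random() < p for _ in range(n)) / n

estimates = [simulate_poll(TRUE_SUPPORT, SAMPLE_SIZE) for _ in range(N_POLLS)]

# Every simulated poll gives a slightly different answer, even though
# nothing changed except the random sample itself.
print(f"smallest estimate: {min(estimates):.1%}")
print(f"largest estimate:  {max(estimates):.1%}")

# Roughly 95% of estimates land within 1.96 * sqrt(p(1-p)/n) of the truth
# (about 4.4 points here) -- that range is the margin of error.
moe = 1.96 * (TRUE_SUPPORT * (1 - TRUE_SUPPORT) / SAMPLE_SIZE) ** 0.5
share_within = sum(abs(e - TRUE_SUPPORT) <= moe for e in estimates) / N_POLLS
print(f"share of simulated polls within the margin of error: {share_within:.1%}")
```

Rerunning with a different seed shuffles the individual estimates, but the share falling inside the margin of error stays close to 95%.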
We typically talk about the margin of error at the 95% confidence level, even if that’s not explicitly stated. This means that if you repeated the poll many times with samples of the same size, about 95% of the time the estimate would fall within the margin of error; only about 5% of the time would you see an estimate outside it. The closer you get to polling everyone – in other words, the larger your sample – the less variable your estimates are, and the closer your survey’s maximum margin of error gets to 0. In short, all else equal, you get more precise polling with larger samples.
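For a single estimate, the conventional formula behind that margin of error is z·sqrt(p(1−p)/n), with z ≈ 1.96 at 95% confidence. A minimal sketch (the function name is ours) shows how it shrinks as samples grow:

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Traditional margin of error for a single estimated share p
    from a random sample of n respondents (z = 1.96 -> 95% confidence)."""
    return z * sqrt(p * (1 - p) / n)

# Pollsters report the worst case, p = 50%, where variability peaks.
for n in (150, 600, 2400):
    print(f"n = {n:>5}: +/- {margin_of_error(0.50, n):.1%}")

# Note that quadrupling the sample only halves the margin of error:
# n = 600 gives about +/- 4.0 points; n = 2400 gives about +/- 2.0.
```

The diminishing returns are why pollsters rarely go far beyond a thousand or so respondents.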
There are a few important issues to keep in mind. First, the margin of error that gets reported is calculated for an estimate of exactly 50% – the point where sampling variability is at its maximum. The further your estimate gets from 50%, the smaller the true margin of error becomes, holding all else constant. And, maybe most importantly, the margin of error that typically gets reported measures the precision of single estimates in a poll (stuff like the share of people who say the country is headed in the wrong direction); it doesn’t tell you the precision of the estimated difference in candidate support in a head-to-head matchup. That margin of error (the one associated with the difference in estimated candidate support) is quite different from the traditionally reported one; it helps you understand how likely it is that the leader in the poll really leads among likely voters. And, unfortunately, even when people do try to calculate this type of margin of error, they often do it incorrectly.
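For the difference between two shares estimated from the same poll, the calculation has to account for the fact that the two estimates are negatively correlated (a respondent who picks one candidate can't also pick the other). Under the standard multinomial approximation, the variance of the difference p1 − p2 is (p1 + p2 − (p1 − p2)²)/n. A sketch with our own function names, plugged into this poll's numbers (Pence 40%, Gregg 36%, 600 respondents):

```python
from math import sqrt

def moe_single(p, n, z=1.96):
    """Traditionally reported margin of error for one estimated share."""
    return z * sqrt(p * (1 - p) / n)

def moe_difference(p1, p2, n, z=1.96):
    """Margin of error for the lead p1 - p2 when both shares come from
    the SAME sample (multinomial approximation, negative correlation)."""
    return z * sqrt((p1 + p2 - (p1 - p2) ** 2) / n)

p1, p2, n = 0.40, 0.36, 600  # Pence, Gregg, sample size

print(f"reported MOE (at 50%): {moe_single(0.50, n):.2%}")        # about 4.00%
print(f"MOE of the 4-pt lead:  {moe_difference(p1, p2, n):.2%}")  # about 6.97%

# Common mistakes: doubling the reported MOE (8 points, too wide) or
# treating the two shares as independent (about 5.5 points, too narrow).
```

Neither shortcut matches the correct calculation, which is why so many published "statistical tie" calls are off.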
Also, there’s nothing sacred about a poll’s margin of error at a 95% confidence level; reporting this is mostly a matter of convention. There’s a lot of value in knowing the probability that the leading candidate in a poll really is leading among likely voters, even if that probability is less than 95%. Put differently, a poll may tell you that there’s a 90% chance (rather than a 95% chance) that a candidate actually is leading. Just because you can’t verify the lead with 95% confidence doesn’t mean you should assume the candidates are actually tied. Unwarranted rigidity leads lots of people to naively disregard useful information.
To address these problems, we invite you to use the interactive margin of error calculator we’ve developed. The calculator lets you set the important parameters: the estimated support for two candidates from a survey, the survey’s sample size, and a confidence level of your choosing. It calculates the margin of error of the difference in support along with the traditional survey margin of error, and tells you whether there’s enough precision to determine that the leading candidate is actually leading at whatever level of confidence you choose. The plot on the left updates dynamically to indicate what sample size you would need to reject the possibility that the candidates are in a “statistical tie” at various levels of confidence.
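Behind that plot, the required sample size can be backed out from the same approximation: a lead is significant once z·sqrt((p1 + p2 − (p1 − p2)²)/n) drops below the lead itself. A sketch (the function name is ours; it assumes the estimated shares hold steady as n grows):

```python
from math import ceil

def n_needed(p1, p2, z=1.96):
    """Smallest sample size at which a lead of p1 - p2 becomes
    statistically significant at the level implied by z (1.96 -> 95%)."""
    lead = p1 - p2
    # Significance requires z * sqrt((p1 + p2 - lead**2) / n) < lead,
    # so solve for n and round up.
    return ceil(z ** 2 * (p1 + p2 - lead ** 2) / lead ** 2)

# With shares of 40% and 36%, a 4-point lead needs roughly 1,821
# respondents to be significant at 95% confidence -- about triple
# the Indiana poll's actual sample of 600.
print(n_needed(0.40, 0.36))
```

Lowering z (i.e., accepting a lower confidence level) shrinks the required sample quadratically, which is what the calculator's curve traces out.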
Let’s take the example above of the supposed “tossup” between Pence and Gregg in Indiana. Input their estimated support into the calculator (40% and 36%, respectively), along with the poll’s sample size (600), and you’ll see the relevant margin of sampling error for a 95% confidence interval is actually 6.97 points – larger than Pence’s 4-point lead. This means that, at 95% confidence, we cannot reject the possibility that the candidates really are tied in public support. But lower the confidence level and you’ll see that Pence’s lead is significant at the 73% confidence level. Put another way, the poll tells us the odds of Pence actually leading Gregg are nearly 3-to-1. That’s far from the tossup described in the press.
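Under the same normal approximation, you can recover roughly that confidence level yourself (a sketch with our own function names; small differences from the calculator's 73% figure come down to rounding and implementation details):

```python
from math import erf, sqrt

def lead_confidence(p1, p2, n):
    """Two-sided confidence level at which the lead p1 - p2 is just
    significant, using the multinomial approximation for same-poll shares."""
    se = sqrt((p1 + p2 - (p1 - p2) ** 2) / n)  # std. error of the difference
    z = (p1 - p2) / se                         # lead measured in std. errors
    phi = 0.5 * (1 + erf(z / sqrt(2)))         # standard normal CDF at z
    return 2 * phi - 1

conf = lead_confidence(0.40, 0.36, 600)  # Pence 40%, Gregg 36%, n = 600
print(f"lead significant at about the {conf:.0%} confidence level")
print(f"implied odds that Pence leads: {conf / (1 - conf):.1f}-to-1")
```

Converting that confidence level to odds the way the article does (73 to 27) is what yields the "nearly 3-to-1" figure.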
In a time when polls come and go with each daily news cycle, it’s important to look beyond the margin of error that the press typically reports. Dig a little deeper and you’ll find lots of really valuable information that just about everyone else seems to be missing.