# Basic Guide to Sampling for Disability Surveys

A question often asked is what sample size is needed to generate a large enough sample of people with disabilities for quantitative analysis. The answer depends on a number of factors.

First, are you interested in measuring the prevalence[1] of disability in the population, or are you interested in examining the correlation between disability and various outcomes? The former requires significantly smaller samples than the latter, but few data collections are done only to estimate prevalence. There is generally interest in providing information on the characteristics of people with disabilities. Some basic issues in determining sample size are discussed below, but it might not be straightforward to apply this general guidance to a specific situation, particularly for multipurpose data collections. It is usually best to consult a sampling statistician.

__How to compute the sample size needed for disability prevalence__.

To compute the sample size necessary to estimate disability prevalence, you first need some basic information, or you need to make assumptions if that information is not known. First, you must have a rough idea of the standard deviation of the prevalence estimate for the population of interest. If your country has had reasonable disability questions on a census or household survey, you can use the standard deviation of those estimates. However, many surveys and censuses have poorly designed questions that lead to low prevalence estimates. Rather than use the standard deviation of those estimates, one option is to use an estimate of .5 for the standard deviation.

Second, you need to choose how accurate you want your estimate to be, that is, what margin of error you are willing to accept.

Third, you need to determine how much confidence you want to have that the true prevalence rate in the population falls within your margin of error. Reducing the margin of error and increasing the level of confidence can significantly increase the size of the sample, so you should pick these parameters to align with your purpose for making the estimate. How important is it that your estimate is within a certain range? If you are planning for budgetary purposes for service provision where precise information is needed you might want a smaller range than if you are simply trying to get a general sense of the situation. However, many data collections are for multiple purposes. It is important to determine how these purposes might affect sample size.

For a simple random sample, the calculations are not difficult. An example is shown at the end of this blog where, making some standard assumptions and choices, we find that a sample of 1068 people would be sufficient for estimating overall disability prevalence with a margin of error of 3 percentage points at the 95% confidence level. However, many samples are not simple random samples, and those require sample size determinations that address the specific characteristics of the sample design. These are not addressed here.

If you wanted to estimate the prevalence separately for men and for women, with the same degree of confidence and margin of error, you would have to double that sample. Similarly, if you wanted to have separate prevalence estimates for every region within a country with the same confidence and margin of error, you would need to multiply the target sample size by the number of regions for which an estimate is needed.
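The multiplication described above can be sketched in a few lines of Python. This is only an illustration: the per-group figure of 1068 comes from the example in this blog, while the function name `total_sample` and the count of 8 regions are hypothetical and chosen just to show the arithmetic.

```python
# Per-group sample size from the blog's simple random sample example.
PER_GROUP_SAMPLE = 1068


def total_sample(per_group_sample, number_of_groups):
    """Total sample when every group needs its own full-precision estimate."""
    return per_group_sample * number_of_groups


# Separate estimates for men and women: two groups.
print(total_sample(PER_GROUP_SAMPLE, 2))  # 2136

# Separate estimates for each region; 8 regions is a made-up number.
print(total_sample(PER_GROUP_SAMPLE, 8))  # 8544
```

This assumes a simple random sample within each group, which is the same assumption the rest of this blog makes.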

__Sample sizes for analyzing the relation of disability to various outcomes__

Typically, doing this sort of analysis requires larger samples than simply estimating disability prevalence. And the more disaggregation you want to do by other characteristics, the bigger your sample size needs to be. If you want to compare, for example, rural women with disabilities to urban women with disabilities in terms of their levels of education or poverty status, you must have a reasonable number of each type of woman.

The needed sample size grows as:

- You become interested in more intersectionalities, such as the interaction of gender, age, ethnicity, region of residence, or other factors with disability status.
- You want to look more closely at the differences between people with different types or degrees of disability.

A reasonable rule of thumb is that if you have at least 500 people with disabilities in your sample (or within each subgroup of people with disabilities) then you can do meaningful disaggregation and multivariate analysis. So, if your prevalence estimate is 10%, then you would need a random sample size of about 5000 people. Notice this is much bigger than the 1068 observations you would need to simply estimate disability prevalence in the total population in our example.
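As a rough sketch of this rule of thumb, the total sample is the target number of people with disabilities divided by the expected prevalence. The function name `total_sample_for_target` and the `subgroup_share` parameter are assumptions introduced here for illustration; `subgroup_share` stands for the fraction of people with disabilities who belong to the subgroup of interest, and the value 0.5 used below is hypothetical.

```python
import math


def total_sample_for_target(target_count, prevalence, subgroup_share=1.0):
    """Total random sample needed to expect `target_count` people with disabilities.

    prevalence     -- expected disability prevalence as a proportion (0.10 = 10%)
    subgroup_share -- assumed fraction of people with disabilities who fall in
                      the subgroup of interest (1.0 means all of them)
    """
    raw = target_count / (prevalence * subgroup_share)
    # Round to 6 decimals first so floating-point noise cannot push an exact
    # answer such as 5000 up to 5001 when rounding up.
    return math.ceil(round(raw, 6))


# The blog's example: 500 people with disabilities at 10% prevalence.
print(total_sample_for_target(500, 0.10))  # 5000

# Hypothetical subgroup making up half of all people with disabilities.
print(total_sample_for_target(500, 0.10, subgroup_share=0.5))  # 10000
```

The result is an expected count only; the realized number of people with disabilities in any one sample will vary around the target.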

Finally, because many surveys use the household as the sampling unit, fewer than 5000 households would need to be selected to get 5000 people. This point is discussed in the more technical example below.

__Examples of how to compute the sample size needed for disability prevalence__.

Let’s choose a margin of error of +/- 3% so that we expect the true value of prevalence to fall within 3 percentage points of our estimate. Then let’s choose the standard confidence level of 95%. That is, we want to be 95% sure that the true value of what we are estimating falls within the margin of error we have chosen.

Next, you need an estimate of the standard deviation you expect in survey responses. If you have no idea what the standard deviation is, then use an estimate of .5 to improve the likelihood your sample will be large enough. Another possibility is to look at the standard deviation of disability prevalence from other surveys in the same or similar countries. In a major US survey, for example, the standard deviation was .3. In our example, we will use .5.

You then need the z-score that corresponds to the desired confidence level. The z-score for a 95% confidence interval is 1.96. (If you want to use a different confidence level you need to look up the z-score from a table that will be in the back of any introductory statistics text.)
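If a printed table is not at hand, the z-score can also be computed. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `z_score` is an assumption made for this example, and it assumes a two-sided interval, which is what a margin of error of "plus or minus" implies.

```python
from statistics import NormalDist


def z_score(confidence):
    """Two-sided z-score for a confidence level given as a proportion (0.95 = 95%)."""
    # Split the leftover probability equally between the two tails.
    one_tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - one_tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_score(level):.2f}")
```

For the 95% level this gives approximately 1.96, matching the value used in the example.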

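Simply plug your z-score (z), standard deviation (s), and margin of error (m, written as a proportion, so .03 for 3 percentage points) into the following equation. Note that s enters as s(1-s), which is largest when s = .5; this is why .5 is the cautious choice when the true value is unknown.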

$$\text{Sample Size} = \frac{z^{2}\, s(1-s)}{m^{2}}$$

With a 95% confidence level, a standard deviation of .5, and a margin of error of +/- 3%, you need a sample size of 1068 (rounding up to the next whole person) since:

$$\frac{1.96^{2} \times .5(1-.5)}{.03^{2}} = \frac{(3.8416)(.25)}{.0009} = 1067.11$$
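The same calculation can be written as a small Python function. This is a minimal sketch for a simple random sample only; the function name `srs_sample_size` is an assumption made here, and the result is rounded up because a fraction of a person cannot be sampled.

```python
import math


def srs_sample_size(s, margin, z=1.96):
    """Sample size for a simple random sample: z^2 * s * (1 - s) / m^2, rounded up.

    s      -- the value the blog calls the standard deviation (e.g. 0.5)
    margin -- margin of error as a proportion (0.03 for 3 percentage points)
    z      -- z-score for the confidence level (1.96 for 95%)
    """
    raw = z ** 2 * s * (1 - s) / margin ** 2
    return math.ceil(raw)


print(srs_sample_size(0.5, 0.03))  # 1068
```

Changing `margin` or `z` shows how quickly the requirement grows as the margin of error shrinks or the confidence level rises.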

Here is a table showing the sample sizes for different combinations of standard deviations (the columns) and margins of error in percentage points (the rows), all for a confidence level of 95%. For example, for a standard deviation of .3 and a margin of error of +/-3 you would need a sample of 896. The table values are rounded to the nearest whole number, which is why it shows 1067 rather than the rounded-up 1068 used in the text. You can see how quickly the sample size grows as a smaller margin of error is desired.

| Margin of error (percentage points) | s = 0.3 | s = 0.5 | s = 0.8 |
|---|---|---|---|
| +/-2 | 2017 | 2401 | 1537 |
| +/-3 | 896 | 1067 | 683 |
| +/-5 | 323 | 384 | 246 |
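The table can be reproduced with a short loop, which also makes it easy to try other combinations. This is a sketch under the same simple random sample assumption; the names `Z_95`, `S_VALUES`, `MARGINS`, and `table_rows` are introduced here for illustration, and the values are rounded to the nearest whole number because that is what matches the figures in the table.

```python
Z_95 = 1.96                      # z-score for a 95% confidence level
S_VALUES = [0.3, 0.5, 0.8]       # table columns
MARGINS = [0.02, 0.03, 0.05]     # table rows, as proportions


def table_rows():
    """Return (margin, [sample sizes]) pairs, one per row of the table."""
    rows = []
    for m in MARGINS:
        # Same formula as in the text: z^2 * s * (1 - s) / m^2.
        sizes = [round(Z_95 ** 2 * s * (1 - s) / m ** 2) for s in S_VALUES]
        rows.append((m, sizes))
    return rows


for margin, sizes in table_rows():
    print(f"+/-{margin * 100:.0f}: {sizes}")
```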

But keep in mind this is the number of persons needed. In the first example, you would need fewer than 1068 households. Unfortunately, we cannot just say that, for example, if there were 4 people per household you would need 1068/4=267 households, because that would not be a simple random sample. It would be a clustered sample, and we would expect the characteristics of people within the same household to be correlated. So, we would have to adjust the formula for the sample size to take into account the nature of the sample design. That is too technical for this blog. When in doubt, consult a statistician, but this blog can give you an idea of the scale of sample sizes needed.

[1] Prevalence is the proportion of a particular population affected by a condition.