Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined by the expense of data collection and the need for sufficient statistical power. In complicated studies there may be several different sample sizes: for example, in a stratified survey there would be a different sample size for each stratum. In a census, data are collected on the entire population, so the sample size equals the population size. In experimental design, where a study may be divided into different treatment groups, there may be a different sample size for each group.
Sample sizes may be chosen in several different ways:
using experience – small samples, though sometimes necessary, can result in wide confidence intervals and a greater risk of errors in statistical hypothesis testing.
using a target variance for an estimate to be derived from the sample eventually obtained, i.e. if high precision is required (a narrow confidence interval), this translates to a low target variance of the estimator.
using a target for the power of a statistical test to be applied once the sample is collected.
using a confidence level, i.e. the larger the required confidence level, the larger the sample size (given a constant precision requirement).
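The precision-based criteria above (a target variance, or a margin of error at a chosen confidence level) can be illustrated for the simplest case, estimating a population mean. The following is a minimal sketch, assuming a normal approximation and a known population standard deviation sigma; the function names are hypothetical. Because the variance of the sample mean is sigma²/n, a target variance V requires n ≥ sigma²/V, and a confidence interval half-width of at most E at confidence level 1 − α requires n ≥ (z·sigma/E)², where z is the two-sided standard normal critical value.

```python
import math
from statistics import NormalDist


def n_for_target_variance(sigma: float, target_variance: float) -> int:
    """Smallest n such that Var(sample mean) = sigma**2 / n <= target_variance."""
    return math.ceil(sigma ** 2 / target_variance)


def n_for_margin(sigma: float, margin: float, confidence: float) -> int:
    """Smallest n whose normal-approximation confidence interval half-width,
    z * sigma / sqrt(n), is at most `margin` at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return math.ceil((z * sigma / margin) ** 2)


if __name__ == "__main__":
    # Same precision requirement (margin 2, sigma 10), rising confidence levels.
    for level in (0.90, 0.95, 0.99):
        print(level, n_for_margin(sigma=10, margin=2, confidence=level))
```

With sigma = 10 and a margin of 2, the sketch gives 68, 97 and 166 observations for 90%, 95% and 99% confidence, showing how a higher confidence level raises the sample size for a fixed precision requirement. In practice sigma is rarely known and is typically replaced by an estimate from a pilot study or earlier data.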
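The power-based criterion can be sketched in the same way for comparing two group means. This is an illustrative sketch, not a general method: it assumes a two-sided two-sample z-test, equal group sizes, and a common known standard deviation sigma, and the function name is hypothetical. The per-group size follows the standard normal approximation n = 2(z₁₋α/₂ + z_power)²·sigma²/δ², where δ is the smallest difference in means worth detecting.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-sided, two-sample z-test of means.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})**2 * sigma**2 / delta**2,
    with equal group sizes and a common, known standard deviation sigma.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


if __name__ == "__main__":
    # Detect a difference of half a standard deviation at alpha = 0.05.
    print(n_per_group(delta=0.5, sigma=1.0, power=0.80))
    print(n_per_group(delta=0.5, sigma=1.0, power=0.90))
```

For a difference of half a standard deviation at α = 0.05, this gives 63 per group for 80% power and 85 per group for 90% power. The approximation ignores the small probability of rejecting in the wrong tail, and a t-test, which accounts for estimating sigma, typically needs slightly more observations.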
An optimum sample for a study may be defined as one that fulfills the requirements of efficiency, representativeness, reliability and flexibility. That is, the sample should be small enough to avoid unnecessary expense and large enough to keep sampling error within the tolerable limit.
It should be large enough to yield statistically representative and significant results in all important tabulations, but not so large that it wastes funds, delays the project, or achieves needlessly high precision. The sample should yield the desired estimates with the required level of reliability at minimum cost.