CCSG

Appendix C (Additional Information on Different Sampling Techniques and Terminologies)

Simple random sampling (SRS)

SRS uses a sampling frame numbered 1 to [latex]N[/latex] (the total number of elements).
Formula for estimating the sampling variance of a simple random sample:
- [latex]var\left ( \bar{y} \right )=\frac{\left ( 1-f \right )s^{2}}{n}[/latex], where

  - [latex]f[/latex] is the sampling fraction, equal to [latex]n[/latex] (the sample size) divided by [latex]N[/latex] (the number of elements on the sampling frame), so that [latex]\left ( 1-f \right )[/latex] is the finite population correction,
  - and [latex]s^{2}[/latex] is the sample element variance of the statistic of interest: [latex]s^{2}=\frac{\sum\limits_{i=1}^n \left (y_{i}-\bar{y} \right )^{2}}{n-1}[/latex].

The finite population correction reflects the fact that, contrary to the assumption of an infinite population made in standard statistical theory, the survey population is finite in size and the sample is selected without replacement [zotpressInText item="{2265844:DKSC4DFC}"].
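
As a minimal sketch of the variance formula above, the following Python function estimates the sampling variance of a sample mean under SRS without replacement. The function name `srs_variance_of_mean` and the sample values are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def srs_variance_of_mean(sample, population_size):
    """Estimated sampling variance of the mean under SRS without replacement."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sampled elements")
    if population_size < n:
        raise ValueError("population_size cannot be smaller than the sample")
    f = n / population_size      # sampling fraction n / N
    s2 = variance(sample)        # element variance, n - 1 in the denominator
    return (1 - f) * s2 / n      # (1 - f) is the finite population correction


# Hypothetical data: 5 elements sampled from a frame of 50.
y = [2, 4, 4, 6, 9]
print(mean(y), srs_variance_of_mean(y, population_size=50))
```

When the sample is a small share of the frame, the factor (1 - f) is close to one and has little effect on the estimate.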

Systematic sampling

Steps of systematic sampling:
- Compute the selection interval ([latex]k[/latex]) as the ratio of the population size, [latex]N[/latex], to the sample size, [latex]n[/latex]. As a formula, [latex]k=\frac{N}{n}[/latex].
- Choose a random number from 1 to [latex]k[/latex].
- Select the element of that random number from the frame and every [latex]k[/latex]th element thereafter.
Example 1:
- Imagine the size of the sampling frame is 10,000 and the sample size is 1,000, making the sampling interval, [latex]k[/latex], [latex]\frac{10000}{1000}=10[/latex]. The sampler then selects a random number between 1 and 10, for instance, 6. The sampler will then make selections in this order: 6, 16, 26, 36, …, 9996.
- Additional steps if the selection interval is a fraction:
  - Compute the selection numbers by adding the fractional sampling interval each time.
  - Drop the decimal portion of the selection numbers.
Example 2:
- The size of the sampling frame is 10,400 and the sample size is 1,000, making the sampling interval, [latex]k[/latex], [latex]\frac{10400}{1000}=10.4[/latex]. The sampler selects a random number between 1 and 10.4, for instance, 6. The selection numbers would then be: 6, 16.4, 26.8, 37.2…10395.6. After rounding down, the selection numbers become: 6, 16, 26, 37…10395.
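
The steps above, including the fractional-interval case, can be sketched in Python as follows. The function name `systematic_selections` is hypothetical. Drawing the random start as a real number between 1 and k is an assumption, since the text does not say whether the start must be a whole number. Exact fractions are used so that dropping the decimal portion is not affected by floating-point rounding.

```python
from fractions import Fraction
import math
import random


def systematic_selections(frame_size, sample_size, start=None, rng=random):
    """Return the 1-based frame positions selected by systematic sampling."""
    if not 0 < sample_size <= frame_size:
        raise ValueError("sample_size must be between 1 and frame_size")
    k = Fraction(frame_size, sample_size)      # selection interval, may be fractional
    if start is None:
        # Assumption: the random start is a real number between 1 and k.
        start = rng.uniform(1, float(k))
    start = Fraction(start)
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and the selection interval k")
    # Add the interval repeatedly, then drop the decimal portion.
    return [math.floor(start + i * k) for i in range(sample_size)]


example_1 = systematic_selections(10_000, 1_000, start=6)
example_2 = systematic_selections(10_400, 1_000, start=6)
print(example_1[:4], example_1[-1])
print(example_2[:4], example_2[-1])
```

Passing `start=6` reproduces the selection numbers of Examples 1 and 2; omitting it draws a random start.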

Stratified sampling

Stratified sampling steps:
- Find information for every element on the frame that can be used to partition the elements into strata. Use information that is correlated to the measure(s) of interest. Each element on the frame can be placed in one and only one group.
- Sort the frame by strata.
- Compute a sample size (see Guideline 5).
- Determine the number of sample selections in each respective stratum (allocation).
There are 3 main types of allocation:
- Proportionate:
  - Selecting the sample so that elements within each stratum have the same probabilities of selection. Another way to conceive of proportionate allocations is that the sampler selects a sample of size [latex]n_{h}[/latex] from each stratum [latex]h[/latex] such that the proportion of elements in the sample from stratum [latex]h[/latex], [latex]\frac{n_{h}}{n}[/latex], is the same as the proportion of elements on the frame from stratum [latex]h[/latex], [latex]\frac{N_{h}}{N}[/latex].
- Equal allocation:
  - An allocation where the same number of elements are selected from each stratum.
  - If one knows that all strata have equal distributions of the statistic of interest on the sampling frame, an equal allocation will create the highest level of precision in the sample estimate.
- Optimal:
  - An allocation that produces the highest precision (i.e., narrowest confidence intervals) for the sample mean of any statistic of interest.
  - The sampler needs accurate estimates of the distributions of the frame elements for each stratum on the statistic of interest.
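
The allocation types can be illustrated with a short Python sketch. All function names, stratum labels, and numbers below are hypothetical. The text does not give a formula for optimal allocation, so the sketch assumes Neyman allocation (sample sizes proportional to stratum size times the stratum standard deviation), which is one common form of optimal allocation when per-element costs are equal across strata. Allocations are returned unrounded.

```python
def proportionate_allocation(stratum_sizes, n):
    """n_h / n equals N_h / N for every stratum h."""
    total = sum(stratum_sizes.values())
    return {h: n * size / total for h, size in stratum_sizes.items()}


def equal_allocation(stratum_sizes, n):
    """The same number of selections in every stratum."""
    return {h: n / len(stratum_sizes) for h in stratum_sizes}


def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Assumed form of optimal allocation: n_h proportional to N_h * S_h."""
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}


# Hypothetical frame counts and estimated standard deviations per stratum.
sizes = {"urban": 6000, "rural": 4000}
sds = {"urban": 10.0, "rural": 20.0}

print(proportionate_allocation(sizes, 1000))
print(equal_allocation(sizes, 1000))
print(neyman_allocation(sizes, sds, 1000))
```

In this sketch the more variable stratum receives a larger share of the sample under the Neyman rule than under proportionate allocation, which is why accurate stratum-level estimates are needed before using it.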

Cluster sampling

Within-cluster homogeneity:
- When selecting people, it is important to consider that people within a cluster tend to be more similar than people across clusters because of:
  - Self-selection.
  - Interaction with one another.
- Since elements within a cluster tend to be alike, we receive less new information about the population when we select another element from that cluster rather than from another cluster. This lack of new information makes a cluster sample less precise than a stratified or even simple random sample. The rate of homogeneity ([latex]roh[/latex]) is a way to measure this clustering effect.

Design effect

A survey’s design effect is defined as the ratio of the sampling variance under the complex design to the sampling variance computed as if a simple random sample of the same sample size had been selected. The purpose of the design effect is to evaluate the impact of the complex survey design on sampling variance, using the variance of simple random sampling as the benchmark.
For a cluster sample, the design effect is the effect of having chosen sampled clusters instead of elements. Due to within-cluster homogeneity, a clustered sample cannot assure representation of specified population subgroups as well as SRS, and will tend to have a design effect greater than one. On the other hand, stratification tends to generate design effects less than one, since it ensures that specified population groups will be allocated at least one sample selection.
- In general, clustering increases the design effect, while stratification decreases it.
- Formulas:

Stratified designs:

[latex]d_{eff}=\frac{var\left ( \bar{y}_{complex} \right )}{var\left ( \bar{y}_{SRS} \right )}[/latex]

where [latex]d_{eff}[/latex] is the design effect;

[latex]var\left ( \bar{y}_{complex} \right )[/latex] is the variance of the complex sample design, whether it be stratified only, clustered only, or a stratified cluster design; and

[latex]var\left ( \bar{y}_{SRS} \right )[/latex] is the variance of an SRS design, with the same sample size.

Cluster designs:

[latex]d_{eff}=1+\left ( b-1 \right ) roh[/latex]

where [latex]d_{eff}[/latex] is the design effect;

[latex]b[/latex] is the number of subselections within a selected cluster; and

[latex]roh[/latex] is the rate of homogeneity.

In order to estimate the design effect for a new study, the [latex]roh[/latex] is calculated from an earlier survey on a similar topic within a similar target population.
Subsampling within selected clusters (multi-stage sampling):
- [latex]n=a\times b[/latex], where [latex]n[/latex] is the sample size, [latex]a[/latex] is the number of clusters selected, and [latex]b[/latex] is the number of selections within each cluster.
- Pros: reduces the design effect and makes estimates more precise.
- Cons: increases total costs because interviewers must be sent to more areas.
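
To make the trade-off concrete, the following Python sketch applies the cluster formula for the design effect while holding the sample size n = a × b fixed. The function names, the sample size of 1,000, and the value roh = 0.05 are hypothetical; in practice roh would come from an earlier survey, as noted above.

```python
def design_effect(b, roh):
    """Design effect for a cluster sample: 1 + (b - 1) * roh."""
    return 1 + (b - 1) * roh


def compare_designs(n, cluster_counts, roh):
    """For a fixed sample size n = a * b, list (a, b, deff) for each choice of a."""
    rows = []
    for a in cluster_counts:
        b = n / a                  # selections per cluster implied by n and a
        rows.append((a, b, design_effect(b, roh)))
    return rows


# Hypothetical inputs: n = 1000 and roh = 0.05.
for a, b, deff in compare_designs(1000, [20, 50, 100], roh=0.05):
    print(f"a={a:>3}  b={b:>5.1f}  deff={deff:.2f}")
```

Spreading the same sample over more clusters lowers b and therefore the design effect, at the price of fieldwork in more areas.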

Probabilities proportional to size (PPS)

Situations where clusters are all of equal size rarely occur. PPS can control the sample size while ensuring that each element on the sampling frame has an equal chance of selection.
Probabilities at either the first or second stage can be changed to ensure equal probabilities of selection for all elements.
Imagine a two-stage cluster design where the clusters were blocks and the elements were housing units (HUs). The PPS formula would be [latex]f=f_{block}\times f_{hu}=\frac{\alpha B_{\alpha}}{\sum B_{\alpha}}\times \frac{b}{B_{\alpha}}[/latex], where:
- [latex]f[/latex] is the overall probability of selection of the element,
- [latex]f_{block}[/latex] is the probability of selection of the cluster,
- [latex]f_{hu}[/latex] is the probability of selection of the element within the cluster,
- [latex]\alpha[/latex] is the number of cluster selections,
- [latex]B_{\alpha}[/latex] is the number of elements on the frame within a selected cluster,
- [latex]\sum B_{\alpha}[/latex] is the number of elements on the frame, and
- [latex]b[/latex] is the number of elements selected within the cluster.

Example:

Block	#Housing Units in Block	Cumulative Housing Units
1	25	25
2	30	55
3	35	90
4	40	130
5	20	150

The sampler has the above list of blocks and wants to select three blocks ([latex]\alpha[/latex]), keep the sample size constant at 15 HUs, and ensure that each HU has an equal probability of selection of one in ten ([latex]f=15/150[/latex]). Using cumulative totals, numbers can be assigned to each block. Block 1 is assigned numbers 1-25, Block 2 numbers 26-55, Block 3 numbers 56-90, Block 4 numbers 91-130, and Block 5 numbers 131-150. From here, systematic sampling can be used to obtain a without-replacement sample of blocks in which each block's chance of selection is proportional to the number of HUs it contains. With a frame size of 150 ([latex]\sum B_{\alpha}[/latex]) and three selections, the selection interval is [latex]150/3=50[/latex]. Suppose the sampler chooses a random start of 29. In this case, the selection numbers would be 29, 79, and 129, corresponding to selections of Block 2, Block 3, and Block 4. To determine the selection probability of the HUs within Block 2 ([latex]f_{hu}[/latex]), use the formula:

[latex]f=f_{block2}\times f_{hu}[/latex]

[latex]\frac{1}{10}=3 \left ( \frac{30}{150} \right )\times f_{hu}[/latex]

[latex]f_{hu}=\frac{1}{10}\times \frac{150}{90}=\frac{1}{6}[/latex]

Since the selection probability of HUs within Block 2 is 1/6, the number of HUs selected within Block 2 ([latex]b[/latex]) will be [latex]30\times \frac{1}{6}[/latex] or 5. Going through the same calculations for Blocks 3 and 4 will show that each block will have five selections.
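
The worked example can be reproduced with a short Python sketch. The function names `pps_systematic` and `within_block_rate` are hypothetical; the block sizes, the random start of 29, and the overall rate of 1/10 are taken from the example. Exact fractions are used so the within-block counts come out as whole numbers.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate


def pps_systematic(block_sizes, n_blocks, start):
    """Select blocks (1-based) by systematic sampling on cumulative HU totals."""
    cumulative = list(accumulate(block_sizes))          # e.g. 25, 55, 90, 130, 150
    interval = Fraction(cumulative[-1], n_blocks)       # frame size / number of selections
    picks = [start + i * interval for i in range(n_blocks)]
    # A pick falls in the first block whose cumulative total reaches it.
    return [bisect_left(cumulative, p) + 1 for p in picks]


def within_block_rate(block_size, total_size, n_blocks, overall_f):
    """Solve f = f_block * f_hu for f_hu, with f_block = n_blocks * B / sum(B)."""
    f_block = Fraction(n_blocks * block_size, total_size)
    return overall_f / f_block


sizes = [25, 30, 35, 40, 20]
selected = pps_systematic(sizes, n_blocks=3, start=29)
for block in selected:
    size = sizes[block - 1]
    f_hu = within_block_rate(size, sum(sizes), 3, Fraction(1, 10))
    print(f"Block {block}: f_hu = {f_hu}, HUs selected = {size * f_hu}")
```

Each selected block ends up with the same number of HU selections, which is how PPS keeps the total sample size fixed while giving every HU the same overall probability of selection.
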
Potential problems and solutions with PPS sampling:
- Problem: the same cluster may be chosen more than once.
  - Solution: use systematic selection with PPS [zotpressInText item="{2265844:AT3ESSMK}"].
- Problem: some of the clusters may not be large enough to produce subsamples of the required size.
  - Solution: link clusters to create new clusters that are all of sufficient size.
- Problem: some of the clusters are too large and the probability of selecting the cluster is greater than one.
  - Solution: remove the cluster from the list and choose elements from it directly.

Two-phase sampling

Suggested steps [zotpressInText item="{2265844:BLZRKBZS}"]:
- Phase 1: conduct a survey on a probability sample, using a relatively inexpensive data collection method subject to higher nonresponse rates than more expensive methods (see Data Collection).
- Once the survey is completed, select a probability subsample of the nonrespondents to the Phase 1 survey.
- Phase 2: use a more expensive method that generally produces lower nonresponse on the subsample.
- Combine the results of the two surveys, with appropriate selection weights to account for unequal probabilities of selection between the selected respondents.
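
The final step can be sketched in Python under simplifying assumptions that are not stated in the text: every case in the Phase 1 sample has the same selection probability `f1`, nonrespondents are subsampled at a single rate `f2`, and every subsampled case responds in Phase 2. The function name and all values are hypothetical.

```python
def two_phase_weighted_mean(phase1_values, phase2_values, f1, f2):
    """Weighted mean combining Phase 1 respondents and the Phase 2 subsample."""
    w1 = 1 / f1              # Phase 1 respondents: inverse of selection probability
    w2 = 1 / (f1 * f2)       # Phase 2 cases: also inverse of the subsampling rate
    weighted_total = w1 * sum(phase1_values) + w2 * sum(phase2_values)
    weight_sum = w1 * len(phase1_values) + w2 * len(phase2_values)
    return weighted_total / weight_sum


# Hypothetical data: 4 Phase 1 respondents, and 2 of the 4 nonrespondents
# followed up in Phase 2 (f2 = 0.5), from a 1-in-100 sample (f1 = 0.01).
print(two_phase_weighted_mean([1, 1, 0, 1], [0, 1], f1=0.01, f2=0.5))
```

Because each Phase 2 case stands in for several Phase 1 nonrespondents, it carries a larger weight than a Phase 1 respondent.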

Panel designs

Three concerns about panel designs:
- The effort and costs of tracking and locating respondents who move over the duration of the panel survey.
- The change in the elements on the sampling frame over time. For example, in a cross-cultural panel survey of persons age 65 and older, some members of the original sampling frame will die, while other people will become eligible for selection.
- The repeated questioning of the same subjects over time may change how the subjects act and answer the questions (i.e., panel conditioning effect).

References [zotpressInTextBib style="apa" sortby="author"]