4.1 Introduction

When we collect data, it is almost never possible to collect it on the entire population. For instance, if we want to study the habits of people who shop at Checkers, it is not feasible to send a survey to everyone in South Africa who has ever shopped at Checkers. Instead, we collect data on a subset of the population, which is called a sample. In cases where we are able to collect data on the whole population, this is called a census. The table below highlights the differences between censuses and samples.

Table 4.1: Census vs. sample
| | Census | Sample |
|---|---|---|
| Definition | A complete enumeration of every individual in a population. | A subset of individuals selected from a population. |
| Coverage | Includes the entire population. | Includes only a portion of the population. |
| Time | Can be very time-consuming due to large-scale data collection. | Requires less time since data is collected on fewer individuals. |
| Cost | Usually quite expensive. | Less expensive. |
| Accuracy | Accurate if data is collected properly, but errors can still occur. | May have some sampling error*. |
| Feasibility | Difficult for large populations. | More practical, especially if the population is large. |

*Sampling error will be explained in a later section.

Although it is generally true that more data is better, there are many reasons to take a sample rather than a census. These include time and financial constraints, as well as feasibility. For example, when taking a geological survey, it is simply not feasible to measure the soil at every location in an area! As long as the sample is unbiased and representative, it can be very informative and helpful.
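
To make the distinction concrete, here is a minimal Python sketch that compares a census with a sample. Everything in it is an assumption made for illustration: the population is simulated rather than real (a hypothetical monthly grocery spend, in rand, for 100,000 shoppers), and the helper name `census_and_sample_means` is made up for this example. It draws a simple random sample and compares its mean with the mean of the whole population.

```python
import random
import statistics


def census_and_sample_means(population, sample_size, seed=0):
    """Return (census mean, sample mean) using a simple random sample."""
    rng = random.Random(seed)
    # Draw individuals without replacement, each equally likely to be chosen.
    sample = rng.sample(population, sample_size)
    return statistics.fmean(population), statistics.fmean(sample)


# Hypothetical population: simulated monthly spend (in rand) of 100,000 shoppers.
# These values are invented for illustration and are not real data.
rng = random.Random(42)
population = [rng.gauss(500, 150) for _ in range(100_000)]

census_mean, sample_mean = census_and_sample_means(population, sample_size=1_000)

print(f"Census mean (all 100,000 shoppers): {census_mean:.2f}")
print(f"Sample mean (1,000 shoppers):       {sample_mean:.2f}")
print(f"Difference:                         {abs(census_mean - sample_mean):.2f}")
```

The sample uses only 1% of the population, yet its mean should land close to the census mean because every shopper had the same chance of being selected. The small gap that remains is an instance of the sampling error mentioned in the table.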