4.1 Introduction
When we collect data, it is rarely possible to cover the entire population. For instance, if we want to study the habits of people who shop at Checkers, it is not feasible to survey everyone in South Africa who has ever shopped at Checkers. When we collect data on only a subset of the population, that subset is called a sample. When we are able to collect data on the whole population, this is called a census. The table below highlights the differences between censuses and samples.
| | Census | Sample |
| Definition | A complete enumeration of every individual in a population. | A subset of individuals selected from a population. |
| Coverage | Includes the entire population. | Includes only a portion of the population. |
| Time | Can be very time-consuming due to large-scale data collection. | Requires less time since data is collected on fewer individuals. |
| Cost | Usually quite expensive. | Less expensive. |
| Accuracy | Accurate if data is collected properly, but errors can still occur. | May have some sampling error*. |
| Feasibility | Difficult for large populations. | More practical, especially if the population is large. |
*Sampling error will be explained in a later section.
Although it is generally true that more data is better, there are many reasons to take a sample rather than a census. These include time and financial constraints, as well as feasibility. For example, in a geological survey it is not feasible to measure the soil at every location in an area. As long as a sample is unbiased and representative, it can be very informative.
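To make the distinction concrete, the sketch below compares a census with a sample on a simulated population. Everything in it is an assumption made for illustration: the population of 10,000 shoppers, their monthly spend (drawn from a normal distribution with mean 500 and standard deviation 100), and the sample size of 200 are invented, not real Checkers data. The code is written in Python using only the standard library.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: monthly spend (in rand) of 10,000 shoppers.
# These values are simulated, not real data.
population = [random.gauss(500, 100) for _ in range(10_000)]

# Census: measure every individual in the population.
census_mean = statistics.mean(population)

# Sample: measure only a randomly chosen subset of 200 individuals,
# drawn without replacement.
sample = random.sample(population, k=200)
sample_mean = statistics.mean(sample)

print(f"Census mean (N = {len(population)}): {census_mean:.2f}")
print(f"Sample mean (n = {len(sample)}): {sample_mean:.2f}")
print(f"Difference: {sample_mean - census_mean:.2f}")
```

In this simulation the sample uses only 2% of the measurements the census needs, and a randomly drawn sample of this size will typically give a mean close to the census mean, though usually not identical to it.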