Cross-Sectional Analysis with a Single Wave
When to use this workflow
A cross-sectional design treats one wave as a population snapshot. This is appropriate when you need a point-in-time estimate (a prevalence or an association) and do not need to measure within-person change. Common examples include a Master’s thesis using the most recent Health wave, or a methods paper demonstrating a new estimator on a single data release.
Even for a single-wave analysis, lissr adds value: it automates authentication, lets you download exactly the files you need, and gives you access to the merge engine’s sentinel-code cleaning, so you start from analysis-ready data instead of raw SPSS files.
Step 1 — download one wave of one module
library(lissr)
# authenticate (once per session)
liss_login()
# build the file inventory
bp <- liss_blueprint()
# filter to the latest Health wave, SPSS format
latest_health <- bp |>
dplyr::filter(
module == "Health",
type == "spss"
) |>
dplyr::filter(wave == max(wave))
latest_health
#> # A tibble: 1 × 8
#> module module_id wave wave_id type name file path
#> <chr> <int> <int> <int> <chr> <chr> <chr> <chr>
#> 1 Health 18 17 1054 spss ch24q 1.0p ch24q_1_0p_EN.sav /down…
# download just that one file
liss_download(latest_health, .dir = "data/ch")
Step 2 — read and clean the data
You can read the file directly with haven, or you can use the merge engine to apply the recipe’s sentinel-code recoding even to a single wave. The recipe knows which numeric codes mean “don’t know”, which mean “prefer not to say”, and which mark genuinely missing answers, so it does the cleaning you would otherwise do by hand with a codebook open.
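For intuition, here is a minimal base R sketch of what sentinel recoding does to a single numeric item. The codes 998 and 999 and the helper `recode_sentinels()` are hypothetical, chosen only for illustration; real sentinel codes differ between variables and come from the codebook (or, with lissr, from the recipe).

```r
# Hypothetical sentinel codes for illustration only; check the codebook
# for the real codes of each variable.
sentinels <- c(dont_know = 999, prefer_not_to_say = 998)

# Hypothetical helper: replace any sentinel value with NA.
recode_sentinels <- function(x, codes) {
  x[x %in% codes] <- NA
  x
}

raw_answers <- c(3, 4, 999, 2, 998, 5, NA)
clean <- recode_sentinels(raw_answers, sentinels)
clean
#> [1]  3  4 NA  2 NA  5 NA

# Keep the reason for missingness alongside the cleaned values.
reason <- names(sentinels)[match(raw_answers, sentinels)]

mean(clean, na.rm = TRUE)
#> [1] 3.5
```

Leaving the sentinels in place would badly distort summaries: in this toy vector, the mean of the non-missing raw values is about 335 rather than 3.5.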
library(haven)
library(dplyr)
# option A: read raw and clean yourself
raw <- haven::read_sav("data/ch/ch24q_1_0p_EN.sav")
dim(raw)
#> [1] 4892 271
# option B: use the merge engine for a single wave
# (this applies prefix stripping, sentinel recoding, and labelled policy)
recipe <- liss_recipe("ch")
# temporarily trim the recipe to just the wave you need
recipe$wave_index <- purrr::keep(
recipe$wave_index,
~ .x$id == "ch24q"
)
result <- merge_liss_module(recipe, data_dir = "data/ch", output_dir = "output")
health <- result$data
dim(health)
#> [1] 4892 265
The merge engine’s output has cleaner column names (no ch24q prefix), sentinel values already recoded to NA, and wave_id and wave_year columns appended.
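Prefix stripping and wave tagging are simple to picture. The sketch below uses base R on a toy data frame; the helper `strip_and_tag()`, the item columns, and the wave values are hypothetical placeholders rather than lissr’s implementation, and the exact harmonised column names are determined by the recipe.

```r
# Toy data frame standing in for one raw wave (all values made up).
raw_wave <- data.frame(
  nomem_encr = c(800001L, 800002L, 800003L),
  ch24q001   = c(3, 4, 2),
  ch24q002   = c(1, 0, 1)
)

# Hypothetical helper: drop the wave prefix from item names and tag every
# row with the wave it came from. The regex is anchored at the start, so
# identifiers such as nomem_encr are left untouched.
strip_and_tag <- function(df, prefix, wave_id, wave_year) {
  names(df) <- sub(paste0("^", prefix), "", names(df))
  df$wave_id   <- wave_id
  df$wave_year <- wave_year
  df
}

tidy_wave <- strip_and_tag(raw_wave, prefix = "ch24q",
                           wave_id = "ch24q", wave_year = 2024L)
names(tidy_wave)
```

Stripping the prefix is what lets the same item line up under one column name when several waves are merged later.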
Step 3 — attach background variables
Most cross-sectional analyses need demographics: age, sex, education, income. These live in the Background Variables module (CA), not in the survey module itself.
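Critical rule: join on nomem_encr (the person identifier) only.
Never join on nohouse_encr: household assignments change over
time, and one household identifier covers every member of the household. Take the background file from the month in which the survey was fielded, not just from the same calendar year.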
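Both rules can be sketched in base R with made-up data. Deriving the YYYYMM key from a fieldwork date and checking row counts after the join are illustrative safeguards, not lissr functions; `merge(..., all.x = TRUE)` stands in for `dplyr::left_join()`, and the November 2024 fieldwork date is taken from the comment in the download code that follows.

```r
# Build the YYYYMM key of the background file from the fieldwork month.
fieldwork_date <- as.Date("2024-11-01")
bg_wave <- as.integer(format(fieldwork_date, "%Y%m"))  # 202411

# Toy person-level data (all values made up).
survey <- data.frame(
  nomem_encr = c(800001L, 800002L, 800003L),
  s001       = c(3, 4, 2)
)
background <- data.frame(
  nomem_encr   = c(800001L, 800002L, 800003L, 800004L),
  nohouse_encr = c(500001L, 500001L, 500002L, 500003L),
  age          = c(45L, 43L, 67L, 30L)
)

# The person key must be unique in the background file; otherwise a left
# join would silently duplicate survey rows.
stopifnot(!anyDuplicated(background$nomem_encr))

# Left join on the person identifier only.
joined <- merge(survey, background, by = "nomem_encr", all.x = TRUE)
stopifnot(nrow(joined) == nrow(survey))  # one row per respondent

# Counter-example: the household key is shared by household members, so
# joining on it pairs each respondent with every member of the household.
hh_join <- merge(
  transform(survey, nohouse_encr = c(500001L, 500001L, 500002L)),
  background[, c("nohouse_encr", "age")],
  by = "nohouse_encr"
)
nrow(hh_join)  # 5 rows for 3 respondents
```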
# download the background variables file for the same fieldwork period
# (health wave 24q was fielded around November 2024)
bg_files <- bp |>
dplyr::filter(
module == "Background Variables",
type == "spss",
wave == 202411 # YYYYMM matching fieldwork period
)
liss_download(bg_files, .dir = "data/avars")
avars <- haven::read_sav("data/avars/avars_202411_EN_1_0p.sav") |>
haven::zap_labels() |>
dplyr::select(
nomem_encr,
age = leeftijd,
sex = geslacht,
edu_level = oplcat,
hh_income = nettohh_f,
urban = sted
)
analysis_df <- health |>
dplyr::left_join(avars, by = "nomem_encr")
nrow(analysis_df)
#> [1] 4892
Step 4 — run an analysis
With the merged, cleaned data you can proceed to standard modelling, for example estimating the association between education and self-rated health (suffix s001 in the harmonised output) while adjusting for age and sex.
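The following is only a sketch of one reasonable model, an OLS regression fitted to simulated data shaped like `analysis_df`. It assumes the harmonised self-rated health column is named `s001` and treats it as approximately continuous; an ordinal model may suit a 1 to 5 rating better. Because `zap_labels()` leaves `edu_level` and `sex` as plain numeric codes, the formula wraps them in `factor()` so each category gets its own coefficient.

```r
# Synthetic stand-in for analysis_df (all values simulated).
set.seed(42)
n <- 500
sim_df <- data.frame(
  edu_level = sample(1:6, n, replace = TRUE),  # numeric codes, as after zap_labels()
  age       = sample(18:90, n, replace = TRUE),
  sex       = sample(1:2, n, replace = TRUE)
)
# Synthetic self-rated health score, clamped to the range 1-5.
sim_df$s001 <- pmin(5, pmax(1, round(
  3 + 0.15 * sim_df$edu_level - 0.01 * (sim_df$age - 50) + rnorm(n, sd = 0.8)
)))

# OLS: self-rated health on education categories, adjusting for age and sex.
# With the real data, replace sim_df by analysis_df.
fit <- lm(s001 ~ factor(edu_level) + age + factor(sex), data = sim_df)
summary(fit)$coefficients
```

By default `lm()` drops every row with an NA in any model variable, which includes answers whose sentinel codes were recoded to NA, so it is worth reporting how many respondents enter the model.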
When not to use this approach
If your research question involves change (did self-rated health improve after a policy?), or if you want to exploit the panel structure for causal identification (fixed effects, difference-in-differences), switch to the longitudinal workflow described in Longitudinal Panel Analysis (or run vignette("longitudinal-panel-analysis", package = "lissr") in the console). A single wave cannot separate age, period, and cohort effects, and it provides no within-person variation to control for time-invariant confounders.
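The age-period-cohort problem is easy to see in a small simulation on synthetic data. In a single wave the survey year is constant, so age is an exact linear function of birth year (in this simulation, age = survey year - birth year); `lm()` therefore cannot estimate both effects and reports `NA` for the aliased term.

```r
set.seed(1)
n <- 200
one_wave <- data.frame(birth_year = sample(1935:2005, n, replace = TRUE))
one_wave$survey_year <- 2024  # every respondent observed in the same wave
one_wave$age <- one_wave$survey_year - one_wave$birth_year
one_wave$y <- rnorm(n)

# age and birth_year are perfectly collinear once survey_year is constant,
# so the second of the two terms is aliased and its coefficient is NA.
fit <- lm(y ~ age + birth_year, data = one_wave)
coef(fit)
```

Adding more respondents does not help; only variation in the survey year, which a panel provides, breaks this exact dependence.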