When to use this workflow

A cross-sectional design treats one wave as a population snapshot. This is appropriate when you need a point-in-time estimate (prevalence, association) and do not require within-person change. Common examples include a Master’s thesis using the most recent Health wave, or a methods paper demonstrating a new estimator on a single data release.

Even for a single-wave analysis, lissr adds value: it automates authentication, lets you pick exactly the files you need, and gives you access to the merge engine’s sentinel-code cleaning so you start from analysis-ready data instead of raw SPSS.

Step 1 — download one wave of one module

library(lissr)

# authenticate (once per session)
liss_login()

# build the file inventory
bp <- liss_blueprint()

# filter to the latest Health wave, SPSS format
latest_health <- bp |>
  dplyr::filter(
    module == "Health",
    type   == "spss"
  ) |>
  dplyr::filter(wave == max(wave))

latest_health
#> # A tibble: 1 × 8
#>   module module_id  wave wave_id type  name       file              path
#>   <chr>      <int> <int>   <int> <chr> <chr>      <chr>             <chr>
#> 1 Health        18    17    1054 spss  ch24q 1.0p ch24q_1_0p_EN.sav /down…

# download just that one file
liss_download(latest_health, .dir = "data/ch")
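Because the download call acts on whatever rows the filter returned, it can help to confirm that exactly one file matched before downloading. The sketch below uses a toy stand-in for the blueprint (the rows, the "stata" type value, and the file names are invented for illustration, and base R is used only to keep the example dependency-free); it is not lissr's own validation.

```r
# Toy stand-in for the blueprint; real rows come from liss_blueprint().
# All values here are hypothetical.
bp_toy <- data.frame(
  module = c("Health", "Health", "Health", "Income"),
  type   = c("spss", "spss", "stata", "spss"),
  wave   = c(16L, 17L, 17L, 17L),
  file   = c("a.sav", "b.sav", "b.dta", "c.sav"),
  stringsAsFactors = FALSE
)

# Same logic as the two filter() calls: module and type first, then latest wave
health_spss <- bp_toy[bp_toy$module == "Health" & bp_toy$type == "spss", ]
latest_toy  <- health_spss[health_spss$wave == max(health_spss$wave), ]

# Fail early if the filter matched nothing or more than one file
stopifnot(nrow(latest_toy) == 1L)
latest_toy$file
```

The same `stopifnot(nrow(...) == 1L)` guard could be placed on `latest_health` just before the download step.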

Step 2 — read and clean the data

You can read the file directly with haven, or you can leverage the merge engine to apply the recipe’s sentinel-code recoding even for a single wave. The recipe knows which numeric codes are “don’t know” vs “prefer not to say” vs genuine missing — cleaning you would otherwise do by hand with a codebook open.

library(haven)
library(dplyr)

# option A: read raw and clean yourself
raw <- haven::read_sav("data/ch/ch24q_1_0p_EN.sav")
dim(raw)
#> [1] 4892  271

# option B: use the merge engine for a single wave
# (this applies prefix stripping, sentinel recoding, and labelled policy)
recipe <- liss_recipe("ch")

# temporarily trim the recipe to just the wave you need
recipe$wave_index <- purrr::keep(
  recipe$wave_index,
  ~ .x$id == "ch24q"
)

result <- merge_liss_module(recipe, data_dir = "data/ch", output_dir = "output")
health <- result$data
dim(health)
#> [1] 4892  265

The merge engine’s output has cleaner column names (no ch24q prefix), sentinel values already recoded to NA, and wave_id and wave_year columns appended.
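For readers who take option A, the manual cleaning amounts to replacing each sentinel code with NA on the survey items. A minimal sketch on toy data follows; the sentinel values (-9, 98, 99), the column names `item_a` and `item_b`, and the helper `recode_sentinels()` are all hypothetical, and the real codes have to be taken from the codebook. This is not the merge engine's implementation.

```r
# Hypothetical sentinel codes; look up the real ones in the codebook
sentinels <- c(-9, 98, 99)

# Toy data standing in for a raw wave file
raw_toy <- data.frame(
  nomem_encr = c(800001, 800002, 800003, 800004),
  item_a     = c(3, 99, 5, -9),
  item_b     = c(98, 2, 1, 4)
)

# Illustrative helper: set every value listed in `codes` to NA
recode_sentinels <- function(x, codes) {
  x[x %in% codes] <- NA
  x
}

# Apply to survey items only, never to the person identifier
item_cols <- setdiff(names(raw_toy), "nomem_encr")
clean_toy <- raw_toy
clean_toy[item_cols] <- lapply(clean_toy[item_cols], recode_sentinels,
                               codes = sentinels)

# Count how many values were recoded per column
colSums(is.na(clean_toy))
```

Excluding the identifier column matters: a blanket recode over every column could silently blank out an ID that happens to equal a sentinel value.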

Step 3 — attach background variables

Most cross-sectional analyses need demographics: age, sex, education, income. These live in the Background Variables module (CA), not in the survey module itself.

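Critical rule: join on nomem_encr only. Never use nohouse_encr, because household assignments change over time. Background Variables files are indexed by month (YYYYMM), so pick the file that matches the wave's fieldwork month rather than any file from the same calendar year.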

# download the background variables file for the same fieldwork period
# (health wave 24q was fielded around November 2024)
bg_files <- bp |>
  dplyr::filter(
    module == "Background Variables",
    type   == "spss",
    wave   == 202411  # YYYYMM matching fieldwork period
  )

liss_download(bg_files, .dir = "data/avars")

avars <- haven::read_sav("data/avars/avars_202411_EN_1_0p.sav") |>
  haven::zap_labels() |>
  dplyr::select(
    nomem_encr,
    age       = leeftijd,
    sex       = geslacht,
    edu_level = oplcat,
    hh_income = nettohh_f,
    urban     = sted
  )

analysis_df <- health |>
  dplyr::left_join(avars, by = "nomem_encr")

nrow(analysis_df)
#> [1] 4892
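An unchanged row count shows that the join did not duplicate respondents, but not that every respondent found a background record. The sketch below, on toy data with invented values, checks key uniqueness before the join and lists unmatched respondents afterwards. It uses base R `merge()` as a dependency-free stand-in for `dplyr::left_join()`.

```r
# Toy stand-ins for the health and background data (values are invented)
health_toy <- data.frame(nomem_encr = c(1, 2, 3, 4), s001 = c(3, 4, 2, 5))
avars_toy  <- data.frame(nomem_encr = c(1, 2, 3, 5), age  = c(34, 51, 67, 29))

# 1. The key must be unique in the background file,
#    otherwise the join duplicates respondents
stopifnot(anyDuplicated(avars_toy$nomem_encr) == 0)

# 2. Left join: keep every health respondent
joined <- merge(health_toy, avars_toy, by = "nomem_encr", all.x = TRUE)

# 3. The row count must equal the left-hand table
stopifnot(nrow(joined) == nrow(health_toy))

# 4. Respondents with no background record end up with NA demographics
unmatched <- setdiff(health_toy$nomem_encr, avars_toy$nomem_encr)
unmatched
```

With dplyr, `dplyr::anti_join(health, avars, by = "nomem_encr")` gives the same list of unmatched respondents.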

Step 4 — run an analysis

With the merged, cleaned data you can proceed to standard modelling. For example, the code below fits an ordered logit model of self-rated health (suffix s001 in the harmonised output) on education, adjusting for age and sex:

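analysis_df <- analysis_df |>
  dplyr::mutate(
    # ordered outcome: polr() models the levels in this order
    srh = factor(s001, levels = 1:5,
                 labels = c("poor", "moderate", "good", "very good", "excellent"),
                 ordered = TRUE),
    # assumes geslacht is coded 2 = female; confirm in the codebook
    female = as.integer(sex == 2),
    # zap_labels() left education as bare numeric codes; treat them as categories
    edu_level = factor(edu_level)
  )

# Hess = TRUE stores the Hessian so summary() does not have to refit the model
fit <- MASS::polr(srh ~ edu_level + age + female, data = analysis_df, Hess = TRUE)
summary(fit)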
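The coefficients reported by `summary()` are on the log-odds scale, so they are usually exponentiated into odds ratios for reporting. The sketch below shows one way to do that with Wald intervals. It runs on simulated data rather than LISS data; the sample size, effect sizes, and cut points are arbitrary assumptions chosen only so that the example runs.

```r
library(MASS)  # recommended package, installed with most R distributions

set.seed(1)
n <- 500

# Simulated covariates (not LISS data)
toy <- data.frame(
  age    = round(runif(n, 18, 80)),
  female = rbinom(n, 1, 0.5)
)

# Latent health score that declines with age, plus logistic noise
latent <- -0.03 * toy$age + 0.2 * toy$female + rlogis(n)
toy$srh <- cut(latent,
               breaks = c(-Inf, -3, -2, -1, 0, Inf),
               labels = c("poor", "moderate", "good", "very good", "excellent"),
               ordered_result = TRUE)

fit_toy <- polr(srh ~ age + female, data = toy, Hess = TRUE)

# Wald 95% intervals, exponentiated to the odds-ratio scale
est <- coef(fit_toy)
se  <- sqrt(diag(vcov(fit_toy)))[names(est)]
or_table <- cbind(
  OR    = exp(est),
  lower = exp(est - 1.96 * se),
  upper = exp(est + 1.96 * se)
)
round(or_table, 3)
```

An odds ratio below 1 for `age` here means that older simulated respondents have lower odds of being in a higher health category, which matches how the data were generated.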

When not to use this approach

If your research question involves change (did self-rated health improve after a policy?), or if you want to exploit the panel structure for causal identification (fixed effects, difference-in-differences), you should move to the longitudinal workflow described in Longitudinal Panel Analysis (or run vignette("longitudinal-panel-analysis", package = "lissr") in the console). A single wave cannot separate age, period, and cohort effects, and it provides no within-person variation to control for time-invariant confounders.

Checklist

Before submitting results from a cross-sectional LISS analysis, verify: