Overview
lissr includes a recipe-driven merge engine that processes YAML specifications conforming to the Canonical Schema v1.0.0. Each recipe encodes all merge-relevant decisions for a module: file patterns, variable harmonization, boundary handling, comparability contracts, and validation checks.
All merged output is written in SPSS .sav format to
preserve variable labels, value labels, and user-defined missing
values.
Single module merge
library(lissr)
recipe <- liss_recipe("ch")
result <- merge_liss_module(
recipe,
data_dir = "liss/ch",
output_dir = "./output"
)
This produces up to four files:
- ch_merged.sav: merged data (SPSS format, preserving all labels)
- ch_merge_log.jsonl: audit-grade structured log
- ch_merge_summary.json: per-run summary (if enabled in recipe)
- ch_merge_report.txt: human-readable report
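A quick way to confirm what a run wrote is to check for the file names listed above. The sketch below uses only base R. The helper `expected_outputs()` is hypothetical and not part of lissr; it simply rebuilds the file names from the module code, assuming the same `./output` directory as in the example.

```r
# Hypothetical helper (not part of lissr): build the output file names
# listed above for a given module code.
expected_outputs <- function(module, output_dir = "./output") {
  suffixes <- c("merged.sav", "merge_log.jsonl",
                "merge_summary.json", "merge_report.txt")
  file.path(output_dir, paste0(module, "_", suffixes))
}

paths <- expected_outputs("ch")

# Named logical vector: TRUE where the file exists on disk.
present <- stats::setNames(file.exists(paths), basename(paths))
print(present)
```

Because the summary file is only written when the recipe enables it, a FALSE for ch_merge_summary.json is not necessarily an error.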
Batch merge
modules <- c("ch", "cv", "cd", "cf", "cw", "cp", "cs", "ci")
recipe_paths <- purrr::map_chr(modules, ~ {
system.file("recipes", paste0(.x, "_merge_recipe.yml"), package = "lissr")
})
results <- merge_liss_modules(
recipe_paths,
data_dir = "liss",
output_dir = "./output"
)
Cross-module panel merge
After merging individual modules, combine them into one wide dataset.
Columns are prefixed with the module code to avoid collisions
(e.g. ch_s004, cv_s004).
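To make the prefixing and the effect of the join type concrete, here is a minimal base R sketch on toy data. It is an illustration only, not lissr's implementation: the helper `prefix_cols()`, the `wave_year` column, and the choice of join keys are all assumptions made for this example.

```r
# Toy data standing in for two merged modules; all values are made up.
ch <- data.frame(nomem_encr = c(1, 2, 3), wave_year = c(2024, 2024, 2024),
                 s004 = c(10, 20, 30))
cv <- data.frame(nomem_encr = c(2, 3, 4), wave_year = c(2024, 2024, 2024),
                 s004 = c(5, 6, 7))

# Assumed join keys for this sketch (respondent and year).
keys <- c("nomem_encr", "wave_year")

# Hypothetical helper: prefix every non-key column with the module code.
prefix_cols <- function(df, module, keys) {
  is_key <- names(df) %in% keys
  names(df)[!is_key] <- paste0(module, "_", names(df)[!is_key])
  df
}

ch_p <- prefix_cols(ch, "ch", keys)
cv_p <- prefix_cols(cv, "cv", keys)

# all = TRUE keeps every respondent-year found in either module;
# the default keeps only respondent-years found in both.
toy_full  <- merge(ch_p, cv_p, by = keys, all = TRUE)
toy_inner <- merge(ch_p, cv_p, by = keys)
```

Without the prefix, both modules would contribute a column named s004 and the join would have to disambiguate them with generic suffixes.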
panel <- merge_liss_panel(results, write_to = "./output/liss_panel.sav")
# only respondent-years present in all modules
panel_inner <- merge_liss_panel(results, join_type = "inner")
Validate recipes
recipe <- liss_recipe("ch")
validate_recipe(recipe, "ch_merge_recipe.yml")
Onboard a new wave
When a new wave is released, the onboarding helper automates most of the checklist: variable diffs, candidate wave_index entry, expected-presence checks, and boundary alerts.
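The variable-diff step can be pictured with a small base R sketch. This is not the helper's actual logic. It assumes that variable names carry the wave ID as a prefix (for example ch24q001), and the variable lists and the `strip_wave()` helper are made up for illustration.

```r
# Made-up variable names for two consecutive waves.
prev_vars <- c("nomem_encr", "ch24q_m", "ch24q001", "ch24q002", "ch24q003")
new_vars  <- c("nomem_encr", "ch25r_m", "ch25r001", "ch25r003", "ch25r010")

# Hypothetical helper: strip the wave ID prefix so items can be
# compared across waves.
strip_wave <- function(vars, wave_id) {
  sub(paste0("^", wave_id), "", vars)
}

prev_items <- strip_wave(prev_vars, "ch24q")
new_items  <- strip_wave(new_vars, "ch25r")

added   <- setdiff(new_items, prev_items)  # items only in the new wave
dropped <- setdiff(prev_items, new_items)  # items missing from the new wave
```

Items that appear in `added` or `dropped` are the ones that would need a decision in the recipe before the new wave is merged.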
onboard_new_wave(
recipe_path = system.file("recipes", "ch_merge_recipe.yml", package = "lissr"),
new_file = "ch25r_EN_1.0p.sav",
prev_wave_id = "ch24q"
)
Background variables
The background variables module (CA) is a monthly snapshot used as the linkage backbone for all other modules. When merging background variables with survey data:
- Use only nomem_encr as the join key (never nohouse_encr).
- Match the fieldwork month of the survey, not the calendar year.
The merged output includes a fieldwork_ym column (YYYYMM
integer) derived automatically from the LISS _m suffix.
Background variable files carry the period in their filename rather than
as a column, so you need to tag each file before stacking:
# example: merge Health survey with background variables
survey <- haven::read_sav("output/ch_merged.sav")
# read avars files and tag each with YYYYMM from the filename
bg_files <- list.files("data/avars/", pattern = "\\.sav$", full.names = TRUE)
bg_data <- purrr::map_dfr(bg_files, function(f) {
ym <- as.integer(stringr::str_extract(basename(f), "\\d{6}"))
haven::read_sav(f) |> dplyr::mutate(fieldwork_ym = ym)
})
merged <- dplyr::left_join(
survey, bg_data,
by = c("nomem_encr", "fieldwork_ym")
)
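Before relying on a join like the one above, it can help to check that the background data has at most one row per respondent-month, and to see which survey rows have no matching snapshot. The following base R sketch does both on toy data. The data frames and the `some_bg_var` column are made up, and nothing here is part of lissr.

```r
# Toy background data: one row per respondent per month, plus one duplicate.
bg_toy <- data.frame(
  nomem_encr   = c(1, 1, 2, 2),
  fieldwork_ym = c(202411L, 202412L, 202411L, 202411L),
  some_bg_var  = c(1, 1, 2, 2)  # made-up value column
)
survey_toy <- data.frame(
  nomem_encr   = c(1, 2, 3),
  fieldwork_ym = c(202411L, 202411L, 202411L)
)

keys <- c("nomem_encr", "fieldwork_ym")

# Rows in the background data that repeat an earlier key combination.
# Any row here would duplicate survey rows in a left join.
dup_keys <- bg_toy[duplicated(bg_toy[keys]), keys]

# Survey rows with no background record for that respondent-month.
survey_id <- paste(survey_toy$nomem_encr, survey_toy$fieldwork_ym)
bg_id     <- paste(bg_toy$nomem_encr, bg_toy$fieldwork_ym)
unmatched <- survey_toy[!survey_id %in% bg_id, ]
```

An empty `dup_keys` means the left join cannot inflate the number of survey rows, and `unmatched` lists the respondent-months whose background columns would come back as missing.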