This quick-start guide provides basic instructions for using a Bayesian hierarchical modelling framework to estimate antibody titers while correcting for batch effects and other sources of experimental variation.
We use simulated dilution series data, mirroring the structure of data from respiratory syncytial virus (RSV) foci reduction neutralization tests (FRNTs), as an application example. Four levels of working virus dilution (an assay-level covariate) are simulated, with seven batches per level and 20 samples per batch, yielding a total of 28 batches and 560 serum samples. Each batch includes a virus control (VC), a positive control (PC), and an International Standard control (IS), which is assumed to be tested at two different concentrations (denoted as IS1 and IS2). The VC is assumed to be prepared at fixed dilutions with 22 replicates per batch. Each serum sample, along with the PC and IS, is assumed to be tested in duplicate across 3-fold serial dilutions ranging from 1:40 to 1:87480.
In our modelling framework, first, foci reduction data is calculated using the assay readout (foci count) relative to the modelled mean foci counts in the VC for each working virus dilution. We then model the relationship between the dilution factor and reduction in foci counts using a hierarchical four-parameter logistic (4PL) model, allowing for random effects both between samples and between batches on each parameter. Each sample has its own set of parameter values drawn from a population-level distribution, whereas the PC and IS serve as references, with batch-specific parameters centered on their true values. Using this hierarchical approach, we fit a 4PL curve to the dilution series data from each sample, incorporating both sample-level and batch-level effects. Thus, the observed data are described by a 4PL curve that accounts for all of these sources of variation, whereas the true, unadjusted 4PL curve for each sample is obtained by removing the batch effects. The corrected antibody titer can then be calculated directly as a function of the unadjusted 4PL curve.
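For intuition, the shape of a 4PL curve on the log-dilution scale, and the titer it implies, can be sketched in R as below. The parameter names and parameterization here are illustrative only and may differ from those used in the Stan models.

```r
# Illustrative 4PL curve: foci reduction as a function of log dilution.
# Parameter names are for illustration, not those used in the Stan code:
#   bottom, top - lower and upper asymptotes of foci reduction
#   slope       - steepness of the decline with dilution
#   log_mid     - log dilution at the curve midpoint
fr_4pl <- function(log_dil, bottom, top, slope, log_mid) {
  bottom + (top - bottom) / (1 + exp(slope * (log_dil - log_mid)))
}

# The titer is the dilution at which the curve crosses 50% foci reduction,
# obtained by inverting the 4PL curve analytically:
titer_50 <- function(bottom, top, slope, log_mid) {
  exp(log_mid + log((top - bottom) / (0.5 - bottom) - 1) / slope)
}
```

For example, with bottom = 0 and top = 1 the curve midpoint sits exactly at 50% reduction, so the titer reduces to exp(log_mid). In the hierarchical model, removing the batch-level shifts from the fitted parameters before this inversion is what yields the corrected titer.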
The following steps are covered in this guide. First, we load the required R libraries and the Stan model code. Two hierarchical models are used in this workflow: Model 1 estimates assay-level fixed effects (the mean VC foci counts for each working virus dilution), and Model 2 estimates sample-specific antibody titers while correcting for random batch effects.
# Clear workspace
rm(list = ls())
# Load required libraries
library(readxl)
library(dplyr)
library(ggplot2)
library(rstan)
library(bayesplot)
# Load Stan model code
source("Stan_Model.R")
# Set seed
set.seed(12345)
The serum sample data should be provided in long format, with one row per observation and the columns sam_id, sto_id, bat_id, rep_id, dilution, and Count. Similarly, experimental data for the PC, IS, and VC should also be provided in long format: the PC and IS datasets should include the columns sto_id, bat_id, rep_id, dilution, and Count, while the VC dataset should include sto_id, bat_id, and Count.
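As a quick check of the expected layout, a few rows of the long-format sample sheet might look like the sketch below. All values here are made up purely for display; the real data come from the Excel file read in the next chunk.

```r
# Illustrative rows of the long-format sample data (values are made up)
data.frame(sam_id = 1, sto_id = 1, bat_id = 1,
           rep_id = c(1, 1, 2, 2),
           dilution = c(40, 120, 40, 120),
           Count = c(12, 25, 10, 27))
```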
# Input experimental data from Excel ----
df_sam <- read_excel("simu_data.xlsx", sheet = "Sample")
df_PC <- read_excel("simu_data.xlsx", sheet = "PC")
df_IS1 <- read_excel("simu_data.xlsx", sheet = "IS500")
df_IS2 <- read_excel("simu_data.xlsx", sheet = "IS1000")
df_VC <- read_excel("simu_data.xlsx", sheet = "VC")
# Check experimental data
df_sam %>% filter(sam_id %in% 1:6) %>%
  ggplot(aes(x = log(dilution), y = Count)) +
  geom_point() +
  geom_line(aes(group = rep_id)) +
  facet_wrap(~ sam_id, nrow = 1)
Assay-level fixed effects are estimated using Model 1. In this
example, assay-level fixed effects represent the mean foci counts for
each virus working dilution in the VC. This information will later be
used to calculate foci reduction data.
# Mapping sto_id to batches
which_sto_per_bat <- df_VC %>% group_by(bat_id) %>% slice(1)
# Prepare data list for Stan model
data1_stan <- list(num_sto_vc = 4,
                   num_bat_vc = 28,
                   num_obs_vc = nrow(df_VC),
                   which_sto_vc = df_VC$sto_id,
                   which_bat_vc = df_VC$bat_id,
                   count_vc = df_VC$Count,
                   which_sto_per_bat = which_sto_per_bat$sto_id,
                   # Priors
                   delta_prior = c(48, 44, 40, 25),
                   sample_from_prior = 0)
# Compile the Stan model
model1 <- stan_model(model_code = model1_code)
# Fit the compiled model using MCMC (avoids recompiling via stan())
# This step takes approximately 2 minutes.
# If no progress is observed, please try setting cores = 1.
fit1 <- sampling(model1, data = data1_stan,
                 chains = 4, cores = 4, iter = 3000,
                 control = list(adapt_delta = 0.95))
# Save the fitted model for future use
saveRDS(fit1, file = "Simu_fit1.rds")
# Extract posterior samples from the fitted model
posterior1 <- extract(fit1)
# Summarize assay-level effects (median across posterior samples)
sto_param <- data.frame(
  sto_id = 1:4,
  post_sto_effect = apply(posterior1$delta, 2, median)
)
We account for assay-level fixed effects by calculating foci reduction, defined here as one minus the ratio of the observed foci count at each dilution to the mean VC foci count for the same working virus dilution, as estimated by Model 1.
df_PC <- left_join(df_PC, sto_param, by = join_by(sto_id))
df_PC$FR <- 1 - df_PC$Count/df_PC$post_sto_effect
df_IS1 <- left_join(df_IS1, sto_param, by = join_by(sto_id))
df_IS1$FR <- 1 - df_IS1$Count/df_IS1$post_sto_effect
df_IS2 <- left_join(df_IS2, sto_param, by = join_by(sto_id))
df_IS2$FR <- 1 - df_IS2$Count/df_IS2$post_sto_effect
df_sam <- left_join(df_sam, sto_param, by = join_by(sto_id))
df_sam$FR <- 1 - df_sam$Count/df_sam$post_sto_effect
“Clear negative” samples, defined as those exhibiting less than 30% foci reduction at the starting dilution (allowing a 20% margin below the 50% threshold to account for potential measurement error due to intra- or inter-batch variation), are excluded from curve fitting. This is reasonable because foci reduction decreases monotonically with dilution, so the true response curves for such samples are highly unlikely to reach the 50% foci reduction threshold.
sample_fitting <- df_sam %>% filter(dilution == 40) %>%
  group_by(sam_id, bat_id) %>%
  summarise(max_y = mean(FR)) %>%
  filter(max_y > 0.3) %>%
  arrange(sam_id)
clear_neg <- df_sam %>% filter(!sam_id %in% sample_fitting$sam_id)
clear_neg <- unique(clear_neg$sam_id)
# Assign new sample IDs after removing 'Clear negative' samples
# New IDs should be continuous and start from 1
sample_fitting$sam_id_new <- 1:nrow(sample_fitting)
# Keep only samples retained for curve fitting
df_sam <- df_sam %>% filter(sam_id %in% sample_fitting$sam_id)
# Join new sample IDs back to the main dataset
df_sam <- left_join(df_sam, sample_fitting[, c("sam_id", "sam_id_new")],
                    by = join_by(sam_id))
Sample-specific antibody titers are estimated using Model 2, which corrects for random batch effects. Depending on the experiment-specific assay design and sources of variability, this batch-effect correction framework can also be applied to adjust for other laboratory- or experiment-level random effects, such as heterogeneity arising from differences in assay protocols, equipment, reagents, or personnel, which are common in large-scale or multicenter studies.
# Prepare data list for Stan model
data2_stan <- list(num_bat_total = 28,
                   num_obs_pc = nrow(df_PC),
                   which_bat_pc = df_PC$bat_id,
                   x_log_pc = log(df_PC$dilution),
                   y_pc = df_PC$FR,
                   num_obs_IS1 = nrow(df_IS1),
                   which_bat_IS1 = df_IS1$bat_id,
                   x_log_IS1 = log(df_IS1$dilution),
                   y_IS1 = df_IS1$FR,
                   num_obs_IS2 = nrow(df_IS2),
                   which_bat_IS2 = df_IS2$bat_id,
                   x_log_IS2 = log(df_IS2$dilution),
                   y_IS2 = df_IS2$FR,
                   num_sam = length(unique(df_sam$sam_id_new)),
                   num_obs = nrow(df_sam),
                   which_bat = df_sam$bat_id,
                   which_sam = df_sam$sam_id_new,
                   x_log = log(df_sam$dilution),
                   y = df_sam$FR,
                   which_bat_per_sam = sample_fitting$bat_id,
                   # Priors and parameter bounds
                   f_pc_prior = c(1, 0, 1, 6),
                   f_IS1_prior = c(1, 0, 1, 6),
                   f_IS2_prior = c(1, 0, 1, 6),
                   f_pop_prior = c(1, 0, 1, 6),
                   f_lower = c(0, -1, 0, log(40)),
                   f_upper = c(2, 1, 1, log(87480)),
                   sample_from_prior = 0)
# Compile the Stan model
model2 <- stan_model(model_code = model2_code)
# Fit the compiled model using MCMC (avoids recompiling via stan())
# This step takes approximately 15–30 minutes for this example.
fit2 <- sampling(model2, data = data2_stan,
                 chains = 4, cores = 4, iter = 3000,
                 control = list(adapt_delta = 0.95))
# Save the fitted model for future use
saveRDS(fit2, file = "Simu_fit2.rds")
# Extract posterior samples from the fitted model
posterior2 <- extract(fit2)
After fitting the model, it is important to assess MCMC convergence (e.g., the number of divergent transitions, Rhat values, and trace plots for key model parameters) before proceeding with downstream titer estimation, to ensure the reliability of the MCMC results.
# Number of divergent transitions
sampler_params <- get_sampler_params(fit2, inc_warmup = FALSE)
divergent_per_chain <- sapply(sampler_params, function(x) sum(x[,"divergent__"]))
divergent_per_chain
## [1] 0 0 0 0
# Rhat
fit_summary <- summary(fit2)$summary
summary(fit_summary[, "Rhat"])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9994 0.9996 0.9998 0.9999 1.0000 1.0108
# Check population-level parameters
f_pop_samples <- rstan::extract(fit2, pars = c("f_pop[1]", "f_pop[2]", "f_pop[3]", "f_pop[4]"), permuted = FALSE)
mcmc_trace(f_pop_samples, facet_args = list(nrow = 1)) +
  theme_bw()
# Check batch-level parameters
pars_to_check <- c("f_bat[1,1]", "f_bat[1,2]", "f_bat[1,3]", "f_bat[1,4]")
f_bat_samples <- rstan::extract(fit2, pars = pars_to_check, permuted = FALSE)
mcmc_trace(f_bat_samples, facet_args = list(nrow = 1)) +
  theme_bw()
# Check sample-level parameters
pars_to_check <- c("f_sam_bat[1,1]", "f_sam_bat[1,2]", "f_sam_bat[1,3]", "f_sam_bat[1,4]")
f_sam_samples <- rstan::extract(fit2, pars = pars_to_check, permuted = FALSE)
mcmc_trace(f_sam_samples, facet_args = list(nrow = 1)) +
  theme_bw()
By comparing batch-specific fitted curves with the BHM-adjusted overall curves for PC and IS, batch-to-batch variability can be visually assessed.
# Batch-specific fitted curves
df_PC$BHM_fit_FR <- apply(posterior2$mu_y_pc, 2, median)
df_IS1$BHM_fit_FR <- apply(posterior2$mu_y_IS1, 2, median)
df_IS2$BHM_fit_FR <- apply(posterior2$mu_y_IS2, 2, median)
df_PC$control_sample <- "PC"
df_IS1$control_sample <- "IS1"
df_IS2$control_sample <- "IS2"
df_control <- rbind(df_PC, df_IS1, df_IS2)
df_control$curve_type <- "Batch-specific fitted curves"
# BHM-adjusted overall curves
Adj_PC_curve <- data.frame(control_sample = "PC",
                           dilution = unique(df_PC$dilution),
                           BHM_fit_FR = apply(posterior2$y_pc_adj, 2, median),
                           BHM_fit_FR_lower = apply(posterior2$y_pc_adj, 2, quantile, 0.025),
                           BHM_fit_FR_upper = apply(posterior2$y_pc_adj, 2, quantile, 0.975))
Adj_IS1_curve <- data.frame(control_sample = "IS1",
                            dilution = unique(df_IS1$dilution),
                            BHM_fit_FR = apply(posterior2$y_IS1_adj, 2, median),
                            BHM_fit_FR_lower = apply(posterior2$y_IS1_adj, 2, quantile, 0.025),
                            BHM_fit_FR_upper = apply(posterior2$y_IS1_adj, 2, quantile, 0.975))
Adj_IS2_curve <- data.frame(control_sample = "IS2",
                            dilution = unique(df_IS2$dilution),
                            BHM_fit_FR = apply(posterior2$y_IS2_adj, 2, median),
                            BHM_fit_FR_lower = apply(posterior2$y_IS2_adj, 2, quantile, 0.025),
                            BHM_fit_FR_upper = apply(posterior2$y_IS2_adj, 2, quantile, 0.975))
df_control_adj <- rbind(Adj_PC_curve, Adj_IS1_curve, Adj_IS2_curve)
df_control_adj$curve_type <- "BHM-adjusted overall curves"
# Plot batch-specific fitted curves and BHM-adjusted overall curves
ggplot() +
  geom_line(data = df_control,
            aes(x = log(dilution),
                y = BHM_fit_FR,
                group = bat_id,
                color = curve_type), alpha = 0.5) +
  geom_line(data = df_control_adj,
            aes(x = log(dilution),
                y = BHM_fit_FR,
                color = curve_type)) +
  geom_ribbon(data = df_control_adj,
              aes(x = log(dilution),
                  ymin = BHM_fit_FR_lower,
                  ymax = BHM_fit_FR_upper),
              fill = "#D7191C", alpha = 0.2) +
  geom_hline(yintercept = 0.5, color = "grey", linetype = "dashed") +
  facet_wrap(~factor(control_sample)) +
  scale_color_manual("",
                     values = c("Batch-specific fitted curves" = "#FDAE61",
                                "BHM-adjusted overall curves" = "#D7191C")) +
  scale_x_continuous("Serial dilutions",
                     breaks = log(unique(df_control_adj$dilution)),
                     labels = unique(df_control$dilution)) +
  scale_y_continuous("Foci reduction (%)",
                     limits = c(-0.5, 1),
                     breaks = seq(-0.5, 1, 0.25),
                     labels = 100 * seq(-0.5, 1, 0.25)) +
  theme_bw()
By comparing BHM-adjusted and unadjusted 4PL curves, the impact of batch effects on the titer estimate for each sample can be visually assessed.
sample_to_check <- 1
# BHM-unadjusted curve
df_sam$BHM_fit_FR <- apply(posterior2$mu_y, 2, median)
df_sam$BHM_fit_FR_lower <- apply(posterior2$mu_y, 2, quantile, 0.025)
df_sam$BHM_fit_FR_upper <- apply(posterior2$mu_y, 2, quantile, 0.975)
# BHM-adjusted curve
BHM_adj_FR <- apply(posterior2$y_sam_adj, c(2, 3), median)
BHM_adj_FR_lower <- apply(posterior2$y_sam_adj, c(2, 3), quantile, 0.025)
BHM_adj_FR_upper <- apply(posterior2$y_sam_adj, c(2, 3), quantile, 0.975)
# Plot BHM-adjusted and BHM-unadjusted curves
df_sam %>% filter(sam_id_new == sample_to_check) %>%
  ggplot() +
  geom_ribbon(aes(x = log(dilution),
                  ymin = BHM_fit_FR_lower,
                  ymax = BHM_fit_FR_upper,
                  fill = "BHM-unadjusted"),
              alpha = 0.2) +
  geom_line(aes(x = log(dilution),
                y = BHM_fit_FR,
                group = sam_id_new,
                color = "BHM-unadjusted")) +
  geom_line(data = data.frame(dilution = unique(df_sam$dilution),
                              BHM_adj_FR = BHM_adj_FR[, sample_to_check]),
            aes(x = log(dilution),
                y = BHM_adj_FR,
                color = "BHM-adjusted")) +
  geom_ribbon(data = data.frame(dilution = unique(df_sam$dilution),
                                BHM_adj_FR_lower = BHM_adj_FR_lower[, sample_to_check],
                                BHM_adj_FR_upper = BHM_adj_FR_upper[, sample_to_check]),
              aes(x = log(dilution),
                  ymin = BHM_adj_FR_lower,
                  ymax = BHM_adj_FR_upper,
                  fill = "BHM-adjusted"),
              alpha = 0.2) +
  geom_hline(yintercept = 0.5, color = "grey", linetype = "dashed") +
  scale_color_manual("",
                     values = c("BHM-unadjusted" = "#FDAE61",
                                "BHM-adjusted" = "#D7191C")) +
  scale_fill_manual("",
                    values = c("BHM-unadjusted" = "#FDAE61",
                               "BHM-adjusted" = "#D7191C")) +
  scale_x_continuous("Serial dilution",
                     breaks = log(unique(df_control_adj$dilution)),
                     labels = unique(df_control$dilution)) +
  scale_y_continuous("Foci reduction (%)",
                     limits = c(-0.5, 1),
                     breaks = seq(-0.5, 1, 0.25),
                     labels = 100 * seq(-0.5, 1, 0.25)) +
  theme_bw()
Finally, posterior medians and 95% credible intervals (CrIs) for the antibody titers are calculated for each sample from the MCMC results. For ‘clear negative’ samples, antibody titers are imputed at the assay’s lower limit of quantification (LLOQ).
# Summarize posterior samples: median and 95% CrI, log2-transformed
posterior_summary <- apply(posterior2$Nt_sam_adj, 2, function(x) {
  Median <- exp(median(x))
  CrI_lower <- exp(quantile(x, 0.025))
  CrI_upper <- exp(quantile(x, 0.975))
  c(log2(Median), log2(CrI_lower), log2(CrI_upper))
})
summary_df <- as.data.frame(t(posterior_summary))
colnames(summary_df) <- c("Estimate", "CrI_lower", "CrI_upper")
summary_df$sam_id_new <- 1:nrow(summary_df)
sample_fitting <- left_join(sample_fitting[, c("sam_id", "sam_id_new")],
                            summary_df, by = join_by(sam_id_new))
# Include 'clear negative' samples
clear_neg_data <- data.frame(sam_id = clear_neg,
                             sam_id_new = NA,
                             Estimate = log2(40/3),
                             CrI_lower = NA,
                             CrI_upper = NA)
# Combine results and sort by sample ID
titer_estimate <- rbind(sample_fitting, clear_neg_data) %>%
  arrange(sam_id)
# Preview results
titer_estimate[1:10,]
## # A tibble: 10 × 5
## # Groups: sam_id [10]
## sam_id sam_id_new Estimate CrI_lower CrI_upper
## <dbl> <int> <dbl> <dbl> <dbl>
## 1 1 1 7.92 7.22 8.64
## 2 2 2 7.55 6.80 8.29
## 3 3 3 8.15 7.49 8.79
## 4 4 4 8.91 8.09 9.70
## 5 5 5 6.91 6.10 7.73
## 6 6 6 11.0 10.2 11.9
## 7 7 7 7.12 6.26 7.99
## 8 8 8 7.58 6.77 8.35
## 9 9 9 9.54 8.83 10.2
## 10 10 10 9.88 8.95 10.8