| Title: | Surrogate Evaluation for Jointly Longitudinal Outcome and Surrogate |
|---|---|
| Description: | Tools for surrogate evaluation in longitudinal studies using state-space models as proposed in Santos Jr. and Parast (2026)<doi:10.48550/arXiv.2604.12882>. The package estimates treatment effects over time with and without adjustment for surrogate information, summarizes the proportion of treatment effect explained by a longitudinal surrogate, quantifies uncertainty via bootstrap resampling, and provides plotting and summary utilities for fitted models. |
| Authors: | Silvaneo dos Santos Jr. [aut, cre], Layla Parast [aut] |
| Maintainer: | Silvaneo dos Santos Jr. <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.0.5 |
| Built: | 2026-06-05 06:10:18 UTC |
| Source: | https://github.com/silvaneojunior/onlinesurr |
Fits two Gaussian state-space models (Dynamic Linear Models) to jointly longitudinal outcome data: (i) a marginal model for the outcome trajectory given treatment and time, and (ii) a conditional model that additionally adjusts for a user-specified surrogate structure. The function returns per-time treatment-effect estimates from both models and subject-level bootstrap draws obtained via subject-level resampling.
fit.surr( formula, id, surrogate, treat, data = NULL, time = NULL, N.boots = 2000, verbose = 1, D.local = 0.8 )fit.surr( formula, id, surrogate, treat, data = NULL, time = NULL, N.boots = 2000, verbose = 1, D.local = 0.8 )
formula |
An object of class |
id |
A variable (unquoted) identifying subjects. Each subject must have at most one measurement per |
surrogate |
A formula describing the surrogate structure to be included in the conditional model. May be provided either as a |
treat |
A variable (unquoted) indicating treatment assignment. Must encode exactly two treatment levels after coercion to a factor. |
data |
A |
time |
Optional variable (unquoted) giving the measurement time index. Must be numeric and equally spaced across observed time points. If |
N.boots |
Integer number of subject-level bootstrap replicates. Each replicate resamples subjects with replacement and recombines subject-specific sufficient quantities to form bootstrap draws of the fixed effects. |
verbose |
Logical scalar indicating whether to print progress information during model fitting. If |
D.local |
Numeric, a number between 0 and 1 indicating the discount factor to be used for the random effect block. This factor controls how smooth the random effect evolve over time. A discount factor of 1 means that the random effects do not change over time, so that each individual has its own local level, but that level is the same for all times. A discount factor of 0 is not acceptable (the kDGLM package will replace it by 1), but values closer to 0 imply in a more flexible dynamic. See West and Harrison (1997) or the appendix in dos Santos Jr. and Parast (2026) for instructions on how to specify the discount factor. |
The implementation follows a two-model decomposition used for estimating longitudinal treatment effects and surrogate-adjusted (residual) treatment effects in a state-space framework.
See dos Santos Jr. and Parast (2026) for details on the methodology.
See West and Harrison (1997) for best practices on model specification in the state-space model setting.
Data requirements. The data must have at most one row per subject-time pair; time must be numeric and equally spaced (or omitted, in which case an index is created). Treatment and subject identifiers are coerced to factors with sorted levels.
Model structure. The marginal model includes treatment-by-time fixed effects and a subject-specific random-walk component to capture within-subject correlation. The conditional model adds the user-specified surrogate structure to the design, and checks that treatment is not a linear combination of the surrogate design (rank check).
Bootstrap. Subjects are resampled with replacement. Subject-specific filtered quantities are computed once and recombined in each bootstrap iteration to reduce computational cost, consistent with a subject-level nonparametric bootstrap strategy for replicated time series.
An object of class "fitted_onlinesurr": a named list with elements $Marginal and $Conditional. Each of these contains:
point: the point estimate vector of the treatment effect at each time point.
smp: a matrix of bootstrap draws for the treatment effect at each time point, with one column per bootstrap replicate. The draws are generated from the joint distribution of the full vector, thereby accounting for the dependence among different time points. The samples from the marginal (total effect) and conditional (residual effect) models are paired, so that the i-th samples from both models are drawn jointly from the distribution of the estimators.
The object also includes:
T: number of unique time points.
N: number of subjects.
n.fixed: number of fixed-effect coefficients implied by formula for a single subject prior to stacking across subjects.
dos Santos Jr. SV, Parast L (2026).
“A Causal Framework for Evaluating Jointly Longitudinal Outcomes and Surrogate Markers: A State-Space Approach.”
2604.12882, https://arxiv.org/abs/2604.12882.
West M, Harrison J (1997).
Bayesian Forecasting and Dynamic Models (Springer Series in Statistics).
Springer-Verlag.
ISBN 0387947256.
fit <- fit.surr(y ~ 1, id = id, surrogate = ~s, treat = trt, data = sim_onlinesurr, # This dataset is included in the OnlineSurr package time = time, verbose = 0, N.boots = 500 # Generally, this value would be too small. # Remember to increase it for your dataset. ) summary(fit)fit <- fit.surr(y ~ 1, id = id, surrogate = ~s, treat = trt, data = sim_onlinesurr, # This dataset is included in the OnlineSurr package time = time, verbose = 0, N.boots = 500 # Generally, this value would be too small. # Remember to increase it for your dataset. ) summary(fit)
Returns a lagged version of x by shifting its values forward by k positions and padding the first k entries with zeros.
lagged(x, k = 1)lagged(x, k = 1)
x |
A vector to be lagged. |
k |
A non-negative integer giving the lag order. |
This function is intended for use in model formulas when delayed effects of a predictor should be included explicitly.
A vector of the same length as x, with the first k values set to 0 and the remaining values taken from x shifted by k positions.
x <- 1:5 lagged(x) lagged(x, k = 2)x <- 1:5 lagged(x) lagged(x, k = 2)
"fitted_onlinesurr" objectProduces a ggplot2 figure showing, over time, either the Local PTE (LPTE), the Cumulative PTE (CPTE), or the marginal and residual treatment effects and (labeled and in the plot). Point estimates are taken from object$Marginal$point and object$Conditional$point, with uncertainty bands computed from the stored bootstrap draws.
## S3 method for class 'fitted_onlinesurr' plot(x, type = "LPTE", conf.level = 0.95, one.sided = TRUE, ...)## S3 method for class 'fitted_onlinesurr' plot(x, type = "LPTE", conf.level = 0.95, one.sided = TRUE, ...)
x |
A |
type |
Character string specifying what to plot. One of |
conf.level |
Numeric in |
one.sided |
Logical; if |
... |
Additional arguments (currently unused) included for S3 method compatibility. |
The function extracts time-indexed treatment-effect estimates (marginal) and (residual/conditional) from the fitted object, along with bootstrap draws for each. It then constructs:
LPTE: .
CPTE: .
Delta: plots and directly.
Point estimates are plotted as points; intervals are empirical quantile intervals computed from the bootstrap sample matrices stored in object.
A ggplot object.
fit <- fit.surr(y ~ 1, id = id, surrogate = ~s, treat = trt, data = sim_onlinesurr, # This dataset is included in the OnlineSurr package time = time, verbose = 0, N.boots = 500 # Generally, this value would be too small. # Remember to increase it for your dataset. ) plot(fit, type = "LPTE") plot(fit, type = "CPTE", conf.level = 0.90, one.sided = FALSE) plot(fit, type = "Delta")fit <- fit.surr(y ~ 1, id = id, surrogate = ~s, treat = trt, data = sim_onlinesurr, # This dataset is included in the OnlineSurr package time = time, verbose = 0, N.boots = 500 # Generally, this value would be too small. # Remember to increase it for your dataset. ) plot(fit, type = "LPTE") plot(fit, type = "CPTE", conf.level = 0.90, one.sided = FALSE) plot(fit, type = "Delta")
Build a B-spline basis for a numeric vector using a Cox-de Boor style recursion. By default, the function constructs a cubic spline basis (P = 3) and chooses the number of basis functions from the number of unique values in x.
s( x, P = 3, K = min(7, max(3, floor(log2(length(unique(x)))))), limits = c(NA, NA), knots = "eq" )s( x, P = 3, K = min(7, max(3, floor(log2(length(unique(x)))))), limits = c(NA, NA), knots = "eq" )
x |
A numeric vector of predictor values. |
P |
A non-negative integer giving the spline degree. |
K |
An integer giving the number of basis functions to return. The default increases slowly with the number of unique values in |
limits |
A numeric vector of length 2 giving the lower and upper boundary limits for the spline basis. Missing values are replaced by |
knots |
Either a numeric vector of knot locations, or one of |
Boundary limits are taken from x unless supplied explicitly. Knot locations may be given directly as a numeric vector, or generated either at equally spaced locations ("eq") or at empirical quantiles ("quantile").
The returned basis has length(x) rows and k columns.
When knots is generated internally, the function first creates K - P + 1 knot locations and then augments them with repeated boundary knots so the recursion can be evaluated.
A numeric matrix with one row per element of x and one column per spline basis function.
x <- seq(0, 1, length.out = 10) # Default cubic basis B <- s(x) dim(B) # Equally spaced knots with custom basis size B2 <- s(x, K = 5, knots = "eq") # Quantile-based knots B3 <- s(x, knots = "quantile")x <- seq(0, 1, length.out = 10) # Default cubic basis B <- s(x) dim(B) # Equally spaced knots with custom basis size B2 <- s(x, K = 5, knots = "eq") # Quantile-based knots B3 <- s(x, knots = "quantile")
A simulated long-format dataset illustrating the input structure expected by [fit.surr()] for surrogate evaluation with jointly longitudinal outcomes and surrogate markers.
sim_onlinesurrsim_onlinesurr
A data frame with 600 rows and 5 variables:
Integer subject identifier.
Binary treatment indicator: '0' for control and '1' for treated.
Numeric measurement time index taking values '1' through '6'.
Continuous longitudinal surrogate marker.
Continuous longitudinal primary outcome.
The dataset contains 100 subjects observed at 6 equally spaced time points. Treatment assignment is binary and constant within subject. The surrogate marker 's' varies over time and is affected by treatment. The primary outcome 'y' depends on treatment, time, and the surrogate marker.
Rows are ordered by subject identifier and time.
This dataset was generated for package examples and testing. It represents a balanced longitudinal design with one observation per subject-time pair. Measurement times are equally spaced, which is a requirement for use with [fit.surr()].
In the data-generating mechanism, the surrogate marker is affected by time and treatment, and the outcome depends on time, treatment, and the surrogate.
Simulated data generated within the package; not based on an external study.
"fitted_onlinesurr" objectPrints a human-readable report for an object of class "fitted_onlinesurr" returned by fit.surr. The report includes marginal and conditional treatment-effect estimates at a selected time point (or cumulatively up to that time), an estimate of the LPTE/CPTE, and a time-homogeneity test of the LPTE.
## S3 method for class 'fitted_onlinesurr' summary(object, t = object$T, cumulative = TRUE, signif.level = 0.05, ...)## S3 method for class 'fitted_onlinesurr' summary(object, t = object$T, cumulative = TRUE, signif.level = 0.05, ...)
object |
A |
t |
Integer time index at which to evaluate treatment effects and the PTE. If |
cumulative |
Logical; if |
signif.level |
Numeric in |
... |
Additional arguments passed to downstream summary/print utilities (if any). |
The "fitted_onlinesurr" object stores point estimates and bootstrap samples for marginal and surrogate-adjusted (conditional) models in object$Marginal and object$Conditional.
No return value. Called for its side effect of printing a summary report.
fit <- fit.surr(y ~ 1, id = id, surrogate = ~s, treat = trt, data = sim_onlinesurr, # This dataset is included in the OnlineSurr package time = time, verbose = 0, N.boots = 500 # Generally, this value would be too small. # Remember to increase it for your dataset. ) # Cumulative up to time 5 summary(fit, t = 5, cumulative = TRUE, signif.level = 0.05) # Time-specific at time 5 summary(fit, t = 5, cumulative = FALSE)fit <- fit.surr(y ~ 1, id = id, surrogate = ~s, treat = trt, data = sim_onlinesurr, # This dataset is included in the OnlineSurr package time = time, verbose = 0, N.boots = 500 # Generally, this value would be too small. # Remember to increase it for your dataset. ) # Cumulative up to time 5 summary(fit, t = 5, cumulative = TRUE, signif.level = 0.05) # Time-specific at time 5 summary(fit, t = 5, cumulative = FALSE)
Tests the null hypothesis that the LPTE is constant over time. The test is based on the difference between the conditional and marginal treatment-effect trajectories implied by a fitted "fitted_onlinesurr" object, standardized by an estimated covariance, and uses a max-type statistic to control the family wise error across time points.
time_homo_test(model, signif.level = 0.05, N.boots = 50000)time_homo_test(model, signif.level = 0.05, N.boots = 50000)
model |
A fitted object of class |
signif.level |
Numeric in (0,1) giving the test significance level used to form the critical value from the bootstrap distribution. Default is |
N.boots |
Integer number of Monte Carlo draws used to approximate the null distribution of the max standardized deviation statistic and to compute the p-value. Default is |
See dos Santos Jr. and Parast (2026) for the theoretical details about this test.
Notes:
The function assumes the first T time-specific treatment-effect parameters are stored contiguously at the beginning of model$Marginal$point and model$Conditional$point (and similarly for smp). It uses the index 1:(n.fixed) as implemented in the code: 1:(T + n.fixed - T).
N.boots here is a Monte Carlo size for the null simulation (distinct from the bootstrap size used when fitting model).
A named list with:
T: the observed test statistic (maximum absolute standardized deviation).
T.crit: the 1-signif.level critical value.
p.value: the Monte Carlo p-value mean(T_null > T_obs).
dos Santos Jr. SV, Parast L (2026). “A Causal Framework for Evaluating Jointly Longitudinal Outcomes and Surrogate Markers: A State-Space Approach.” 2604.12882, https://arxiv.org/abs/2604.12882.
fit <- fit.surr(y ~ 1, id = id, surrogate = ~s, treat = trt, data = sim_onlinesurr, # This dataset is included in the OnlineSurr package time = time, verbose = 0, N.boots = 500 # Generally, this value would be too small. # Remember to increase it for your dataset. ) time_homo_test(fit, signif.level = 0.05, N.boots = 500)fit <- fit.surr(y ~ 1, id = id, surrogate = ~s, treat = trt, data = sim_onlinesurr, # This dataset is included in the OnlineSurr package time = time, verbose = 0, N.boots = 500 # Generally, this value would be too small. # Remember to increase it for your dataset. ) time_homo_test(fit, signif.level = 0.05, N.boots = 500)