Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations
2025. 8. 9. 21:58
https://arxiv.org/pdf/2206.08311
(ICML 2022)
Abstract
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare by assisting decision-makers to answer “what-if” questions. Existing causal inference approaches typically consider regular, discrete-time intervals between observations and treatment decisions and hence are unable to naturally model irregularly sampled data, which is the common setting in practice. To handle arbitrary observation patterns, we interpret the data as samples from an underlying continuous-time process and propose to model its latent trajectory explicitly using the mathematics of controlled differential equations. This leads to a new approach, the Treatment Effect Neural Controlled Differential Equation (TE-CDE), that allows the potential outcomes to be evaluated at any time point. In addition, adversarial training is used to adjust for time-dependent confounding which is critical in longitudinal settings and is an added challenge not encountered in conventional time-series. To assess solutions to this problem, we propose a controllable simulation environment based on a model of tumor growth for a range of scenarios with irregular sampling reflective of a variety of clinical scenarios. TE-CDE consistently outperforms existing approaches in all simulated scenarios with irregular sampling.
1. Introduction
Decision-makers must answer several critical questions before taking an action. In the clinical setting, before a treatment is given, clinicians must evaluate whether a treatment should be given and, if so, both what treatment is best for their patient and when the treatment should be administered. Answering such questions requires reliably estimating the effect of a treatment or sequence of treatments. While from a causal inference perspective, clinical trials represent the gold standard to answer these questions, it is highly desirable to estimate treatment effects from observational data. This is due to the significant expense, relatively small sample sizes, and narrow inclusion criteria of clinical trials.
There are several causal inference methods proposed in the static setting (e.g. Shalit et al., 2017; Alaa & van der Schaar, 2017; Yoon et al., 2018). However, estimating the effects of treatments over time is of paramount importance for real-world administration of complex treatment plans and personalized healthcare. Only in the longitudinal setting can we understand how diseases evolve under different treatment plans, how individual patients respond to treatment over time, or the optimal timing for treatment.
However, estimating counterfactual outcomes in the longitudinal setting introduces additional challenges, the most significant of which is that the observed treatment assignment may depend on confounding variables that vary over time (time-dependent confounding, Platt et al., 2009). For example, not all cancer patients are equally likely to be offered the same chemotherapy regimen. In particular, the history of patients’ covariates and their response to past treatments affects future treatments (Bica et al., 2021). This can introduce bias in causal effects and variance in the estimation of counterfactuals due to the systematic differences in the distribution of confounding variables between any two sets of treatments over time.
This issue of time-dependent confounding and distribution shift is the primary challenge of causal inference over time, not encountered in standard time-series. Hence, conventional time-series models are not applicable to our setting as they do not adjust for bias introduced by time-varying confounders and hence are sensitive to the policy in the observational data (Schulam & Saria, 2017).
While prior work in causal inference has sought to mitigate such confounding bias (Robins et al., 2000; Lim et al., 2018; Bica et al., 2020b), the setting considered is overly restrictive and does not reflect most real-world observation data. In particular, previous work assumes that data is regular and arrives at fixed, evenly spaced time intervals and that the sampling times perfectly coincide between different individuals. However, neither is true in practice, significantly limiting the practical use of such methods.
Discretizing the patient’s evolution over time, an inherently continuous process, has significant limitations, both when learning from historical data and for prospective clinical use. From a learning perspective, observational data is typically not sampled regularly. Indeed, irregularity in observational data can arise for reasons ranging from the simple, such as scheduling, a patient missing an appointment, or a healthcare practitioner not capturing the observation, to the more complex: for example, more severe cases are often observed more frequently, while different treatments can require different monitoring.
Prospective use cases raise similar issues: a mismatch between the discretization scheme and the desired evaluation times means that the chosen discretization may not be applicable. As a result, for real-world applications where data is sampled irregularly, we believe that treatment effects over time should be modeled in a continuous manner.
Contributions
In this paper, we address the realistic but understudied problem of counterfactual estimation in the irregularly sampled setting with time-dependent confounding; a significantly more complex setting for counterfactual estimation than the standard regular, discrete setting.
To do so, we depart from existing methods based on recurrent neural networks (RNNs) and propose a novel alternative inspired by recent breakthroughs in neural controlled differential equations (CDEs) (Kidger et al., 2020), which we call the Treatment Effect Neural Controlled Differential Equation (TE-CDE).
To model the observation histories, we learn a continuous latent representation of the patient state as the solution to a CDE. To the best of our knowledge, this is the first work to frame the evolution of a patient’s latent state as the solution to a CDE. This framing enables TE-CDE to learn from arbitrary historical observation patterns and allows potential outcomes to be evaluated at any point in time.
In addition, we introduce a controllable simulation environment based on a realistic model for tumor growth to generate irregularly sampled observational data. We demonstrate that the unrealistic assumptions imposed by existing state-of-the-art models lead to reduced performance in a range of irregularly sampled scenarios, and that TE-CDE outperforms these methods across all scenarios with irregularly sampled observation histories.

2. Related Work
This paper primarily engages with the literature on treatment effect estimation with time-varying covariates, treatments, and outcomes, but also draws on insights from causality in dynamical systems and recent work on modeling controlled differential equations. We explicitly note the difference between causal inference over time and conventional time series modeling as outlined in Section 1 and hence do not focus on recent advances in time series models. An extended discussion of related work can be found in Appendix B. In Table 1, we contrast the problem setting and assumptions of TE-CDE to other related work.

We argue for modeling the underlying continuous-time processes that give rise to the discrete observational data, which may itself be highly irregular. We contrast this approach with discrete-time methods that use a common discretization for all time series and are forced to interpolate and impute before model fitting. These methods also differ in how they adjust for confounding and for differences in covariate distributions in different treatment regimes. Marginal Structural Models (MSMs) are linear in treatment and covariate effect and create a pseudo-population using inverse probability of treatment weighting, such that the probability of treatment does not depend on the time-varying confounders, thereby effectively controlling for confounding bias (Robins et al., 2000). Lim et al. (2018) proposed a semi-parametric alternative to MSMs using recurrent neural networks to estimate propensity weights. The Counterfactual Recurrent Network (CRN, Bica et al., 2020b) uses a similar architecture but instead uses adversarial training to balance differences in covariate distributions in different treatment regimes. However, both assume data to be regularly sampled and fully observed at all time points, which is unrealistic in practice.
Gaussian process-based approaches such as Schulam & Saria (2017) are applicable to longitudinal data and take a continuous-time approach but in contrast, make strong assumptions about the model structure that is dependent on a particular application and prior knowledge of the form of the processes involved. Closer to the proposed approach, neural ordinary differential equations (ODE, Chen et al., 2018; Rubanova et al., 2019) and extensions (Kidger et al., 2020; Morrill et al., 2021) have been considered for modeling irregular time series data. However, neural ODE type methods are conventional time series models, which do not account for issues such as time-dependent confounding. In the context of intervention modeling, Gwak et al. (2020) proposed to use separate ODEs for interventions and outcome processes. However, they did so for systems with deterministic dynamics without integrating time-varying covariates and without addressing confounding. As a result, their approach is not applicable to treatment effect estimation in healthcare. Related is also Bellot & van der Schaar (2021) that proposed to model treatment effects in continuous time in the context of synthetic controls; however, contrasting our setting where there could be interventions over time, they only consider a single intervention at a particular time point and the approach is not applicable more generally to address multiple treatments.
3. Problem Formulation
We consider n i.i.d. individuals over a study period [0, T]. Each individual is represented by three processes: a d-dimensional path X : [0, T] → R^d that defines the trajectory of patient covariates over time (and can include static covariates defined to be constant over time); a treatment process A : [0, T] → {0, 1}, a discrete path indicating treatment at each time t ∈ [0, T], i.e. A_t = a, where a ∈ {0, 1}; and a counting process N : [0, T] → N that denotes the treatment assignment pattern of a single treatment over time, e.g. the number of treatments administered up to a given time. These processes are assumed to control or modulate an outcome of interest Y : [0, T] → R, e.g. the tumor size of cancer patients over time. We distinguish between potential outcomes of Y, denoted Y(A = a) or Y(a) for simplicity, which define the potential outcome trajectory of patient i had they been given the treatment path A = a.
In the context of electronic health records (EHRs) and most practical applications, the latent paths X are only partially-observed through m irregular observations, {(t_0, X_t0 ),(t_1, X_t1 ), . . . ,(t_m, X_tm)}, with each t_j ∈ R the timestamp of the observation X_tj ∈ R^d.
To avoid notation clutter, we use the time subscript to refer to function evaluation. The same observations apply to paths A and Y. The case where each patient i has its own m_i irregular time stamps t_i,0, . . . , t_i,mi, and hence where sampling intensity differs both within a patient’s trajectory and between patients, can be considered without modification of any part of the exposition. Indeed, analyzing time series data with such a complex pattern of observation is the central motivation of this work. Let F_t denote the filtration that is generated by all the observable events for a given individual up to time t, including observations of X_s, A_s and Y_s for s ≤ t.
Our goal is to derive unbiased estimates of the potential outcomes at a given time t': E[Y_t' (A = a)|F_t], for any value of time in the future t' > t, hypothesized discrete treatment path A : [t, t'] → {0, 1} with values a, given past observations up to time t, F_t. However, with observational data only one of these potential outcomes trajectories is observed for each unit depending on the treatment assignment. We refer to the unobserved potential outcomes as counterfactuals.
Potential outcomes processes are identifiable with respect to the filtration generated by the observed data under the following three assumptions. These three assumptions are the standard causal inference assumptions.


Overlap means that there is some positive probability of treatment assignment at any point along a patient’s trajectory over the time interval. It can be understood as a direct extension to the more familiar overlap assumption in the static context, 0 < P(Treatment = 1|x) < 1.
The last assumption extends unconfoundedness, or strong ignorability given a patient’s trajectory, to ensure that it is sufficient to condition on the observed trajectory up to time t to block all backdoor paths, i.e. spurious correlation not part of the direct causal effect of interest, to the potential outcome at any time in the future. Similar to Assumption 2, unconfoundedness has previously been extended to the continuous-time domain for stochastic processes by Lok (2008); Saarela & Liu (2016); Ryalen et al. (2019).

It is worth mentioning that the intensity process plays the same role as propensity scores in discrete-time models (Robins, 1997), modeling the switching of the treatment process. Assumption 3 can thus be thought of as formalizing sequential randomization in the continuous-time model by stating that the intensity process does not depend on future potential outcomes, i.e. the current information is enough to estimate counterfactuals in the future without bias.
4. Treatment Effect Controlled Differential Equation
TE-CDE frames the latent trajectory of a patient’s state as the response to a controlled differential equation (CDE) driven by covariate, treatment, and outcome processes (Fig. 2). To the best of our knowledge, it is the first approach to do so.

This CDE formulation makes it possible to account for information available at t > 0 (rather than just the initial value at t = 0). In particular, neural controlled differential equations (Kidger et al., 2020; Morrill et al., 2021) allow incoming information to modulate the dynamics. This ability is natural in a clinical setting: not only can we model the continuous-time latent state evolution of a patient trajectory, but we can also account for incoming data (e.g. treatment changes) that modulate the dynamics of the system.
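※ This is the part that genuinely impressed me when I was studying CDEs: it is the real power of the CDE.
An RNN updates its latent state discretely, once per input, and does so at fixed time steps that do not reflect the natural timescale.
An ODE evolves continuously between latent updates, but it only ever starts from an initial value.
A CDE takes in the input stream and updates the hidden state continuously, following the natural timescale.
In other words, it is the most natural way to model this.
(Beyond being natural, the sampling frequency itself is information, and that gets reflected as well.)
In this paper, the patient state under clinical treatment is modeled as a latent variable. As time passes, the state changes from moment to moment depending on the treatment, and you can feel intuitively that a CDE is the most natural model for that trajectory.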
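To make the contrast above concrete, here is a minimal sketch of a CDE update in plain Python. It is not the paper's implementation: the scalar state, the Euler scheme, the linear interpolation, and the names `interpolate` and `cde_euler` are all assumptions chosen for illustration. The point is that the state moves in proportion to the increment of the input path, so irregular observation times are handled without resampling onto a fixed grid.

```python
import math


def interpolate(times, values, s):
    """Piecewise-linear interpolation of an irregularly sampled scalar path."""
    for j in range(len(times) - 1):
        if times[j] <= s <= times[j + 1]:
            w = (s - times[j]) / (times[j + 1] - times[j])
            return values[j] + w * (values[j + 1] - values[j])
    raise ValueError("s lies outside the observed time range")


def cde_euler(times, values, z0, field, n_steps=1000):
    """Euler scheme for dz = field(z) dX, where X is the interpolated input path."""
    t_start, t_end = times[0], times[-1]
    h = (t_end - t_start) / n_steps
    z, x_prev = z0, values[0]
    for k in range(1, n_steps + 1):
        s = min(t_start + k * h, t_end)
        x = interpolate(times, values, s)
        z = z + field(z) * (x - x_prev)  # the update is driven by the increment of X
        x_prev = x
    return z


# Irregular observation times; hypothetical values of a single input channel.
obs_times = [0.0, 0.3, 1.7, 2.0]
obs_values = [0.0, 0.5, 0.2, 1.0]
z_final = cde_euler(obs_times, obs_values, z0=1.0, field=lambda z: z)
```

For the linear field `field(z) = z` the exact solution is `z0 * exp(X_T - X_0)`, so `z_final` should be close to `e` here, which gives a quick sanity check on the scheme.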
We now present key components needed to facilitate the modeling of counterfactual outcomes in continuous time. Additional properties of TE-CDE are discussed in Appendix D. The key components are as follows:
(1) TE-CDE’s encoder learns a representation that is defined continuously in time (i.e. a continuous latent path), rather than only at discrete time steps.
(2) The latent path trajectory evolves as a response of a Neural Controlled Differential Equation (CDE).
(3) Decoding and prediction are in continuous-time.
(4) TE-CDE uses domain adversarial training to learn a representation that adjusts for time-dependent confounding and hence is suitable for causal estimation.
Encoding the latent path Z.
TE-CDE’s encoder ingests historical observations F_t up to time t and learns a latent path Z : [t0, t] → R^l continuously over time that is designed to be both predictive of the factual outcomes and agnostic of the observed assigned treatment. An explicit continuous-time representation allows us to process measurements with arbitrary observation patterns. We assume the initial state of the path Z_t0 to be parameterized by a neural network gη : R^d+1+1 → R^l that embeds the initial outcome, covariate and treatment observations into an l-dimensional latent state. The latent path can then be expressed as the solution to a CDE,

for t ∈ (t_0, T], where t denotes the present time, up to which observations of all processes are available. The dynamics of potential outcomes when controlled by the covariate and treatment process take the form of a CDE (Lyons et al., 2007). Hence, the solution Z is said to be the response of a Neural CDE (Kidger et al., 2020) driven or controlled by the covariate, treatment and outcome processes (concatenated into a vector [Xt, At, Yt] ∈ R^d+1+1). In this sense, Neural CDEs are a family of continuous-time models that explicitly define the latent vector field fθ : R^l → R^l×(d+1+1) by a neural network parameterized by θ, so that the matrix fθ(Zs) maps an increment of the control path to an increment of the latent state, and the dynamics are modulated by the values of an auxiliary path over time.
We computationally obtain the latent path up to t from F_t by solving the above initial value problem (IVP): ∀s ∈ [t_0, t],

where ODESolve denotes a numerical ODE solver as proposed by Kidger et al. (2020).
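As background for this step, the reduction that lets a standard ODE solver handle a CDE can be written out. The block below uses the general Neural CDE form from Kidger et al. (2020), expressed with this section's symbols (fθ, gη and the control [X_s, A_s, Y_s]). It is a sketch of the construction; the exact form and equation numbering in the TE-CDE paper may differ.

```latex
% CDE driven by the concatenated control path, with a learned initial state
Z_t = Z_{t_0} + \int_{t_0}^{t} f_\theta(Z_s)\, \mathrm{d}[X_s, A_s, Y_s],
\qquad Z_{t_0} = g_\eta\big(X_{t_0}, A_{t_0}, Y_{t_0}\big).

% If the control path is differentiable in s, this is an ordinary ODE in Z
\frac{\mathrm{d}Z_s}{\mathrm{d}s}
  = f_\theta(Z_s)\, \frac{\mathrm{d}[X_s, A_s, Y_s]}{\mathrm{d}s},
\qquad s \in [t_0, t].
```

The second line is why an interpolation of the observations is needed: the solver has to evaluate the derivative of the control path at arbitrary times s, not only at the observation times.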
In practice, we have access to observations at certain (irregular) time points. Thus, we define an interpolation of the data with piece-wise continuous derivatives that serves as an approximation of the underlying paths.
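One simple choice that satisfies this description is piecewise-linear interpolation, whose derivative is constant on each interval between observations. The sketch below assumes that choice (the text does not say which interpolation scheme is used) and a hypothetical helper name, `path_derivative`. Each row holds one observation of the concatenated vector [x, a, y].

```python
from bisect import bisect_right


def path_derivative(times, rows, s):
    """Derivative at time s of the piecewise-linear interpolant through `rows`.

    times: increasing observation times (possibly irregular).
    rows:  rows[j] is the observed vector at times[j].
    """
    j = bisect_right(times, s) - 1          # interval [times[j], times[j+1]) containing s
    j = max(0, min(j, len(times) - 2))      # clamp so the end points stay valid
    dt = times[j + 1] - times[j]
    return [(b - a) / dt for a, b in zip(rows[j], rows[j + 1])]


# Three irregular observations of a hypothetical two-channel path.
obs_times = [0.0, 0.5, 2.0]
obs_rows = [[1.0, 0.0], [2.0, 1.0], [2.0, 1.0]]
slope_early = path_derivative(obs_times, obs_rows, 0.25)
slope_late = path_derivative(obs_times, obs_rows, 1.0)
```

A smoother interpolant (for example a cubic spline) would plug into the same place; only the derivative function changes.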

Decoding and prediction.
After the encoder processes all the observations up to the present time t, TE-CDE starts to decode and predict the potential outcomes up to some time t' > t in the future for a hypothetical treatment schedule defined by the user. At this point, the latent path Z potentially changes as a result of the chosen treatment schedule, which can similarly be formalized using a second controlled differential equation such that,

where t' denotes a desired time horizon, Z_t is the latent state of Z at time t which encodes the patient’s history, and A_s represents the hypothetical treatment schedule for t < s < t'. fφ : R^l+1 → R^l is a feed-forward neural network with trainable weights φ. As before, the decoded path can be obtained by solving the IVP:
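The decoding step can be sketched as rolling the encoded state forward under two candidate plans and comparing the results. Everything specific below is an assumption for illustration: the Euler scheme, the toy vector field `toy_field` standing in for the trained network fφ, the use of time as an extra control channel alongside the treatment (a common Neural CDE convention), and the name `decode`.

```python
def decode(z_t, schedule, field, n_sub=20):
    """Roll a latent state forward under a hypothetical treatment schedule.

    z_t:      latent state at the present time (list of floats).
    schedule: [(time, treatment), ...], starting at the present time.
    field:    field(z) returns one row per latent dimension,
              [sensitivity to time, sensitivity to treatment].
    """
    z = list(z_t)
    path = [(schedule[0][0], list(z))]
    for (s0, a0), (s1, a1) in zip(schedule, schedule[1:]):
        d_time = (s1 - s0) / n_sub
        d_treat = (a1 - a0) / n_sub
        for _ in range(n_sub):
            rows = field(z)
            z = [zi + r[0] * d_time + r[1] * d_treat for zi, r in zip(z, rows)]
        path.append((s1, list(z)))
    return path


def toy_field(z):
    # Hypothetical dynamics: the state decays over time and rises when treatment starts.
    return [[-0.5 * z[0], 1.0]]


untreated = decode([1.0], [(0.0, 0), (1.0, 0), (2.0, 0)], toy_field)
treated = decode([1.0], [(0.0, 0), (1.0, 1), (2.0, 1)], toy_field)
```

Both calls start from the same encoded state, so any difference between `untreated` and `treated` comes only from the hypothetical schedule, which is the sense in which the decoder answers a "what-if" question.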

Domain adversarial training for counterfactual estimation.
The covariates X are time-dependent confounders, which can increase variance in the estimation of counterfactuals if the treatment distribution is not properly balanced given a patient’s trajectory (Mansournia et al., 2017). While unbiased by Assumption 3, counterfactual estimates may have lower variance given patient trajectories frequently observed in the data but higher variance for infrequently observed patient trajectories with consequences for performance generalization of the treatment effect as demonstrated by Shalit et al. (2017). To mitigate this confounding bias, we ensure the latent representation Zt is not predictive of the observed treatment assignment pattern (Shalit et al., 2017; Bica et al., 2020b) which effectively induces representations that are balanced with respect to treatment assignment over time. The treatment invariance breaks the association between time-dependent confounders Xt and current treatment At.
At each time t, the j different treatments A ∈ {A1, . . . , Aj} represent our domains. We then require, at each time step t, that the latent path Zt be invariant across treatment options: P(Zt|At = 0) = P(Zt|At = 1), and more generally equal across any two values in the domain of treatment options. In this context, distributions of the latent state differ across treatment groups if a classifier trained as a function of Zt to predict treatment assignment accurately separates the two groups. Such a representation is called a balancing representation because it balances the probability of the predicted treatment process, p(At = 1|Zt) = 0.5, i.e. it minimizes the distributional variance between treatment groups in the representation space (Johansson et al., 2020).
We use two neural networks h_ν : R^l → R^d and h_a : R^l → [0, 1] to predict the outcome and treatment: ∀s ∈ [t, t']

Suppose there are k ≥ 1 observations in the time window

where µ > 0 is a hyper-parameter controlling the trade-off between treatment and outcome prediction. Note that the minus sign before L^(a) would effectively maximize the treatment prediction loss and ensure that zt is not predictive of treatment assignment At. This leads to balancing representations, which remove bias introduced by time-dependent confounders and allow for reliable counterfactual estimates.
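The trade-off described here can be sketched numerically. The version below assumes a mean squared error for the outcome term and a binary cross-entropy for the treatment term, averaged over the k observations in the window; the function names are hypothetical and the paper's exact loss terms may differ. In a real training loop the adversarial part is usually implemented with a gradient reversal layer (Ganin et al., 2016) rather than by minimizing this scalar directly.

```python
import math


def outcome_loss(y_true, y_pred):
    """Mean squared error over the k observations in the window."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


def treatment_loss(a_true, a_prob, eps=1e-12):
    """Binary cross-entropy of the treatment classifier (assumed form)."""
    terms = [
        -(a * math.log(p + eps) + (1 - a) * math.log(1 - p + eps))
        for a, p in zip(a_true, a_prob)
    ]
    return sum(terms) / len(terms)


def combined_loss(y_true, y_pred, a_true, a_prob, mu):
    """Outcome loss minus mu times treatment loss, as described in the text."""
    return outcome_loss(y_true, y_pred) - mu * treatment_loss(a_true, a_prob)


y_obs, a_obs = [0.4, 0.7], [1, 0]
# Representation that reveals nothing about treatment: classifier outputs 0.5.
balanced = combined_loss(y_obs, y_obs, a_obs, [0.5, 0.5], mu=0.5)
# Representation that leaks the treatment: classifier is nearly certain.
leaky = combined_loss(y_obs, y_obs, a_obs, [0.99, 0.01], mu=0.5)
```

With identical outcome predictions, `balanced` is lower than `leaky`, so minimizing the combined loss pushes the representation toward the case where the treatment classifier can do no better than chance.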
Remarks on invariant representations.
As shown by Johansson et al. (2019), invertible transformations (φ) are necessary for consistency of domain invariant representations (Z). We include for completeness that if φ is non-invertible there is information loss, which leads to unobservable error (η). Thus, we desire an invertible φ, which ensures η = 0. This highlights an important strength of TE-CDE, where by properties of ODEs/CDEs (Zhang et al., 2020), the representations from TE-CDE have guaranteed invertibility, since integration backward in time is always possible or we can alternatively integrate: −fφ(Zs).
Intensity of sampling.
It is well-known for EHR data that sampling frequency and observations (or lack thereof) carry information about the patient’s health status (Alaa et al., 2017). In such cases, we can replace each observed tuple (x_tj , a_tj , y_tj ) with (x_tj , a_tj , y_tj , c_tj ) where c_tj ∈ R^d+1+1 counts the number of times each one of the dimensions of X, A and Y have been observed up to time tj . The extended tuple is fed into the encoder to inform it about the sampling.
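The augmentation described above can be sketched as a running count per channel. The representation of a missing value as `None` and the function name `add_observation_counts` are assumptions made for this example.

```python
def add_observation_counts(observations):
    """Append cumulative per-channel observation counts to each tuple.

    observations: list of (time, values), where values[i] is None if
                  channel i was not observed at that time.
    Returns a list of (time, values, counts).
    """
    n_channels = len(observations[0][1])
    counts = [0] * n_channels
    augmented = []
    for time, values in observations:
        for i, v in enumerate(values):
            if v is not None:
                counts[i] += 1
        augmented.append((time, values, list(counts)))  # copy the running counts
    return augmented


# Hypothetical channels: [covariate x, treatment a, outcome y].
records = [
    (0.0, [1.0, None, 0.3]),
    (0.7, [None, 1, 0.2]),
    (1.1, [0.9, None, None]),
]
augmented = add_observation_counts(records)
count_vectors = [c for _, _, c in augmented]
```

Because the counts only increase when a channel is actually measured, their rate of growth tells the encoder how densely each channel has been sampled so far.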
5. Experiments
In this section, we validate the ability of TE-CDE to estimate counterfactual outcomes from irregularly sampled observational data. Since counterfactual outcomes are not known for real-world data, it is necessary to use synthetic or semi-synthetic data for empirical evaluation. First, we describe a simulation environment based on a Pharmacokinetic-Pharmacodynamic (PK-PD) model of lung cancer tumor growth (Geng et al., 2017), which allows counterfactuals to be calculated at any time point for arbitrary treatment plans. Furthermore, we introduce a continuous-time observation process based on Hawkes processes. The controllable nature of the observation process allows us to simulate irregularly sampled observational data for a range of different observation process parameterizations, which are motivated by common healthcare scenarios.
5.1. Modeling tumor growth under general observation patterns
Tumor growth dynamics.
We use a well-established biomathematical PK-PD model for tumor growth in lung cancer patients that includes the effects of chemotherapy and radiotherapy (Geng et al., 2017). The PK-PD model is representative of the true underlying physiological process with responses to interventions. Hence, results using the model should be closely representative of reality. Additionally, the same underlying model was also used by Lim et al. (2018) and Bica et al. (2020b). We briefly describe it below and refer the reader to Appendix C for more details. The tumor volume at time t after diagnosis is modeled as follows:


Observation process.
As discussed in Section 1, in real-world clinical settings, patients are rarely observed at fixed, regular time intervals. Instead, they are observed irregularly, with observations often a consequence of clinical factors, e.g. severity of illness, treatment regimen, or medical policy.
To simulate such nuances, we augment the simulation environment by modeling the patient observation process with a Hawkes process (Hawkes, 1971; Hawkes & Oakes, 1974). A Hawkes process is a flexible point process with temporal dependencies. Indeed, since clinicians are not memoryless and the times when they make observations often depend on past observations, the Hawkes process appears a sensible parameterization of such an observation process. This is especially true since it captures time-varying sampling intensity that depends on both patient history and clinical state.
These properties have been leveraged by both Bao et al. (2017) and Alaa et al. (2017), who used a Hawkes process to model different types of observational healthcare data, further justifying its applicability to healthcare and its use in our simulations. In addition, the success of both methods in applying Hawkes processes to EHR data (which is often high-dimensional) underlines the applicability to high-dimensional scenarios. From an experimental perspective, the Hawkes process is readily parameterized to simulate different clinical observation scenarios/regimes (i.e. a test bed to simulate different clinical scenarios).
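To show how such observation times can be generated, here is a sketch of a univariate Hawkes process with an exponential kernel, simulated with Ogata's thinning algorithm. The kernel choice and the parameter names `base`, `alpha`, and `beta` are assumptions; the paper's intensity additionally depends on the clinical state through κ, which is not reproduced here.

```python
import math
import random


def hawkes_intensity(t, events, base, alpha, beta):
    """Intensity at time t: a baseline plus a decaying bump from each past event."""
    return base + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)


def simulate_hawkes(base, alpha, beta, horizon, seed=0):
    """Ogata thinning for a Hawkes process with an exponential kernel."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        # The intensity only decays until the next event, so its value just
        # after the current time is a valid upper bound for the proposal.
        lam_bar = base + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            return events
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= hawkes_intensity(t, events, base, alpha, beta):
            events.append(t)


# alpha / beta < 1 keeps the process stable (each event triggers < 1 follow-up on average).
observation_times = simulate_hawkes(base=1.0, alpha=0.5, beta=1.0, horizon=50.0, seed=0)
```

Each accepted observation temporarily raises the intensity, so observations cluster in time, which mimics a clinician who checks again soon after seeing something noteworthy.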


Benchmarks.
We compare TE-CDE with two state-of-the-art methods for counterfactual estimation over time, CRN (Bica et al., 2020b) and RMSN (Lim et al., 2018), in different irregular sampling scenarios. Both CRN and RMSN rely on the assumption of regularly sampled data. Thus, for the irregular setting of interest, we divide the timeline equally, then interpolate and impute the “un-observed” observations. We also evaluate a Gaussian Process (GP) based model for continuous time, similar to Schulam & Saria (2017). However, the model performance is poor for larger values of time-dependent confounding, and the results are included together with additional experiments in Appendix F. For domain adversarial training, we use the standard procedure (Ganin et al., 2016), with µ starting at 0 and following an exponentially increasing schedule per training epoch over the range [0, 1]. In addition, we assess the impact of the adversarial training procedure in TE-CDE by training a version of our method without domain adversarial training, i.e. constant µ = 0 in the loss function (Eq. 9).
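A schedule of this kind can be sketched with the formula from Ganin et al. (2016), where the weight grows from 0 toward 1 as training progresses. Whether TE-CDE uses exactly this formula and the constant `gamma = 10` is an assumption; the function name `mu_schedule` is hypothetical.

```python
import math


def mu_schedule(epoch, n_epochs, gamma=10.0):
    """Adversarial weight for a given epoch, rising smoothly from 0 toward 1."""
    progress = epoch / max(n_epochs - 1, 1)   # training progress in [0, 1]
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


mus = [mu_schedule(epoch, n_epochs=10) for epoch in range(10)]
```

Starting at 0 lets the outcome predictor settle before the adversarial term begins to reshape the representation.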
Experimental details.
Implementation details, including hyper-parameters, can be found in Appendix E. Unless otherwise stated, each experiment is run with 10,000 patients for training, 1,000 for validation and 10,000 for testing.

5.2. Impact of time-dependent confounding across varying sampling intensities
A key challenge when learning from longitudinal data is accounting for bias introduced by time-dependent confounding. Therefore, it is essential to assess the impact of time-dependent confounding on counterfactual estimation. We measure performance via normalized RMSE, where the RMSE is normalized by the maximum tumor volume V_max = 1150 cm^3. As discussed, time-dependent confounding is controlled by parameters γ_c, γ_r in the treatment assignment policies. We evaluate the benchmarks under increasing degrees of time-dependent confounding by setting γ_c = γ_r = γ = {2, 4, 6, 8, 10}.
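The metric can be written out in a few lines. The tumor volumes below are made-up numbers used only to show the arithmetic; `normalized_rmse` is a hypothetical helper name.

```python
import math

V_MAX = 1150.0  # maximum tumor volume in cm^3, as stated in the text


def normalized_rmse(y_true, y_pred, v_max=V_MAX):
    """Root mean squared error divided by the maximum tumor volume."""
    mse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / v_max


# Hypothetical true and predicted tumor volumes in cm^3.
score = normalized_rmse([100.0, 200.0], [123.0, 177.0])
```

Both errors here are 23 cm^3 in magnitude, so the RMSE is 23 and the normalized score is 23 / 1150 = 0.02, i.e. 2% of the maximum volume.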

As previously discussed, a patient’s condition often affects the frequency of observation. The state-based variability in sampling intensity based on the clinical state is controlled by κ in the simulation environment. We repeat the experiment for multiple values of the scaling factor κ = {1, 5, 10}.
Figure 3 shows the results for counterfactual estimation for different levels of sampling intensity κ. As expected, the performance of all models degrades with increasing time-dependent confounding. However, TE-CDE achieves the lowest counterfactual estimation RMSE for all values of sampling intensity κ and across all values of time-dependent confounding γ. The divergence is most pronounced for increasing γ, with γ = 10 leading to a 36% decrease in RMSE for TE-CDE compared to CRN, the next best performing method. The superior performance of TE-CDE in all settings highlights the benefit of the continuous-time approach adopted compared to RNN-based approaches.
Comparing the RNN-based models, CRN outperforms RMSN, matching the conclusions in the regularly sampled setting reported in Bica et al. (2020b). Overall, however, the results highlight the limitation of RNN-based models in the irregularly sampled setting.
Finally, we characterize the value of domain adversarial training in TE-CDE by comparing it to the case when µ = 0 (i.e. no domain adversarial training). TE-CDE (µ = 0) suffers significant performance degradation with a higher RMSE in all scenarios that grows as the degree of time-dependent confounding increases. This clearly demonstrates the practical benefit of the adversarial training approach to learning balancing representations.
5.3. Treatment-conditioned sampling
In the previous section, we consider the realistic scenario where the severity of the patients’ condition governs the intensity of the observation process, i.e. sicker patients with higher stages are observed more frequently. In addition, the treatment regimen itself will often also influence the observation pattern of the patient, i.e. patients undergoing different treatments are monitored differently.
To simulate this phenomenon, we adjust the base sampling intensity κ depending on whether the patient is treated or untreated, thereby altering the state-dependent sampling variability between the two treatment groups. We set κ = 10 for treated patients and κ = 1 for untreated patients. We fix the time-dependent confounding as γ = 4.
Consistent with the previous experiment, TE-CDE significantly outperforms the benchmark models (Table 2). There is a divergence in performance between the treated and untreated populations. This is largely explained by differences in the severity of the clinical condition between the two populations: the majority of untreated patients (77%) remain in cancer stage S1A (i.e. max st = 0). Due to the reduced state transition, these patients naturally have lower variability in tumor volume over the trajectory, leading to lower error for all methods. Overall, TE-CDE outperforms all methods globally and for treated and untreated patients.

5.4. Forecasting at additional time horizons
We have assessed the ability to estimate counterfactual outcomes at the subsequent observation time determined by the Hawkes process described in Section 5.1. To further validate our method, we assess counterfactual estimation at subsequent observation times that are further in the future.
As an illustrative example, we evaluate counterfactual estimation at time t_k+n, i.e. estimate tumor volume y_tk+n. Similar to other experiments, we vary the degree of time-dependent confounding γ = {2, 4, 6, 8, 10}, fix κ = 10, and set the forecasting horizon n = 5 (see Appendix F.8 for other time horizons). As expected, it is more challenging to estimate counterfactuals further in the future. As shown in Figure 4, the trends are similar to those in the setting of Section 5.2 (Figure 3). TE-CDE outperforms both CRN and RMSN for all γ, with a greater performance differential as the degree of time-dependent confounding increases. For γ = 10, there is a 40% reduction in RMSE for TE-CDE.

5.5. Treatment selection
To demonstrate how models such as TE-CDE could be used in decision support, and the potential impact of assisting clinical decision makers, we must assess performance in ways beyond counterfactual estimation. One such clinically relevant evaluation is whether the best treatment was selected. This is important since reduced error in counterfactual estimation does not necessarily result in improved clinical outcomes.
We define the “correct” treatment selection as the treatment that minimizes the tumor volume at time t_k+n (i.e. y_tk+n).
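This evaluation can be sketched as follows: for each patient, pick the treatment with the smallest predicted tumor volume and check whether it matches the treatment with the smallest true volume. The treatment labels and volumes below are made up for illustration, and the function names are hypothetical.

```python
def select_treatment(volumes_by_treatment):
    """Return the treatment option with the smallest tumor volume."""
    return min(volumes_by_treatment, key=volumes_by_treatment.get)


def selection_accuracy(predicted, actual):
    """Fraction of patients for whom the model picks the truly best treatment."""
    hits = sum(
        select_treatment(p) == select_treatment(a) for p, a in zip(predicted, actual)
    )
    return hits / len(predicted)


# Hypothetical predicted and true volumes at the forecasting horizon for two patients.
predicted = [{"none": 30.0, "chemo": 20.0}, {"none": 10.0, "chemo": 12.0}]
actual = [{"none": 35.0, "chemo": 18.0}, {"none": 15.0, "chemo": 9.0}]
accuracy = selection_accuracy(predicted, actual)
```

For the second patient the predicted volumes are off by only a few units, yet their ordering is wrong, which is why selection accuracy is reported separately from RMSE.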
We adopt the same experimental setup as Section 5.4 and set the forecasting horizon n = 5, sampling intensity κ = 10, and vary time-dependent confounding γ = {2, 4, 6, 8, 10}.
Figure 5 shows decreasing accuracy for all methods as time-dependent confounding increases. However, for all values of γ, TE-CDE outperforms both CRN and RMSN, more frequently selecting the optimal treatment. Similar to the other experiments, as γ increases, the performance gap between TE-CDE and both CRN and RMSN increases, with c. 4% difference in absolute treatment selection accuracy at γ = 4 increasing to c. 10% at γ = 10. This experiment emphasizes that differences in counterfactual estimation result in meaningful differences in treatment selection accuracy.

5.6. Additional experiments
We perform a number of additional experiments to further validate TE-CDE. In Appendix F.1, we show how uncertainty estimates can be obtained from TE-CDE and then used to rank counterfactual estimates such that uncertain samples can be deferred to clinicians to improve outcomes. Appendix F.2 compares the data efficiency of the different methods, which is useful in clinical settings with limited labeled data. In Appendix F.3, we explore the latent representation of TE-CDE over time to: (1) highlight that the latent states zt learned by TE-CDE indeed are treatment-invariant representations and (2) investigate clinical insights that can be ascertained from the latent representations.
6. Conclusion
State-of-the-art methods for counterfactual estimation are predicated on the assumption of regular and evenly spaced data sampling. However, real-world clinical time series are often irregular. To address this challenge, we introduce TE-CDE, a model that learns to perform counterfactual estimation in continuous time from irregularly sampled observational data with time-dependent confounding. Additionally, we propose a controlled simulation environment for medically realistic irregularly sampled time series. In experiments in a variety of irregular settings, we demonstrate that TE-CDE provides improvements over current state-of-the-art methods.
Counterfactual estimation has the potential to assist clinicians with “what-if” decision-making. However, when deploying such models in healthcare settings, there are risks, e.g. inaccurate predictions. Furthermore, trade-offs between outcomes and possible treatment side effects are not accounted for by such models. To mitigate possible adverse effects, counterfactual estimates should be part of a “human-in-the-loop” paradigm, allowing experts to complement predictions with domain knowledge to improve patient outcomes.
We also note that while some aspects of irregularly sampled data are naturally addressed through our formulation, this work is simply a step in the right direction and our proposed solution only partially addresses the complexities of irregular sampling. In particular more is needed to address a number of aspects, such as informative sampling, that are widely prevalent in healthcare. We hope that this serves as a motivation for future work.
To be continued...
See you soon!
