Regression models for survival data with applications in
insurance  R programming
Montserrat Guillén, Jens P. Nielsen & Anna M. PérezMarín
 Guillén, M., Nielsen, J.P., Scheike, T. and PérezMarín, A.M. (2012) "Timevarying effects in the analysis of customer loyalty: a case study in insurance" Expert Systems with Applications, 39, 35513558.
DATA DESCRIPTION
Name  Content description 
datat.csv

400 observations. Time: timetoevent
variable. Status: censoring indicator (0
censored, 1 event). x1, x2, x3, x4: covariates 
PROPORTIONAL HAZARDS REGRESSION MODEL
We define our dependent variable, T as the timetoevent variable. For example, if we are interested in analyzing customer lifetime duration, then T is the time elapsed since the policy is underwritten by a policy holder until it is canceled. This survival time may be rightcensored because the event of interest (policy cancellation) may not have occurred before the end of the observation period.
The classical proportional hazards regression model, also called Cox model (Cox, 1972) specifies the intensity λ_{i}(t) of the random variable T for individual i as:
\begin{equation*} \lambda _{i}(t)=Y_{i}(t)\alpha _{0}(t)\exp (Z_{i}^{\prime }\beta ) \end{equation*}where Y_{i}(t) is an indicator equal to 1 if the individual i is at risk at t and zero otherwise, α_{0}(t) is the socalled baseline hazard function which only depends on time, Z_{i}=(Z_{i1}, . . ., Z_{ip})' is the pdimensional vector of covariates and β is the pdimensional vector of unknown regression parameters. The regression parameters β are estimated by maximizing the Cox's partiallikelihood function (Cox, 1972, 1975):
\begin{equation*} L(\beta )=\prod_{t}\prod_{i}\left( \frac{\exp \left( Z_{i}^{\prime }\beta \right) }{S_{0}(t,\beta )}\right) ^{\Delta N_{i}(t)} \end{equation*}where \begin{equation*} S_{0}(t,\beta )=\sum_{i=1}^{n}Y_{i}(t)\exp (Z_{i}^{\prime }\beta ) \end{equation*} and N_{i}(t) is a counting process that is equal to zero until the moment when the event occurs for individual i. More details about regression models for survival data can be found in Martinussen and Scheike (2006).
The proportional hazards regression model can be easily estimated in R by using the coxph function of the survival R package. An example is shown in the following script.
COXTYPE REGRESSION MODEL WITH TIMEVARYING COEFFICIENTS
The following extension of the Cox model (Murphy and Sen, 1991 and Grambsch and Therneau, 1994) introduce timedependent parameters and also considers potential timedependent covariates: \begin{equation} \lambda _{i}(t)=Y_{i}(t)\exp \left[ Z_{i}^{\prime }(t)\beta (t) \right] \hspace{0.80cm} (1) \end{equation}
where Z_{i}(t)={Z_{i1}(t), . . ., Z_{ip}(t}' is the pdimensional vector of covariates that may evolve with time t and β(t) denotes the associated regression timedependent coefficients. When the first covariate is constant and equal to one Z_{i1}(t) = 1 then (1) contains a baseline α_{0}(t) that is parametrized as exp[β_{1}(t)]. It is important to remark the difference between timevarying covariates and timevarying parameters. Timevarying covariates are some individual characteristic which may vary over time (such as age) while timevarying parameters reflect the fact that some covariate (which can be either constant or not) may have a changing effect on the response over time. For instance, the effect of filing a claim on the risk of cancellation can be higher/lower shortly after the accident occurs than later on. This more general model therefore includes both formulation improvements (more details can be found in Martinussen and Scheike, 2006).
Statistical tests based on the asymptotic analysis of the cumulative regression functions of model (1) can be used to decide whether these effects are in fact timevarying or not (see Martinussen and Scheike, 2006). The model estimation requires maximizing of the corresponding loglikelihood function (see more details in Martinussen, Scheike and Skovgaard, 2002 and Scheike and Martinussen, 2004), that is given by \begin{equation*} \log L(\beta )=\sum_{i}\left\{ \int_{0}^{\infty }Z_{i}^{\prime }(t)\beta (t)dN_{i}(t)\int_{0}^{\infty }Y_{i}(t)\exp \left[ Z_{i}^{\prime }(t)\beta (t)\right] dt\right\}. \end{equation*}
The estimation process is complex and starts with an initial estimate of the parameters and then the NewtonRaphson algorithm is applied, but smoothing is needed to stabilize the solution that finally is given in terms of cumulative regression coefficients. The model can be estimated by using the function timecox of the timereg R package (Scheike, 2014). An example is showed in the following script.
REFERENCES
[1] Cox, D.R. (1972) Regression models and life
tables, Journal of the Royal Statistical Society B,
34, 187220.
[2] Cox, D.R. (1975) Partial likelihood, Biometrika,
62, 269276.
[3] Grambsch, P. M. and Therneu, T.M. (1994)
Proportional hazards tests and diagnostics based on
weighted residuals, Biometrika, 81, 515526.
[4] Martinussen, T. and Scheike, T. H. (2006) Dynamic
regression models for survival data, Springer Verlag,
NewYork.
[5] Murphy, S. A. and Sen, P. K. (1991)
Timedependent coefficients in a Coxtype regression
model, Stochastic Processes and Their Applications,
39, 153180.
[6] Scheike, T. H. (2014) Timereg R package.
Available at
http://cran.rproject.org/web/packages/timereg/index.html.