XLSTAT-Survival Analysis

Tutorial
View tutorials
XLSTAT-Survival Analysis (formerly called XLSTAT-Life) is an Excel add-in developed to provide XLSTAT users with a powerful solution for survival analysis and life table analysis.

This analytical software solution provides you with leading-edge methods such as Kaplan-Meier survival analysis. Moreover, it takes competing risks into account with cumulative incidence, and offers Cox proportional hazards regression and the Nelson-Aalen method for estimating hazard functions, among other features. Data analysis of different populations has never been easier!

Features

You can find tutorials that explain how XLSTAT-Survival Analysis works here.

Demo version

A trial version of XLSTAT-Survival Analysis is included in the main XLSTAT download.

Prices and ordering

These analyses are included in the XLStat-Ecology, XLStat-Biomed and XLStat-Premium packages.

 

 

 

DETAILED DESCRIPTIONS

Life table analysis

Tutorial
View a tutorial

What is life table analysis

Life table analysis belongs to the descriptive methods of survival analysis, as does Kaplan-Meier analysis. The life table method was developed first, but the Kaplan-Meier method has been shown to be superior in many cases.

Life table analysis allows you to quickly obtain a population survival curve and essential statistics such as the median survival time. Life table analysis, whose main result is the life table (also called an actuarial table), works on regular time intervals. This is the major difference from Kaplan-Meier analysis, where the time intervals are taken as they appear in the data set.

Use of Life table analysis

Life table analysis allows you to analyze how a given population evolves with time. This technique is mostly applied to survival data and product quality data. There are three main reasons why a population of individuals or products may evolve:

  • some individuals die (products fail),
  • some others leave the surveyed population because they are healed (repaired) or because their trace is lost (individuals move away, the study is terminated, among other possible reasons).

The first type of data is usually called "failure data", or "event data", while the second is called "censored data".

The life table method allows comparing populations, through their survival curves. For example, it can be of interest to compare the survival times of two samples of the same product produced in two different locations. Tests can be performed to check if the survival curves have arisen from identical survival functions. These results can later be used to model the survival curves and to predict probabilities of failure.

Censoring data for life table analysis

Type of censoring for life table analysis

There are several types of censoring of survival data:

  • Left censoring: when an event is reported at time t=t(i), we know that the event occurred at t ≤ t(i).
  • Right censoring: when an event is reported at time t=t(i), we know that the event occurred at t ≥ t(i), if it ever occurred.
  • Interval censoring: when an event is reported at time t=t(i), we know that the event occurred during [t(i-1); t(i)].
  • Exact censoring: when an event is reported at time t=t(i), we know that the event occurred exactly at t=t(i).

Independent censoring for life table analysis

The life table method has two basic requirements. First, the observations must be independent. Second, the censoring must be independent: if you consider two random individuals in the study at time t-1, one of whom is censored at time t while the other survives, then both must have equal chances to survive at time t.
There are four different types of independent censoring:

  • Simple type I: all individuals are censored at the same time or equivalently individuals are followed during a fixed time interval.
  • Progressive type I: all individuals are censored at the same date (for example, when the study terminates).
  • Type II: the study is continued until n events have been recorded.
  • Random: the time when a censoring occurs is independent of the survival time.

Results of the life table analysis in XLSTAT

Life table

The life table displays the various results obtained from the analysis, including:

  • Interval: Time interval.
  • At risk: Number of individuals that were at risk during the time interval.
  • Events: Number of events recorded during the time interval.
  • Censored: Number of censored data recorded during the time interval.
  • Effective at risk: Number of individuals that were at risk at the beginning of the interval minus half of the individuals who have been censored during the time interval.
  • Survival rate: Proportion of individuals who "survived" (the event did not occur) during the time interval. Ratio of individuals who survived over the individuals who were "effective at risk".
  • Conditional probability of failure: Ratio of individuals who failed over the individuals who were "effective at risk".
  • Standard error of the conditional probability.
  • Survival distribution function (SDF): Probability of an individual to survive until at least the time interval of interest. Also called survivor function.
  • Standard error of the survival function.
  • Probability density function: estimated density function at the midpoint of the interval.
  • Standard error of the probability density.
  • Hazard rate: estimated hazard rate function at the midpoint of the interval. Also called failure rate. Corresponds to the failure rate for the survivors.
  • Standard error of the hazard rate.
  • Median residual lifetime: Amount of time remaining to reduce the surviving population (individuals at risk) by one half. Also called median future lifetime.
  • Median residual lifetime standard error.
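To make the actuarial computation concrete, here is a minimal Python sketch (not XLSTAT, which is an Excel add-in) that builds the core columns of a life table from grouped counts; the interval labels and counts are invented for illustration.

```python
# Minimal sketch of an actuarial life table, assuming events and censored
# counts grouped into regular intervals (illustrative only, not XLSTAT).
import pandas as pd

def life_table(intervals, events, censored, n_start):
    """Build the core columns of an actuarial life table from grouped counts."""
    rows, at_risk, survival = [], n_start, 1.0
    for interval, d, c in zip(intervals, events, censored):
        effective = at_risk - c / 2.0                    # effective number at risk
        q = d / effective if effective > 0 else 0.0      # conditional probability of failure
        p = 1.0 - q                                      # survival rate over the interval
        survival *= p                                    # survival distribution function (SDF)
        rows.append({"Interval": interval, "At risk": at_risk, "Events": d,
                     "Censored": c, "Effective at risk": effective,
                     "Survival rate": p, "Cond. prob. failure": q, "SDF": survival})
        at_risk -= d + c
    return pd.DataFrame(rows)

# Hypothetical cohort of 100 subjects followed over yearly intervals
print(life_table(["0-1", "1-2", "2-3"], events=[10, 8, 6], censored=[4, 6, 5], n_start=100))
```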

Charts for life table analysis

XLSTAT offers the following charts:

  • Survival distribution function (SDF)
  • -Log(SDF) corresponding to the –Log() of the survival distribution function (SDF).
  • Log(-Log(SDF)) corresponding to the Log(–Log()) of the survival distribution function.

It is also possible to identify on the charts the times when censored data have been recorded.

Test of equality of the survival functions

It is possible to compute a test of equality of the survival functions with three different tests:

  • the Log-rank test,
  • the Wilcoxon test,
  • and the Tarone Ware test.

These tests are based on a Chi-square test. The lower the corresponding p-value, the more significant the differences between the groups.
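As an illustration of the log-rank comparison described above, the following hedged sketch uses the open-source lifelines package rather than XLSTAT; the two groups' durations and censoring flags are made-up data.

```python
# Hedged sketch: comparing two survival curves with a log-rank test using the
# open-source lifelines package (not XLSTAT); durations and flags are made up.
from lifelines.statistics import logrank_test

durations_a = [6, 7, 10, 15, 19, 25]
events_a    = [1, 0, 1, 1, 0, 1]       # 1 = event observed, 0 = censored
durations_b = [5, 6, 8, 11, 13, 20]
events_b    = [1, 1, 1, 0, 1, 1]

result = logrank_test(durations_a, durations_b,
                      event_observed_A=events_a, event_observed_B=events_b)
print(result.test_statistic, result.p_value)   # low p-value -> the curves differ
```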

 

 

Kaplan-Meier analysis

Tutorial
View a tutorial

What is Kaplan-Meier analysis

The Kaplan-Meier method, also called product-limit analysis, belongs to the descriptive methods of survival analysis, as does life table analysis. The life table method was developed first, but the Kaplan-Meier method has been shown to be superior in many cases.

Kaplan-Meier analysis allows you to quickly obtain a population survival curve and essential statistics such as the median survival time. Kaplan-Meier analysis, whose main result is the Kaplan-Meier table, is based on irregular time intervals, contrary to life table analysis, where the time intervals are regular.

Use of Kaplan-Meier analysis

Kaplan-Meier analysis is used to analyze how a given population evolves with time. This technique is mostly applied to survival data and product quality data. There are three main reasons why a population of individuals or products may evolve: some individuals die (products fail), some others leave the surveyed population because they are healed (repaired) or because their trace is lost (individuals move away, the study is terminated, among other reasons). The first type of data is usually called "failure data", or "event data", while the second is called "censored data".

The Kaplan-Meier analysis allows you to compare populations, through their survival curves. For example, it can be of interest to compare the survival times of two samples of the same product produced in two different locations. Tests can be performed to check if the survival curves have arisen from identical survival functions. These results can later be used to model the survival curves and to predict probabilities of failure.

Censoring data for Kaplan-Meier analysis

Types of censoring for Kaplan-Meier analysis

There are several types of censoring of survival data:

  • Left censoring: when an event is reported at time t=t(i), we know that the event occurred at t ≤ t(i).
  • Right censoring: when an event is reported at time t=t(i), we know that the event occurred at t ≥ t(i), if it ever occurred.
  • Interval censoring: when an event is reported at time t=t(i), we know that the event occurred during [t(i-1); t(i)].
  • Exact censoring: when an event is reported at time t=t(i), we know that the event occurred exactly at t=t(i).

Independent censoring for Kaplan-Meier analysis

The Kaplan-Meier method has two basic requirements. First, the observations must be independent. Second, the censoring must be independent: if you consider two random individuals in the study at time t-1, one of whom is censored at time t while the other survives, then both must have equal chances to survive at time t. There are four different types of independent censoring:

  • Simple type I: all individuals are censored at the same time or equivalently individuals are followed during a fixed time interval.
  • Progressive type I: all individuals are censored at the same date (for example, when the study terminates).
  • Type II: the study is continued until n events have been recorded.
  • Random: the time when a censoring occurs is independent of the survival time.

Results for the Kaplan-Meier analysis in XLSTAT

Kaplan-Meier table

This table displays the various results obtained from the analysis, including:

  • Interval start time: lower bound of the time interval.
  • At risk: number of individuals that were at risk.
  • Events: number of events recorded.
  • Censored: number of censored data recorded.
  • Proportion failed: proportion of individuals who "failed" (the event did occur).
  • Survival rate: proportion of individuals who "survived" (the event did not occur).
  • Survival distribution function (SDF): Probability of an individual to survive until at least the time of interest. Also called cumulative survival distribution function, or survival curve.
  • Survival distribution function standard error.
  • Survival distribution function confidence interval.

Mean and Median residual lifetime

Mean and median residual lifetimes are computed and displayed in two tables.

  1. A first table displays the mean residual lifetime, the standard error, and a confidence range.
  2. A second table displays statistics (estimator and confidence range) for the 3 quartiles, including the median residual lifetime (50%). The median residual lifetime is one of the key results of the Kaplan-Meier analysis, as it allows you to evaluate the time remaining for half of the population to "fail".
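For readers who want to reproduce a Kaplan-Meier table outside Excel, here is a minimal sketch using the lifelines package (an assumption of this example; XLSTAT itself is driven through its dialog boxes). The durations and event flags are hypothetical.

```python
# Minimal Kaplan-Meier sketch with the lifelines package (illustrative, not
# XLSTAT); durations and event flags are hypothetical.
from lifelines import KaplanMeierFitter

durations = [5, 6, 6, 2.5, 4, 4, 7, 9, 12, 13]
events    = [1, 0, 0, 1, 1, 1, 1, 0, 1, 1]     # 1 = failure, 0 = censored

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=events)

print(kmf.survival_function_)        # Kaplan-Meier table: SDF at each event time
print(kmf.median_survival_time_)     # median survival time
print(kmf.confidence_interval_)      # confidence band for the survival function
```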

Confidence interval for the Kaplan-Meier analysis function

Computing confidence intervals for the survival function can be done using three different methods:

  1. Greenwood’s method
  2. Exponential Greenwood’s method
  3. Log-transformed method

These three approaches give similar results, but the last two are preferred when samples are small.
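As a rough illustration of the first (Greenwood) approach, the sketch below computes the standard Greenwood variance, Var[S(t)] = S(t)² Σ d_i/(n_i(n_i − d_i)), and a plain symmetric interval with numpy; the event and at-risk counts are hypothetical and this is not XLSTAT's implementation.

```python
# Rough numpy sketch of Greenwood's variance for the Kaplan-Meier estimate,
# Var[S(t)] = S(t)^2 * sum d_i / (n_i (n_i - d_i)); counts are hypothetical.
import numpy as np

d = np.array([2, 1, 1])          # events at each distinct event time
n = np.array([10, 7, 5])         # numbers at risk just before those times
s = np.cumprod(1 - d / n)        # Kaplan-Meier survival estimate
var = s**2 * np.cumsum(d / (n * (n - d)))
se = np.sqrt(var)
lower, upper = s - 1.96 * se, s + 1.96 * se    # plain (symmetric) Greenwood interval
print(np.c_[s, se, lower, upper])
```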

Charts for Kaplan-Meier analysis

XLSTAT offers the following charts:

  • Survival distribution function (SDF)
  • -Log(SDF) corresponding to the –Log() of the survival distribution function (SDF).
  • Log(-Log(SDF)) corresponding to the Log(–Log()) of the survival distribution function.

It is also possible to identify on the charts the times when censored data have been recorded.

Test of equality of the survival functions

It is possible to compute a test of equality of the survival functions with three different tests:

  • the Log-rank test,
  • the Wilcoxon test,
  • and the Tarone Ware test.

These tests are based on a Chi-square test. The lower the corresponding p-value, the more significant the differences between the groups.

 

 

Cox proportional hazards models

Tutorial
View a tutorial

Principle of Cox proportional hazards model

The principle of the Cox proportional hazards model is to link the survival time of an individual to covariates. For example, in the medical domain, we are seeking to find out which covariate has the most important impact on the survival time of a patient.

Cox Models

A Cox model is a well-recognized statistical technique for exploring the relationship between the survival of a patient and several explanatory variables. A Cox model provides an estimate of the treatment effect on survival after adjustment for other explanatory variables. It allows us to estimate the hazard (or risk) of death, or other event of interest, for individuals, given their prognostic variables.

Interpreting a Cox model involves examining the coefficients of each explanatory variable. A positive regression coefficient for an explanatory variable means that the hazard is higher for patients with high values of that variable. Conversely, a negative regression coefficient implies a better prognosis for patients with higher values of that variable.

Cox’s method does not assume any particular distribution for the survival times, but it rather assumes that the effects of the different variables on survival are constant over time and are additive in a particular scale.

The hazard function is the probability that an individual will experience an event (for example, death) within a small time interval, given that the individual has survived up to the beginning of the interval. It can therefore be interpreted as the risk of dying at time t. The hazard function (denoted by λ(t,X)) can be estimated using the following equation:

λ(t,X) = λ0(t) exp(βX)

The first term depends only on time and the second one depends only on the covariates X. We are only interested in the second term.
If we estimate only the second term, a very important hypothesis has to be verified: the proportional hazards hypothesis. It means that the hazard ratio between two different observations does not depend on time.
Cox developed a modification of the likelihood function, called the partial likelihood, to estimate the coefficients β without taking into account the time-dependent term of the hazard function:

log[L(β)] = Σ(i=1..n) [ βXi − log( Σ(j: t(j) ≥ t(i)) exp(βXj) ) ]

To estimate the β parameters of the model (the coefficients of the linear function), we try to maximize the partial likelihood function. Contrary to linear regression, an exact analytical solution does not exist. So an iterative algorithm has to be used. XLSTAT uses a Newton-Raphson algorithm. The user can change the maximum number of iterations and the convergence threshold if desired.
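To illustrate what maximizing the partial likelihood looks like in practice, here is a hedged sketch using the lifelines package instead of XLSTAT; the data frame, column names and covariates are invented for illustration.

```python
# Hedged sketch of fitting a Cox proportional hazards model with the lifelines
# package (not XLSTAT); the data frame and covariates are invented.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "time":  [5, 6, 6, 2.5, 4, 4, 7, 9, 12, 13],
    "event": [1, 0, 0, 1, 1, 1, 1, 0, 1, 1],
    "age":   [60, 65, 52, 70, 58, 61, 49, 66, 55, 63],
    "dose":  [1.0, 0.5, 1.0, 0.0, 0.5, 1.0, 0.0, 0.5, 1.0, 0.0],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")   # maximizes the partial likelihood
cph.print_summary()                                    # coefficients, hazard ratios, Wald tests
```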

Strata in the Cox proportional hazards model

When the proportional hazards hypothesis does not hold, the model can be stratified. If the hypothesis holds on sub-samples, then the partial likelihood is estimated on each sub-sample and these partial likelihoods are summed in order to obtain the estimated partial likelihood. In XLSTAT, strata are defined using a qualitative variable.

Qualitative variables in the Cox proportional hazards model

Qualitative covariates are treated using a complete disjunctive table. In order to have independent variables in the model, the binary variable associated with the first modality of each qualitative variable has to be removed from the model. In XLSTAT, the first modality is always selected and its effect thus serves as the reference. The impacts of the other modalities are obtained relative to the omitted modality.
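A minimal sketch of this disjunctive coding with the first modality dropped, using pandas (an illustrative choice, not XLSTAT's internal mechanism); the "treatment" factor is hypothetical.

```python
# Sketch of complete disjunctive (dummy) coding with the first modality dropped
# as the reference level; the "treatment" factor is hypothetical.
import pandas as pd

treatment = pd.Series(["A", "B", "C", "A", "B"], name="treatment")
dummies = pd.get_dummies(treatment, prefix="treatment", drop_first=True)
print(dummies)   # columns treatment_B and treatment_C; "A" is the omitted reference
```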

Ties handling for Cox proportional hazards model

The proportional hazards model was developed by Cox (1972) to treat continuous-time survival data. However, in practical applications some observations frequently occur at the same time, and the classical partial likelihood cannot be applied. With XLSTAT, you can use two alternative approaches to handle ties:

  • Breslow’s method (1974) (default method): The partial likelihood has the following form:

    log[L(β)] = Σ(i=1..T) [ β Σ(l=1..di) Xl − di log( Σ(j: t(j) ≥ t(i)) exp(βXj) ) ]

    where T is the number of distinct event times and di is the number of observations associated with time t(i).
  • Efron’s method (1977): The partial likelihood has the following form:

    log[L(β)] = Σ(i=1..T) [ β Σ(l=1..di) Xl − Σ(r=0..di−1) log( Σ(j: t(j) ≥ t(i)) exp(βXj) − (r/di) Σ(j=1..di) exp(βXj) ) ],

    where T is the number of distinct event times and di is the number of observations associated with time t(i).

If there are no ties, both partial likelihoods reduce to the Cox partial likelihood.
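For readers who want to see Breslow's approximation written out, here is a rough numpy sketch of the log partial likelihood above for a single covariate; the times, event flags and covariate values are invented, and the function only evaluates the likelihood for a given β (it does not maximize it).

```python
# Rough numpy sketch of Breslow's approximation to the log partial likelihood
# for tied event times, for a single covariate x; data and beta are invented.
import numpy as np

def breslow_loglik(beta, times, events, x):
    ll = 0.0
    for t in np.unique(times[events == 1]):       # distinct event times
        dead = (times == t) & (events == 1)       # tied events at time t
        at_risk = times >= t                      # risk set at time t
        d = dead.sum()
        ll += beta * x[dead].sum() - d * np.log(np.exp(beta * x[at_risk]).sum())
    return ll

times  = np.array([2, 3, 3, 4, 5, 5, 6])
events = np.array([1, 1, 1, 0, 1, 1, 0])
x      = np.array([0.5, 1.0, 0.2, 0.8, 1.5, 0.3, 0.9])
print(breslow_loglik(0.4, times, events, x))      # evaluates (does not maximize) the likelihood
```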

Variables selection for the Cox proportional hazard model

It is possible to improve the Cox proportional hazards model by selecting the variables being part of the model. XLSTAT offers two options to select the variables:

  • Forward selection: The selection process starts by adding the variable with the largest contribution to the model. If a second variable is such that its entry probability is greater than the entry threshold value, then it is added to the model. This process is iterated until no new variable can be entered in the model.
  • Backward selection: This method is similar to the previous one but starts from a complete model.

Results for the Cox proportional hazard in XLSTAT

Goodness of fit coefficients for the Cox proportional hazard model

The goodness of fit coefficients table displays a series of statistics for the independent model (corresponding to the case where there is no impact of covariates, beta=0) and for the adjusted model.

  • Observations: The total number of observations taken into account;
  • DF: Degrees of freedom;
  • -2 Log(Like.): -2 times the logarithm of the likelihood function associated with the model;
  • AIC: Akaike’s Information Criterion;
  • SBC: Schwarz’s Bayesian Criterion;
  • Iterations: Number of iterations until convergence.

Statistical test of the Cox proportional hazard model

XLSTAT enables you to test the null hypothesis H0: beta=0:

The H0 hypothesis corresponds to the independent model (no impact of the covariates). We seek to check if the adjusted model is significantly more powerful than this model. Three tests are available: the likelihood ratio test (-2 Log(Like.)), the Score test and the Wald test. The three statistics follow a Chi2 distribution whose degrees of freedom are shown.

Model parameters

The parameter estimate, corresponding standard deviation, Wald's Chi², the corresponding p-value and the confidence interval are displayed for each variable of the model. The hazard ratios for each variable with confidence intervals are also displayed.

The residual table shows, for each observation, the time variable, the censoring variable and the value of the residuals (deviance, martingale, Schoenfeld and score).

Available charts for the Cox proportional hazard model

XLSTAT offers the following charts for the Cox proportional hazards model:

  • Cumulative Survival distribution function (SDF),
  • -Log(SDF),
  • Log(-Log(SDF)),
  • hazard function at mean of covariates,
  • residuals.

 

 

Sensitivity and specificity analysis

Tutorial
View a tutorial

What is sensitivity and specificity analysis

The Sensitivity and Specificity function allows you to compute, among other indices, the sensitivity, specificity, odds ratio, predictive values, and likelihood ratios associated with a test or a detection method. These indices can be used to assess the performance of a test.

In medicine it can be used to evaluate the efficiency of a test used to diagnose a disease or in quality control to detect the presence of a defect in a manufactured product.

Method History

This method was first developed during World War II to design effective means of detecting Japanese aircraft. It was then applied more generally to signal detection and to medicine, where it is now widely used.

Principles of Sensitivity and Specificity method

We study a phenomenon, often binary (for example, the presence or absence of a disease) and we want to develop a test to detect effectively the occurrence of a precise event (for example, the presence of the disease).

Let V be the binary or multinomial variable that describes the phenomenon for the N individuals being followed. We denote by + the individuals for which the event occurs and by - those for which it does not. Let T be a test whose goal is to detect whether the event occurred or not. T can be a binary (presence/absence), a qualitative (for example, the color), or a quantitative variable (for example, a concentration). For binary or qualitative variables, let t1 be the category corresponding to the occurrence of the event of interest. For a quantitative variable, let t1 be the threshold value under or above which the event is assumed to happen.

Once the test has been applied to the N individuals, we obtain an individual/variable table in which for each individual you find if the event occurred or not, and the result of the test.

Individual Disease Binary Test Quantitative Test
I01 + + 0
I02 + + 0.1
I03 + + 0.2
I04 + + 0.3
I05 + + 0.4
I06 + + 0.5
I07 - - 1
I08 + - 2
I09 - - 3
I10 - - 4
I11 - - 5

These tables can be summarized in a 2x2 contingency table:

Test/Disease D+ D-
T+ 6 0
T- 1 4

In the example above, there are 6 individuals for whom the test has detected the presence of the disease and 4 for whom it has detected its absence. However, for 1 individual the diagnosis is wrong, because the test indicates the absence of the disease while the patient is actually sick.

The following vocabulary is being used:

  • True positive (TP): Number of cases that the test declares positive and that are truly positive.
  • False positive (FP): Number of cases that the test declares positive and that in reality are negative.
  • True negative (TN): Number of cases that the test declares negative and that are truly negative.
  • False negative (FN): Number of cases that the test declares negative and that in reality are positive.

Indices for Sensitivity and Specificity analysis

Several indices are available to evaluate the performance of a test:

  • Sensitivity (equivalent to the True Positive Rate): Proportion of positive cases that are correctly detected by the test. In other words, sensitivity measures how effective the test is when used on positive individuals. The test is perfect for positive individuals when sensitivity is 1 and equivalent to a random draw when sensitivity is 0.5. If it is below 0.5, the test is counter-performing and it would be useful to reverse the rule so that sensitivity is higher than 0.5 (provided that this does not affect the specificity). The mathematical definition is given by: Sensitivity = TP/(TP + FN).
  • Specificity (also called True Negative Rate): Proportion of negative cases that are correctly detected by the test. In other words, specificity measures how effective the test is when used on negative individuals. The test is perfect for negative individuals when the specificity is 1 and equivalent to a random draw when the specificity is 0.5. If it is below 0.5, the test is counter-performing and it would be useful to reverse the rule so that specificity is higher than 0.5 (provided that this does not affect the sensitivity). The mathematical definition is given by: Specificity = TN/(TN + FP).
  • False Positive Rate (FPR): Proportion of negative cases that the test detects as positive (FPR = 1-Specificity).
  • False Negative Rate (FNR): Proportion of positive cases that the test detects as negative (FNR = 1 - Sensitivity).
  • Prevalence: relative frequency of the event of interest in the total sample (TP+FN)/N.
  • Positive Predictive Value (PPV): Proportion of truly positive cases among the positive cases detected by the test. We have PPV = TP / (TP + FP), or PPV = Sensitivity x Prevalence / [Sensitivity x Prevalence + (1-Specificity) x (1-Prevalence)]. It is a fundamental value that depends on the prevalence, an index that is independent of the quality of the test.
  • Negative Predictive Value (NPV): Proportion of truly negative cases among the negative cases detected by the test. We have NPV = TN / (TN + FN), or NPV = Specificity x (1-Prevalence) / [Specificity x (1-Prevalence) + (1-Sensitivity) x Prevalence]. This index also depends on the prevalence, which is independent of the quality of the test.
  • Positive Likelihood Ratio (LR+): This ratio indicates to what extent an individual is more likely to be truly positive when the test result is positive. We have LR+ = Sensitivity / (1-Specificity). The LR+ is a positive or null value.
  • Negative Likelihood Ratio (LR-): This ratio indicates how the odds of being truly positive change when the test result is negative. We have LR- = (1-Sensitivity) / Specificity. The LR- is a positive or null value; the lower it is, the better the test rules out the event.
  • Odds ratio: The odds ratio indicates how much more likely an individual is to be positive if the test is positive, compared to cases where the test is negative. For example, an odds ratio of 2 means that the chance that the positive event occurs is twice as high if the test is positive as if it is negative. The odds ratio is a positive or null value. We have Odds ratio = TPxTN / (FPxFN).
  • Relative risk: The relative risk is a ratio that measures how much better the test behaves when it gives a positive result than when it gives a negative one. For example, a relative risk of 2 means that the test is twice as powerful when it is positive as when it is negative. A value close to 1 corresponds to independence between the rows and columns, and to a test that performs as well when positive as when negative. The relative risk is a null or positive value given by: Relative risk = [TP/(TP+FP)] / [FN/(FN+TN)].
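The sketch below recomputes most of these indices from the 2x2 table of the example above (TP = 6, FP = 0, FN = 1, TN = 4), following the formulas given in the list; it is a plain Python illustration, not XLSTAT output.

```python
# Recomputing the main indices from the 2x2 table of the example
# (TP = 6, FP = 0, FN = 1, TN = 4), following the formulas above.
TP, FP, FN, TN = 6, 0, 1, 4
N = TP + FP + FN + TN

sensitivity = TP / (TP + FN)                          # true positive rate
specificity = TN / (TN + FP)                          # true negative rate
prevalence  = (TP + FN) / N
ppv = TP / (TP + FP)                                  # positive predictive value
npv = TN / (TN + FN)                                  # negative predictive value
lr_plus  = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
lr_minus = (1 - sensitivity) / specificity
odds_ratio = (TP * TN) / (FP * FN) if FP * FN > 0 else float("inf")
relative_risk = (TP / (TP + FP)) / (FN / (FN + TN))

print(sensitivity, specificity, prevalence, ppv, npv, lr_plus, lr_minus, relative_risk)
```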

Confidence intervals for Sensitivity and Specificity analysis

For the various indices presented above, several methods of calculating their variance, and therefore their confidence intervals, have been proposed. They fall into two families: the first concerns proportions, such as sensitivity and specificity; the second concerns ratios, such as LR+, LR-, the odds ratio and the relative risk.

For proportions, XLSTAT allows you to use the simple (Wald, 1939) or adjusted (Agresti and Coull, 1998) Wald intervals, a calculation based on the Wilson score (Wilson, 1927), possibly with a continuity correction, or the Clopper-Pearson (1934) intervals. Agresti and Caffo recommend using the adjusted Wald interval or the Wilson score intervals.

For ratios, the variances are calculated using a single method, with or without correction of continuity.

Once the variance of the above statistics is calculated, we assume their asymptotic normality (or of their logarithm for ratios) to determine the corresponding confidence intervals. Many of the statistics are proportions and should lie between 0 and 1. If the intervals fall partly outside these limits, XLSTAT automatically corrects the bounds of the interval.

 

 

ROC Curves

Tutorial
View a tutorial
The ROC curve generated by XLSTAT-Survival Analysis allows you to represent the evolution of the proportion of true positive cases (also called sensitivity) as a function of the proportion of false positive cases (corresponding to 1 minus specificity), and to evaluate a binary classifier such as a test to diagnose a disease or to control the presence of defects on a manufactured product.

ROC curve definition

The ROC curve corresponds to the graphical representation of the couple (1 – specificity, sensitivity) for the various possible threshold values.

Here are some important definitions:

  • Sensitivity (equivalent to the True Positive Rate): Proportion of positive cases that are correctly detected by the test. In other words, sensitivity measures how effective the test is when used on positive individuals. The test is perfect for positive individuals when sensitivity is 1 and equivalent to a random draw when sensitivity is 0.5. If it is below 0.5, the test is counter-performing and it would be useful to reverse the rule so that sensitivity is higher than 0.5 (provided that this does not affect the specificity). The mathematical definition is given by: Sensitivity = TP/(TP + FN).
  • Specificity (also called True Negative Rate): Proportion of negative cases that are correctly detected by the test. In other words, specificity measures how effective the test is when used on negative individuals. The test is perfect for negative individuals when the specificity is 1 and equivalent to a random draw when the specificity is 0.5. If it is below 0.5, the test is counter-performing and it would be useful to reverse the rule so that specificity is higher than 0.5 (provided that this does not affect the sensitivity). The mathematical definition is given by: Specificity = TN/(TN + FP).

Area Under the Curve

The area under the curve (AUC) is a synthetic index calculated for ROC curves. The AUC is the probability that a positive event is classified as positive by the test given all possible values of the test. For an ideal model, AUC = 1, while for a random classifier, AUC = 0.5. One usually considers that the model is good when the value of the AUC is higher than 0.7. A well-discriminating model should have an AUC between 0.87 and 0.9. A model with an AUC above 0.9 is excellent.

Sen (1960), Bamber (1975) and Hanley and McNeil (1982) have proposed different methods to calculate the variance of the AUC. All are available in XLSTAT. XLSTAT offers as well a comparison test of the AUC to 0.5, the value 0.5 corresponding to a random classifier. This test is based on the difference between the AUC and 0.5 divided by the variance calculated according to one of the three proposed methods. The statistic obtained is supposed to follow a standard normal distribution, which allows the calculation of the p-value.

The AUC can also be used to compare different tests with each other. If the tests have been applied to different groups of individuals, the samples are independent. In this case, XLSTAT uses a Student's t test to compare the AUCs (which requires assuming the normality of the AUC, acceptable if the samples are not too small). If the different tests were applied to the same individuals, the samples are paired. In this case, XLSTAT calculates the covariance matrix of the AUCs as described by Delong and Delong (1988) on the basis of Sen's work (1960), then calculates the variance of the difference between two AUCs and the corresponding p-value assuming normality.
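As a simple illustration of a ROC curve and its AUC, here is a sketch using scikit-learn (not XLSTAT) on the quantitative test of the example table; since low values indicate the disease in that example, the score is negated before computing the curve.

```python
# Sketch of a ROC curve and AUC with scikit-learn (not XLSTAT), using the
# quantitative test of the example table; low values indicate the disease,
# so the score is negated before computing the curve.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

disease = np.array([1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0])          # individuals I01..I11
test    = np.array([0, 0.1, 0.2, 0.3, 0.4, 0.5, 1, 2, 3, 4, 5])

fpr, tpr, thresholds = roc_curve(disease, -test)   # (1 - specificity, sensitivity) pairs
auc = roc_auc_score(disease, -test)
print(auc)                                         # close to 1 for this example
```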

XLSTAT results for the ROC analysis

In addition to the ROC curve and the AUC, other results are computed.

ROC analysis

The ROC analysis table displays, for each possible threshold value of the test variable, the various indices presented in the description section. On the line below the table you will find a reminder of the rule set in the dialog box to identify positive cases relative to the threshold value. Below the table, a stacked bar chart shows the evolution of the TP, TN, FP and FN counts depending on the threshold value. If the corresponding option was activated, the decision plot is then displayed (for example, changes in the cost depending on the threshold value).

Comparison of the AUC to 0.5

These results allow you to compare the test to a random classifier. The confidence interval corresponds to the difference between the AUC and 0.5. Various statistics are then displayed, including the p-value, followed by the interpretation of the comparison test.

Comparison of the AUCs

If you selected several test variables, once the above results are displayed for each variable, you will find the covariance matrix of the AUCs, followed by the table of differences for each pair of AUCs with the corresponding confidence intervals, and then the table of p-values. Values in bold correspond to significant differences. Last, a chart comparing the ROC curves is displayed.

 

Nelson-Aalen Analysis

Tutorial
View a tutorial

What is the Nelson-Aalen analysis

The Nelson-Aalen method belongs to the descriptive methods of survival analysis, along with life table analysis and Kaplan-Meier analysis.

The Nelson-Aalen approach can quickly give you a curve of cumulative hazard and estimate the hazard functions based on irregular time intervals.

Nelson-Aalen analysis is used to analyze how a given population evolves with time. This technique is mostly applied to survival data and product quality data. There are three main reasons why a population of individuals or products may evolve: some individuals die (products fail), some others leave the surveyed population because they are healed (repaired) or because their trace is lost (individuals move away, the study is terminated, among other causes). The first type of data is usually called "failure data" or "event data", while the second is called "censored data".

Censoring of survival data for the Nelson-Aalen analysis

Censoring types for the Nelson-Aalen analysis

There are several types of censoring of survival data:

  • Left censoring: when an event is reported at time t=t(i), we know that the event occurred at t ≤ t(i).
  • Right censoring: when an event is reported at time t=t(i), we know that the event occurred at t ≥ t(i), if it ever occurred.
  • Interval censoring: when an event is reported at time t=t(i), we know that the event occurred during [t(i-1); t(i)].
  • Exact censoring: when an event is reported at time t=t(i), we know that the event occurred exactly at t=t(i).

Independent censoring for the Nelson-Aalen method

The Nelson-Aalen method has two basic requirements. First, the observations must be independent. Second, the censoring must be independent: if you consider two random individuals in the study at time t-1, one of whom is censored at time t while the other survives, then both must have equal chances to survive at time t.

There are four different types of independent censoring:

  • Simple type I: all individuals are censored at the same time or equivalently individuals are followed during a fixed time interval.
  • Progressive type I: all individuals are censored at the same date (for example, when the study terminates).
  • Type II: the study is continued until n events have been recorded.
  • Random: the time when a censoring occurs is independent of the survival time.

Nelson-Aalen method and the cumulative hazard function

The Nelson-Aalen analysis allows you to compare populations through their hazard curves. The Nelson-Aalen estimator should be preferred to the Kaplan-Meier estimator when analyzing cumulative hazard functions; when analyzing cumulative survival functions, the Kaplan-Meier estimator should be preferred.

The cumulative hazard function is:

H(T) = ΣTi≤T di/ri

with di being the number of events occurring at time Ti and ri the number of observations at risk (still in the study) at time Ti.
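A minimal sketch of this estimator, using the lifelines package rather than XLSTAT; the durations and event flags are hypothetical.

```python
# Hedged sketch of the Nelson-Aalen cumulative hazard with the lifelines
# package (illustrative, not XLSTAT); durations and event flags are hypothetical.
from lifelines import NelsonAalenFitter

durations = [5, 6, 6, 2.5, 4, 4, 7, 9, 12, 13]
events    = [1, 0, 0, 1, 1, 1, 1, 0, 1, 1]

naf = NelsonAalenFitter()
naf.fit(durations, event_observed=events)
print(naf.cumulative_hazard_)        # H(t): sum of d_i / r_i over event times <= t
print(naf.confidence_interval_)      # confidence band for H(t)
```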

Variances for the cumulative hazard function

Several different variance estimators are available:

  • Simple
  • Plug-in
  • Binomial

Confidence interval for the cumulative hazard function

Confidence intervals can also be obtained:

  • Greenwood’s method
  • Log-transformed method

The second one will be preferred with small samples.

XLSTAT results for the Nelson-Aalen analysis

Nelson-Aalen table

This table displays the various results obtained from the analysis, including:

  • Interval start time: lower bound of the time interval.
  • At risk: number of individuals that were at risk.
  • Events: number of events recorded.
  • Censored: number of censored data recorded.
  • Cumulative hazard function: cumulative hazard associated with an individual at the considered time.
  • Cumulative hazard function standard error.
  • Cumulative hazard function confidence interval.
  • Survival distribution function: probability for an individual to survive until the considered time.

Charts for the Nelson-Aalen analysis

Depending on the selected options, up to three charts are displayed:

  • Cumulative hazard function,
  • survival distribution function,
  • and Log(Hazard function).

Test of equality of the survival functions

This table displays the statistics for three different tests:

  • the Log-rank test,
  • the Wilcoxon test,
  • and the Tarone Ware test.

These tests are based on a Chi-square test. The lower the corresponding p-value, the more significant the differences between the groups.

 

 

Cumulative incidence

Tutorial
View a tutorial

What is cumulative incidence

Cumulative incidence allows you to estimate the probability of occurrence of an event when several competing events may occur, a situation usually called the competing risks case. The time intervals do not need to be regular. XLSTAT allows the treatment of censored data in competing risks and the comparison of different groups within the population.

For a given period, the cumulative incidence is the probability that an observation still included in the analysis at the beginning of this period will be affected by an event during the period. It is especially appropriate in the case of competing risks, that is to say, when several types of events may occur.

This technique is used for the analysis of survival data, whether for individuals (cancer research, for example) or products (resistance time of a production tool, for example): some individuals die (in this case there can be two causes of death: the disease or another cause), some products break (in this case different breaking points can be modeled), but others leave the study because they heal, because you lose track of them (they move away, for example) or because the study was discontinued. The first type of data is usually called "failure data", or "event data", while the second is called "censored data".

Censoring of data for cumulative incidence

There are several types of censoring of survival data:

  • Left censoring: when an event is reported at time t=t(i), we know that the event occurred at t ≤ t(i).
  • Right censoring: when an event is reported at time t=t(i), we know that the event occurred at t ≥ t(i), if it ever occurred.
  • Interval censoring: when an event is reported at time t=t(i), we know that the event occurred during [t(i-1); t(i)].
  • Exact censoring: when an event is reported at time t=t(i), we know that the event occurred exactly at t=t(i).

The cumulative incidence method has two basic requirements. First, the observations must be independent. Second, the censoring must be independent: if you consider two random individuals in the study at time t-1, one of whom is censored at time t while the other survives, then both must have equal chances to survive at time t. There are four different types of independent censoring:

  • Simple type I: all individuals are censored at the same time or equivalently individuals are followed during a fixed time interval.
  • Progressive type I: all individuals are censored at the same date (for example, when the study terminates).
  • Type II: the study is continued until n events have been recorded.
  • Random: the time when a censoring occurs is independent of the survival time.

When working with competing risks, the different types of events can happen only once: after an event has occurred, the observation is withdrawn from the analysis. We can calculate the risk of occurrence of an event in the presence of competing events. XLSTAT allows you to compare the types of events but also to take into account groups of observations (depending on the treatment administered, for example).
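As an illustration under competing risks, the sketch below estimates a cumulative incidence curve with the Aalen-Johansen estimator from the lifelines package; the use of that estimator and the data are assumptions of this example, not a description of XLSTAT's internal computation.

```python
# Sketch of a cumulative incidence estimate under competing risks using the
# Aalen-Johansen estimator from lifelines (an assumption of this example, not
# XLSTAT). Event codes: 0 = censored, 1 = event of interest, 2 = competing event.
from lifelines import AalenJohansenFitter

durations  = [3, 5, 6, 7, 8, 10, 12, 13, 15, 16]
event_type = [1, 2, 0, 1, 2, 1, 0, 2, 1, 0]

ajf = AalenJohansenFitter()
ajf.fit(durations, event_type, event_of_interest=1)
print(ajf.cumulative_density_)   # cumulative incidence of type-1 events over time
```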

Results for the cumulative incidence in XLSTAT

Cumulative incidence table

This table displays the various results obtained from the analysis, including:

  • Interval start time: lower bound of the time interval.
  • At risk: number of individuals that were at risk.
  • Events i: number of events of type i recorded.
  • All types of events: number of events of all types recorded.
  • Censored: number of censored data recorded.
  • Cumulative incidence: Cumulative incidence obtained for event i at the considered time.
  • Cumulative incidence standard error.
  • Cumulative incidence confidence interval.

Cumulative Survival function

The Cumulative Survival function table displays the various results obtained from the analysis, including:

  • Interval start time: lower bound of the time interval.
  • At risk: number of individuals that were at risk.
  • Events i: number of events of type i recorded.
  • All types of events: number of events of all types recorded.
  • Censored: number of censored data recorded.
  • Cumulative survival function: Cumulative survival function obtained for event i at the considered time.
  • Cumulative survival function standard error.
  • Cumulative survival function confidence interval.

Charts for cumulative incidence

XLSTAT offers two charts for this method:

  • Cumulative incidence
  • and cumulative survival function.

 

Parametric survival regression (Weibull model)

Tutorial
View a tutorial

Principle of parametric survival model

The principle of the parametric survival regression is to link the survival time of an individual to covariates using a specified probability distribution (generally the Weibull distribution). For example, in the medical domain, we are seeking to find out which covariate has the most important impact on the survival time of a patient.

Parametric survival models or Weibull models

A parametric survival model is a well-recognized statistical technique for exploring the relationship between the survival of a patient, a parametric distribution and several explanatory variables. It allows us to estimate the parameters of the distribution.
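As an illustration, here is a hedged sketch of a Weibull regression using the accelerated-failure-time fitter from the lifelines package; lifelines, the data frame and the covariates are assumptions of this example, not XLSTAT's implementation.

```python
# Hedged sketch of a parametric (Weibull) survival regression using the
# accelerated-failure-time fitter from lifelines (not XLSTAT); data are invented.
import pandas as pd
from lifelines import WeibullAFTFitter

df = pd.DataFrame({
    "time":  [5, 6, 6, 2.5, 4, 4, 7, 9, 12, 13],
    "event": [1, 0, 0, 1, 1, 1, 1, 0, 1, 1],
    "age":   [60, 65, 52, 70, 58, 61, 49, 66, 55, 63],
})

aft = WeibullAFTFitter()
aft.fit(df, duration_col="time", event_col="event")
aft.print_summary()        # distribution parameters and covariate effects
```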

Variables selection for the parametric survival regression

It is possible to improve the parametric survival model by selecting the variables being part of the model. XLSTAT offers two options to select the variables:

  • Forward selection: The selection process starts by adding the variable with the largest contribution to the model. If a second variable is such that its entry probability is greater than the entry threshold value, then it is added to the model. This process is iterated until no new variable can be entered in the model.
  • Backward selection: This method is similar to the previous one but starts from a complete model.

Results for the parametric survival model in XLSTAT

Goodness of fit coefficients for the parametric survival regression

The goodness of fit coefficients table displays a series of statistics for the independent model (corresponding to the case where there is no impact of covariates, beta=0) and for the adjusted model.

  • Observations: The total number of observations taken into account;
  • DF: Degrees of freedom;
  • -2 Log(Like.): -2 times the logarithm of the likelihood function associated with the model;
  • AIC: Akaike’s Information Criterion;
  • SBC: Schwarz’s Bayesian Criterion;
  • Iterations: Number of iterations until convergence.

Model parameters

The parameter estimate, corresponding standard deviation, Wald's Chi², the corresponding p-value and the confidence interval are displayed for each variable of the model.

The predictions and residuals table shows, for each observation, the time variable, the censoring variable, the value of the residuals, the estimated cumulative survival distribution, the empirical cumulative distribution function and the hazard function.

Available charts for the parametric survival regression

XLSTAT offers the following charts for the parametric survival regression:

  • Cumulative Survival distribution function (SDF),
  • -Log(SDF),
  • Log(-Log(SDF)),
  • hazard function,
  • residuals.

On each chart, the empirical and theoretical distribution function is displayed.

 

Parametric survival curves

Tutorial
View a tutorial

What is Parametric survival curve analysis

The Parametric survival curve belongs to the descriptive methods of survival analysis, as does life table analysis.

Parametric survival curves allow you to quickly obtain a population survival curve and essential statistics such as the median survival time, based on a parametric distribution.

Parametric survival curves are an alternative to Kaplan-Meier analysis when a distribution of the failure time can be assumed.

Use of Parametric survival curves

Parametric survival curve analysis is used to analyze how a given population evolves with time. This technique is mostly applied to survival data and product quality data. There are three main reasons why a population of individuals or products may evolve: some individuals die (products fail), some others leave the surveyed population because they are healed (repaired) or because their trace is lost (individuals move away, the study is terminated, among other reasons). The first type of data is usually called "failure data", or "event data", while the second is called "censored data".

Results for the Parametric survival curve in XLSTAT

Parametric survival curves tables

The parameters of the chosen distribution (generally the Weibull distribution) are displayed together with their standard errors, p-values and confidence intervals.

The quantiles associated with the distribution and with the data are also displayed.
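For illustration, the sketch below fits a Weibull survival curve without covariates using the lifelines package (an assumption of this example, not XLSTAT); the durations and event flags are hypothetical.

```python
# Sketch of fitting a Weibull survival curve without covariates via lifelines
# (illustrative, not XLSTAT); durations and event flags are hypothetical.
from lifelines import WeibullFitter

durations = [5, 6, 6, 2.5, 4, 4, 7, 9, 12, 13]
events    = [1, 0, 0, 1, 1, 1, 1, 0, 1, 1]

wf = WeibullFitter()
wf.fit(durations, event_observed=events)
print(wf.lambda_, wf.rho_)            # scale and shape parameters
print(wf.median_survival_time_)       # median survival time from the fitted curve
print(wf.survival_function_)          # parametric survival distribution function
```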

Charts for Parametric survival curves

XLSTAT offers the following charts:

  • Survival distribution function (SDF)
  • -Log(SDF) corresponding to the –Log() of the survival distribution function (SDF).
  • Log(-Log(SDF)) corresponding to the Log(–Log()) of the survival distribution function.
  • Hazard function.

About KCS

Kovach Computing Services (KCS) was founded in 1993 by Dr. Warren Kovach. The company specializes in the development and marketing of inexpensive and easy-to-use statistical software for scientists, as well as in data analysis consulting.


Get in Touch

  • Email:
    sales@kovcomp.com
  • Address:
    85 Nant y Felin
    Pentraeth, Isle of Anglesey
    LL75 8UY
    United Kingdom
  • Phone:
    (UK): 01248-450414
    (Intl.): +44-1248-450414
  • Fax:
    (UK): 020-8020-0287
    (Intl.): +44-20-8020-0287