Generating Survival Estimates

Survival Predictions

First, make sure that you have generated the transition matrices with transitionprobs_and_samplesizes().

Future state probabilities for any population of individuals, starting from an initial time, can be generated using survivorship_vector(). The most common application is likely to be generating future state probabilities for a single individual. To do this, specify a radix with a 1 in the position of the state you want to predict from and 0 everywhere else. You can also generate a life expectancy for an individual starting in any state at any time from study entry using life_expectancy(), although censorLE() can obtain this directly from the data in wide format.

Note

It is implicitly assumed that the final state in your model is death, but the methods also work for other outcomes that are the terminal state of a study. For example, if your study outcome is age at menopause, the life expectancy function will still give you a valid estimate of the expected time until that outcome.

METER.table.survivorship_vector(transmat, radix, initial_time, states)

Predict the expected proportion of a population in each state at each time point, given an initial time and the number of individuals currently in each state. To get probabilities for one individual, use a radix that describes a population of one individual.

Parameters
  • transmat (list) – a list of numpy matrices that represent the transition probabilities at each time point. This is the first output from transitionprobs_and_samplesizes().

  • states (list) – the names of the states in the model

  • radix (numpy array) – an initial condition for the number of subjects in each state we want to model, specified as a numpy vector. For example, if we wanted the probabilities of being in each state for 1 individual starting at the first state in a 6-state model the radix would be generated by np.array([[1],[0],[0],[0],[0],[0]], dtype=float).

  • initial_time (int) – the initial time to model the survivorship outcomes from

Returns

a dataframe with the expected proportion of the population in each state at each time point past the initial time given

Return type

pandas dataframe
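As a rough illustration of what survivorship_vector() computes, the sketch below pushes a one-hot radix through a list of transition matrices. It is a pure-Python stand-in, not METER's implementation. It uses nested lists instead of numpy matrices so it runs without dependencies, and it assumes that transmat[t][i][j] is the probability of moving from state i at time t to state j at time t + 1 (rows sum to 1), with t counted from study entry. METER's orientation of the matrices and its dataframe output may differ. The helper names one_hot_radix and propagate, and the three-state example, are hypothetical.

```python
def one_hot_radix(states, start_state):
    """Radix for a single individual: 1.0 at start_state, 0.0 elsewhere."""
    return [1.0 if s == start_state else 0.0 for s in states]


def propagate(transmat, radix, initial_time):
    """State distribution at initial_time and at every later time step.

    ASSUMPTION: transmat[t][i][j] = P(state j at t + 1 | state i at t).
    """
    current = list(radix)
    history = [current]
    for P in transmat[initial_time:]:
        n = len(current)
        # next[j] = sum_i current[i] * P[i][j]  (current as a row vector times P)
        current = [sum(current[i] * P[i][j] for i in range(n)) for j in range(n)]
        history.append(current)
    return history


states = ["healthy", "ill", "dead"]
# Two hypothetical time steps; the last state (death) is absorbing.
transmat = [
    [[0.8, 0.1, 0.1], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],
    [[0.7, 0.2, 0.1], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],
]
radix = one_hot_radix(states, "healthy")
start = 0
for k, dist in enumerate(propagate(transmat, radix, start)):
    print(start + k, [round(p, 3) for p in dist])
```

Each printed distribution sums to 1. Probability accumulates in the terminal state because it is absorbing.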

METER.table.life_expectancy(transmat, initial_state, initial_time, states)

Predict the life expectancy of an individual based on current state and age

Parameters
  • transmat (list) – a list of numpy matrices that represent the transition probabilities at each age. This is the first output from transitionprobs_and_samplesizes().

  • states (list) – the names of the states in the model.

  • initial_state (string) – the initial state to model the survivorship outcomes from.

  • initial_time (int) – the initial time to model the survivorship outcomes from

Returns

the life expectancy of an individual given the parameters specified

Return type

float
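Under the same assumptions, a life expectancy of the kind life_expectancy() returns can be sketched as the expected number of future time steps spent outside the terminal state, that is, the sum over future time points of the probability of not yet having reached it. The function name life_expectancy_sketch, the example matrices, and the counting convention (one unit per step survived, truncated at the last available matrix, no half-step correction) are assumptions for illustration. METER's own convention may differ.

```python
def life_expectancy_sketch(transmat, states, initial_state, initial_time):
    """Expected number of future time steps spent outside the terminal state.

    ASSUMPTIONS (not taken from METER):
      * transmat[t][i][j] = P(state j at t + 1 | state i at t), rows sum to 1;
      * the last entry of `states` is the terminal state (e.g. death);
      * each step survived adds one unit of time, with no half-step
        correction, and the sum stops at the last available matrix.
    """
    n = len(states)
    # One-hot starting distribution for a single individual.
    dist = [1.0 if s == initial_state else 0.0 for s in states]
    expected_time = 0.0
    for P in transmat[initial_time:]:
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Probability of still being outside the terminal state after this step.
        expected_time += 1.0 - dist[-1]
    return expected_time


states = ["healthy", "ill", "dead"]
transmat = [
    [[0.8, 0.1, 0.1], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],
    [[0.7, 0.2, 0.1], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],
]
print(round(life_expectancy_sketch(transmat, states, "healthy", 0), 2))  # 1.68
print(round(life_expectancy_sketch(transmat, states, "ill", 0), 2))      # 1.12
```

Because the sum stops at the last matrix, estimates from later initial times cover a shorter horizon. This is one reason why comparisons between groups need a common starting time.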

Life Expectancies

For the methods in this section, it is not necessary to have obtained the transition matrices. However, make sure that you have used wide_format() to get the data in the correct format.

METER permits 4 different ways of characterizing an individual when computing life expectancies:

  • The initial state is the state that the individual starts in. It is required whenever you ask METER to compute life expectancy point estimates or confidence intervals.

  • The initial time is the time point that you are predicting from. METER always gives life expectancy from this point forward, so if you want absolute life expectancy estimates measured from study entry, add the initial time to the estimates after you have obtained them.

  • A censor state is a state that you want to restrict the individual from moving beyond. This is useful when you want to assess the effect on life expectancy of attaining a certain state (e.g. an award or military rank) and want to prevent the estimates for the control group from being affected by potential future transitions to that state.

  • Finally, you can specify any arbitrary set of covariate conditions for subgroup analysis. For the time being these must be categorical covariates, although support for other covariate types is on the list of improvements for future versions.
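To make the covariate conditions concrete, the sketch below shows what a conditions dictionary such as {'Smoking': 'Yes', 'Race': 'White'} expresses: a subgroup containing only the individuals whose categorical covariates match every key/value pair. It is an illustration only. The records, their column names, and the helper apply_conditions are hypothetical, and METER applies its conditions to a pandas dataframe in wide format rather than to a list of dictionaries.

```python
def apply_conditions(records, conditions=None):
    """Keep records whose categorical covariates match every condition."""
    if not conditions:
        return list(records)  # no covariate restrictions
    return [r for r in records
            if all(r.get(col) == value for col, value in conditions.items())]


# Hypothetical individuals with two categorical covariates.
records = [
    {"id": 1, "Smoking": "Yes", "Race": "White"},
    {"id": 2, "Smoking": "No", "Race": "White"},
    {"id": 3, "Smoking": "Yes", "Race": "Black"},
    {"id": 4, "Smoking": "Yes", "Race": "White"},
]
subgroup = apply_conditions(records, {"Smoking": "Yes", "Race": "White"})
print([r["id"] for r in subgroup])  # [1, 4]
```

Exact equality is why the conditions currently need to be categorical: a continuous covariate would require ranges or thresholds instead of a single required value.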

Point estimates of life expectancy can be obtained using censorLE() directly from the data in wide format. As mentioned above, the initial state is a mandatory input. Unless you specify the other characteristics, METER assumes that you want estimates from study entry (time point 0), with no censoring and no covariate restrictions.

To obtain confidence intervals by non-parametric bootstrap, use bootstrapLE() to get a dataframe of bootstrap runs, then pass it to summary_results() to obtain the point estimates and confidence intervals. METER allows you to run the bootstrap for any set of groups, each defined by its own initial state, censor state, and covariate restrictions. The initial time must be the same for every group, because METER also reports point estimates and confidence intervals for the differences between groups, and those differences only make sense if every group starts at the same time point from study entry.
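The idea behind a non-parametric bootstrap with group differences can be sketched generically: resample individuals with replacement, recompute each group's estimate on the resample, and record every pairwise difference from the same resample so that the differences reflect the correlation between groups. This is not METER's code. The per-group mean of a hypothetical observed time stands in for the life expectancy estimate (METER would recompute its own estimate for each group). The data, the field names "group" and "time", and the function bootstrap_groups are hypothetical, and resampling from the whole dataset rather than within each group is an assumption.

```python
import random
from statistics import mean


def bootstrap_groups(records, groups, estimator, n=1000, seed=0):
    """One dict per bootstrap run: an estimate per group plus pairwise differences."""
    rng = random.Random(seed)  # fixed seed so the runs are reproducible
    runs = []
    for _ in range(n):
        # Resample individuals with replacement from the whole dataset.
        sample = rng.choices(records, k=len(records))
        row = {}
        for g in groups:
            values = [r["time"] for r in sample if r["group"] == g]
            # Stand-in estimator; NaN if a group is absent from the resample.
            row[g] = estimator(values) if values else float("nan")
        # Every pairwise difference, computed within the same resample.
        for i, a in enumerate(groups):
            for b in groups[i + 1:]:
                row[f"{a} - {b}"] = row[a] - row[b]
        runs.append(row)
    return runs


# Hypothetical data: 20 individuals per group with made-up observed times.
records = ([{"group": "A", "time": t} for t in range(5, 25)]
           + [{"group": "B", "time": t} for t in range(3, 23)])
runs = bootstrap_groups(records, ["A", "B"], mean, n=200)
print(list(runs[0]))  # ['A', 'B', 'A - B']
```

Keeping the differences in the same row as the group estimates mirrors the layout described for the bootstrapLE() output, where each row holds one run's group estimates and group differences.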

METER is fast: 1000 bootstraps on a dataset of 5000 individuals, with fewer than 10 states in the model and fewer than 5 groups, should take under 15 minutes. If you want a log of the bootstrap's progress printed to the console, set loud=True, and you will know whether you have time to go get a coffee.

Note

If you are comparing life expectancies for several different groups, I highly recommend using the group_names input to bootstrapLE(). By default the groups are named after their initial states, and these names may not be unique, which can cause a great deal of confusion.

METER.table.censorLE(data, transition_names, states, initial_state, initial_time=0, censor_state='default', conditions=None)

Get the life expectancy of an individual based on initial state, initial time, and any censor states or covariate conditions.

Parameters
  • data (pandas dataframe) – the data in wide format as generated by wide_format()

  • transition_names (list) – a list of the names of the columns that contain the transition times

  • states (list) – the names of the states in the model

  • initial_state (string) – the initial state that you want to estimate life expectancy from

  • initial_time (int) – optional input if you want to estimate life expectancy after a given time (by default 0)

  • censor_state (string) – a particular state that you want to restrict movement beyond (by default none)

  • conditions (dictionary) – a set of conditions you want the group to be subject to (by default none), e.g. {'Smoking': 'Yes', 'Race': 'White'}

Returns

the life expectancy of an individual starting in the initial state given the conditions provided

Return type

float

METER.summaries.bootstrapLE(data, transition_names, states, initial_states, n=1000, initial_time=0, censor_states='default', group_names='default', conditions='default', loud=False)

Run a bootstrap on the life expectancy for a given set of groups

Parameters
  • data (pandas dataframe) – the data in wide format as generated by wide_format()

  • transition_names (list) – a list of the names of the columns that contain the transition times

  • states (list) – the names of the states in the model

  • initial_states (list) – a list of initial states to estimate from

  • n (int) – the number of bootstraps to run, by default 1000.

  • initial_time (int) – the time after which to estimate life expectancy, shared by all groups (by default 0)

  • censor_states (list) – the states at which you want each group's life expectancy to be censored (by default no censoring). If provided, this list must be the same length as initial_states.

  • group_names (list) – the names to give the groups (each group is defined by its initial state and censor state). By default these are the same as the initial states. If provided, this list must be the same length as initial_states.

  • conditions (list) – a list of dictionaries of conditions you want each group to be subject to (by default none), e.g. [{'Race': 'White', 'Smoking': 'Yes'}, {'Race': 'Black', 'Smoking': 'No'}]

  • loud (bool) – False by default. If set to True, the best estimate calculated initially and a short summary of the results of each bootstrap run are printed to the console.

Returns

a dataframe containing the results of each bootstrap run. Each row includes that run's life expectancy for each group as well as each of the possible group differences.

Return type

pandas dataframe

METER.summaries.summary_results(bootstrap, confidence_level=0.95)

Summarize the results of a bootstrap.

Parameters
  • bootstrap (pandas dataframe) – the bootstrap dataframe generated by bootstrapLE()

  • confidence_level (float) – the confidence level of the intervals you want to generate (by default 0.95)

Returns

A dataframe summarizing the point estimates and confidence intervals for each quantity.

Return type

pandas dataframe
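For intuition about summary_results(), the sketch below computes a percentile confidence interval for each column of a set of bootstrap runs: the (1 - confidence_level)/2 and 1 - (1 - confidence_level)/2 quantiles of the bootstrap distribution. This documentation does not say whether METER uses percentile intervals or how it chooses its point estimates, so treat this as an assumed method. The helper names percentile_interval and summarize, and the toy runs, are hypothetical. The quantile uses linear interpolation between order statistics.

```python
import math


def percentile_interval(values, confidence_level=0.95):
    """Percentile bootstrap interval using linearly interpolated quantiles."""
    ordered = sorted(values)
    alpha = 1.0 - confidence_level

    def quantile(q):
        pos = q * (len(ordered) - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return quantile(alpha / 2), quantile(1 - alpha / 2)


def summarize(runs, confidence_level=0.95):
    """Map each column of the bootstrap runs to its (lower, upper) interval."""
    return {col: percentile_interval([r[col] for r in runs], confidence_level)
            for col in runs[0]}


# Toy bootstrap runs: a group estimate taking the values 0, 1, ..., 100.
runs = [{"LE": float(x)} for x in range(101)]
lower, upper = summarize(runs)["LE"]
print(round(lower, 6), round(upper, 6))  # 2.5 97.5
```

A group difference whose interval excludes 0 is the usual informal signal that the two groups' life expectancies differ at the chosen confidence level.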