Understanding Statistical Methods for Clinical Endpoints in Diabetes Research


This educational slide module delves into fundamental statistics for analyzing clinical endpoints in diabetes research. It covers the choice of statistical methods, the distinction between statistical and clinical significance, and the importance of different endpoints in evaluating clinical benefits like glycemic control and safety. The module also explains time-to-event endpoints in diabetes studies with examples. The content provides guidance on selecting appropriate statistical methods based on the type of endpoint being analyzed.


Uploaded on Jul 19, 2024 | 1 Views



Presentation Transcript


  1. Fundamental statistics EDUCATIONAL SLIDE MODULE Date of preparation: February 2023 Version 1.0 SC-CRP-08993

  2. Acknowledgements: Content developed by Helen Barraclough, Eli Lilly Australia, and Associate Professor Wendy Davis, University of Western Australia. The authors acknowledge Hélène Sapin, Michaela Mattheus, Stefan Hantel and the ACROSS T2D Steering Committee for reviewing the slides.

  3. Fundamental statistics: Choice of statistical method; Commonly used statistics; Statistical significance vs clinical significance; Analysis populations

  4. How do I know which method to use? The type of method depends on the type of endpoint being analysed.
     Type of endpoint | Example | Type of model which may be used
     Time-to-event | All-cause mortality (e.g. HR=0.75) | Cox proportional hazards model
     Binary (participant had event: yes/no)* | Severe hypoglycaemia at 3 months (e.g. 24% vs 10%) | Fisher's exact test, logistic regression model
     Count data (multiple events per participant) | Severe hypoglycaemia at 3 months (e.g. number of severe hypoglycaemia events per participant per month) | Poisson regression model, negative binomial, ZIP, ZINB, joint frailty model
     Continuous (longitudinal) | Change in HbA1c from baseline to endpoint (e.g. 1.00% or 10.93 mmol/mol vs 0.92% or 10.06 mmol/mol) | ANCOVA, MMRM
     *For a fixed time period (all participants are observed for the same time period)
     ANCOVA, analysis of covariance; MMRM, mixed-model repeated measures; ZIP, zero inflated Poisson model; ZINB, zero inflated negative binomial model
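
The table above can be read as a simple lookup from endpoint type to candidate models. The sketch below only transcribes that table; the dictionary name, the short keys and the helper function are illustrative assumptions, not part of the module.

```python
# Endpoint type -> candidate models, transcribed from the slide's table.
# CANDIDATE_MODELS and candidate_models() are hypothetical names for illustration.
CANDIDATE_MODELS = {
    "time-to-event": ["Cox proportional hazards model"],
    "binary": ["Fisher's exact test", "logistic regression model"],
    "count": [
        "Poisson regression model",
        "negative binomial",
        "ZIP",
        "ZINB",
        "joint frailty model",
    ],
    "continuous": ["ANCOVA", "MMRM"],
}


def candidate_models(endpoint_type):
    """Return the candidate models listed on the slide for an endpoint type."""
    key = endpoint_type.strip().lower()
    if key not in CANDIDATE_MODELS:
        raise ValueError(f"unknown endpoint type: {endpoint_type!r}")
    return CANDIDATE_MODELS[key]


print(candidate_models("Binary"))
```

The lookup returns candidates only; choosing among them still depends on the study design and the data.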

  5. So why do we need all of these different endpoints? Different endpoints evaluate different clinical benefits, such as: glycaemic control; cardiovascular or kidney outcomes; safety. All provide a piece of the puzzle to see the overall picture of risk/benefit and clinical significance.

  6. What are time-to-event endpoints? The time from Start to Event is measured. If the participant experiences the event, then the time to the first event is measured. If the participant does not experience the event, then they are censored and the time to when they were censored (e.g. last visit) is measured.
     Some examples of TTE endpoints in diabetes:
     Time to all-cause mortality: Start is randomisation; Event is death from any cause.
     Time to major adverse cardiovascular event: Start is randomisation; Events are death from cardiovascular causes, non-fatal myocardial infarction or non-fatal stroke; time is measured from randomisation to the first of any of these events.
     Time to cardio-renal composite endpoint: Start is randomisation; Events are end-stage kidney disease* (dialysis, transplantation or a sustained eGFR of <15 ml/minute/1.73 m²), a doubling of the serum creatinine level, or death from kidney or cardiovascular causes; time is measured from randomisation to the first of any of these events.
     *This is an example of how a cardio-renal composite endpoint may be defined; definitions might differ across trials.
     TTE, time to event
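
The rule described above (time to the first component event, otherwise censored at the last visit) could be sketched as follows. This is a minimal illustration; the function name, the use of days as the time unit and the example values are assumptions.

```python
def time_to_first_event(event_days, last_visit_day):
    """Return (time, event_observed) for one participant.

    event_days: days from randomisation to each component event of a
        composite endpoint, with None for components that did not occur.
    last_visit_day: days from randomisation to the censoring time.
    """
    observed = [d for d in event_days if d is not None]
    if observed:
        # At least one component event occurred: use the earliest one.
        return min(observed), True
    # No event: the participant is censored at the last visit.
    return last_visit_day, False


# Hypothetical participant with a non-fatal MI at day 250 and a stroke at day 400.
print(time_to_first_event([None, 250, 400], last_visit_day=900))
# Hypothetical participant with no events, last seen at day 730.
print(time_to_first_event([None, None, None], last_visit_day=730))
```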

  7. So how can I easily compare time-to-event (survival) endpoints between two groups? To determine whether one treatment group has significantly better survival than the other treatment group, we usually look at: the p-value from a log-rank test; the HR, 95% CI and p-value from a Cox proportional hazards model*. *The Cox model allows for adjustment for baseline covariates. In cardiovascular outcomes trials (CVOTs), the primary endpoint analysis is usually a Cox model, which may include prespecified baseline covariates and stratification factors.


  9. What is a log-rank test? It uses the survival probabilities from the entire KM survival curves and is based on the same assumptions as the KM survival curves. It provides a p-value: a statistically significant p-value (e.g. p<0.05) means that there is a significant difference in survival between the two groups. But it does NOT provide an estimate of the size of the difference in survival between the two groups; for this we need a hazard ratio from the Cox proportional hazards model.
     Notes: The log-rank test also generates a Z-statistic, which does inform us of the direction of the difference (i.e. which treatment is better), but this is rarely reported in publications. The KM method can overestimate the probability of the event(s) of interest occurring if there are a lot of competing events, and it is not recommended in that situation; the CIF method is preferable to the KM method in the presence of competing risk events.
     CIF, cumulative incidence function; KM, Kaplan-Meier
     Bland JM and Altman DG. BMJ 2004;328:1073
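
To make the mechanics concrete, here is a minimal pure-Python sketch of a two-group log-rank test. It is not the implementation of any particular statistics package; the function name, the sign convention for Z (positive means more events than expected in group A) and the toy data are assumptions made for illustration.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (z, two_sided_p).

    times_*: follow-up time per participant; events_*: 1 if the event was
    observed at that time, 0 if the participant was censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Numbers still at risk just before time t, by group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of events in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    z = observed_minus_expected / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


# Toy data: every participant in group A has the event earlier than group B.
z, p = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(f"z = {z:.2f}, p = {p:.3f}")
```

As the slide notes, the p-value says nothing about the size of the difference; only the sign of Z indicates its direction.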

  10. How do I interpret a hazard ratio? HR=1 means equal efficacy of the treatments. For endpoints in which the outcome is adverse (e.g. time to all-cause mortality), usually the HR (experimental treatment vs control) is expressed such that if the experimental treatment is better than the control, then the HR is <1, and if it is worse than the control, then the HR is >1. If the endpoint is beneficial (e.g. time to glucose control), usually the experimental treatment would be better if the HR (experimental treatment vs control) is >1 and vice versa. To claim superiority (experimental treatment is better than the control) or inferiority (experimental treatment is worse than the control), the 95% CI of the HR must not cross the line of unity (HR=1 in this example).
     [Figure: HR axis (experimental treatment over control regimen) marked at 0.75, 1 and 1.25; for an adverse outcome, values below 1 favour the experimental treatment and values above 1 favour the control treatment]
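
The "95% CI must not cross the line of unity" rule can be expressed as a small check. The function name and return strings below are illustrative assumptions; the two adverse-outcome examples reuse the HR and CI values shown on the survival plots later in this module.

```python
def interpret_ratio(estimate, ci_low, ci_high, adverse_outcome=True):
    """Classify an HR (or OR) for experimental vs control from its 95% CI."""
    if not ci_low <= estimate <= ci_high:
        raise ValueError("estimate must lie inside its confidence interval")
    if ci_low <= 1.0 <= ci_high:
        # The interval crosses the line of unity.
        return "no significant difference"
    if adverse_outcome:
        experimental_better = ci_high < 1.0
    else:
        experimental_better = ci_low > 1.0
    return "favours experimental" if experimental_better else "favours control"


print(interpret_ratio(0.88, 0.74, 1.05))  # CI includes 1
print(interpret_ratio(0.74, 0.65, 0.85))  # CI entirely below 1
```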

  11. Example of interpreting a hazard ratio: if the HR=0.75 for all-cause mortality, this means, on average, approximately a 25% lower risk* of death with the experimental treatment than with the control treatment (25% because 1 - 0.75 = 0.25) at ANY point in time during the trial. *Assuming the proportional hazards assumption holds true and that the survival times follow an exponential distribution; strictly, the risk is the hazard. Barraclough H et al. J Thorac Oncol 2011;6:978

  12. How do I calculate a hazard ratio? Use a Cox proportional hazards model in a statistical software package. The concept is shown here:
     Time period | Treatment arm | No. of people at risk* (B) | No. of people who died during time period (A) | No. of people who dropped out during time period (i.e. censored) | Rate of people dying (i.e. hazard rate, A/B) | Ratio of the rate of people dying in experimental arm compared to control arm (i.e. hazard ratio)
     1st week | Experimental | 100 | 3 | 0 | 0.03 | 0.75
     1st week | Control | 100 | 4 | 0 | 0.04 |
     2nd week | Experimental | 97 | 6 | 3 | 0.06 | 0.74
     2nd week | Control | 96 | 8 | 1 | 0.08 |
     3rd week | Experimental | 88 | 9 | 1 | 0.10 | 0.74
     3rd week | Control | 87 | 12 | 2 | 0.14 |
     This conceptual example is based on the assumption that the survival times follow an exponential distribution. Therefore, in this example the HR represents an event rate ratio.
     *Participants who are alive and still in the study at the start of the time period
     Barraclough H et al. J Thorac Oncol 2011;6:978
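
The arithmetic behind the conceptual table can be reproduced in a few lines. This sketch only recomputes the per-period rate ratio from the slide's counts; it is not a Cox model, and the variable and function names are illustrative.

```python
# (period, (at risk, deaths) in experimental arm, (at risk, deaths) in control arm)
# Counts are taken from the slide's conceptual example.
periods = [
    ("1st week", (100, 3), (100, 4)),
    ("2nd week", (97, 6), (96, 8)),
    ("3rd week", (88, 9), (87, 12)),
]


def hazard_rate(at_risk, deaths):
    """Rate of people dying in the period (A/B on the slide)."""
    return deaths / at_risk


def rate_ratio(experimental, control):
    """Hazard rate in the experimental arm divided by that in the control arm."""
    return hazard_rate(*experimental) / hazard_rate(*control)


for label, experimental, control in periods:
    print(label, round(rate_ratio(experimental, control), 2))
```

The ratio is computed from the unrounded rates, which is why the 2nd and 3rd week values come out at 0.74 rather than the 0.75 and 0.71 that the rounded rates would suggest.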

  13. Assumptions of the hazard ratio: The HR between treatment arms is assumed to be constant over the duration of a clinical trial. For example, if HR=0.75, those in the experimental treatment arm have a 25% lower risk of death than those in the control arm at ANY point in time during the trial. This is called the proportional hazards assumption. Barraclough H et al. J Thorac Oncol 2011;6:978
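
Written as a formula, the proportional hazards assumption says that the two hazard functions differ only by a constant factor. The notation below ($h_E$, $h_C$ for the hazards and $S_E$, $S_C$ for the survival functions in the experimental and control arms) is introduced here for illustration and does not appear on the slides.

```latex
% Proportional hazards: the ratio of hazards does not depend on t
h_E(t) = \mathrm{HR} \cdot h_C(t) \quad \text{for all } t \ge 0

% Consequence for the survival functions, since S(t) = \exp\left(-\int_0^t h(u)\,du\right)
S_E(t) = \exp\left(-\mathrm{HR}\int_0^t h_C(u)\,du\right) = S_C(t)^{\mathrm{HR}}

% Special case of exponential survival times with constant hazards \lambda_E and \lambda_C
\mathrm{HR} = \frac{\lambda_E}{\lambda_C}
```

With HR=0.75, the experimental hazard is 25% lower than the control hazard at every time point, which is the statement made on the slide.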

  14. How do I know if the proportional hazards assumption holds true? There are formal statistical tests and plots to assess whether the PH assumption holds true, for example Martingale residuals, Schoenfeld residuals versus survival time, and log-negative-log plots. However, the results of these tests are rarely reported in the literature. This is because it is generally indicated by the KM curves! If the KM curves do NOT cross and show a fairly consistent pattern of separation, then it is likely that the PH assumption holds true. KM, Kaplan-Meier; PH, proportional hazards. Barraclough H et al. J Thorac Oncol 2011;6:978

  15. Appropriate interpretation of a hazard ratio: Many clinical trials do produce data reasonably consistent with the assumption of a constant HR: 1. Curves do not cross; 2. Fairly consistent pattern of separation.
     [Figure: probability of survival over 0 to 8 years for standard therapy vs intensive therapy, with numbers at risk; HR = 0.88, 95% CI: 0.74 to 1.05, p = 0.14]

  16. But very non-constant hazard ratios do occur.
     [Figure: progression-free survival curves; HR (95% CI) 0.74 (0.65, 0.85), p<0.001]
     PFS, progression-free survival. Adapted from Mok TS et al. N Engl J Med 2009;361:947

  17. If the hazard ratio is not constant. If the HR changes significantly over time, and in particular the curves cross in a significant manner: the HR estimate from the trial data is insufficient to be considered in isolation as a complete summary of the treatment effect; which treatment is superior depends on what time-frame you examine, as one treatment may be superior early on and the other in the longer term.

  18. Binary endpoint examples. Examples of binary (yes/no) endpoints: the proportion of the population achieving HbA1c target values (e.g. HbA1c <7.0%) in each treatment group; an AE summarised by the number and percentage of people who experienced the AE in each treatment group; the percentage of people who died due to any cause during the study (e.g. 5.3% vs 8.7%). Binary endpoints do not account for the timing of the event, only for whether the event had occurred by a fixed time point. Their validity depends on all participants being observed over the same time period. AE, adverse event

  19. How can we compare binary endpoints? Example methods are: Fisher's exact test, which provides a p-value; a logistic regression model*, which provides an odds ratio (OR), 95% CI and p-value. *The logistic regression model allows for adjustment for baseline covariates.
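
For intuition, a two-sided Fisher's exact test for a 2x2 table can be sketched with the standard library alone. This is an illustrative implementation using the common "sum of all tables no more probable than the observed one" definition of the two-sided p-value; the function name and the small tolerance are assumptions, and in practice a validated statistical package would be used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are treatment arms; columns are (had event, did not have event).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of x events in the first row,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = probability(a)
    # Sum the probabilities of all tables that are no more likely than
    # the observed one (small tolerance guards against float rounding).
    total = sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical tiny trial: 3 of 3 events in one arm, 0 of 3 in the other.
print(fisher_exact_two_sided(3, 0, 0, 3))
```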


  21. Statistical significance vs clinical significance: p-values do NOT simply provide a yes or no answer. They provide a sense of the strength of the evidence: p=0.02 means that, if there were truly no difference between the groups, there would be a 2.0% probability of observing this result (or a more extreme one) by chance alone; likewise, p=0.04 corresponds to 4.0% and p=0.001 to 0.1%. In order to get a yes or no answer, a significance level (often 0.05) must be defined. If the p-value is below the defined significance level, the result is claimed to be statistically significant. If the p-value is above the defined significance level, the result is claimed to be non-significant. Statistical significance is good! But it's not the only consideration!

  22. How can I assess clinical significance? Look at the magnitude of the treatment effect: is this clinically meaningful? Unlike statistical significance, there is no "significance level". It's a clinical interpretation of the "so what?"

  23. Odds ratio: Similar concept to an HR except it is used to analyse binary data (i.e. reached target/did not reach target, had AE/did not have AE) rather than time-to-event data (e.g. all-cause mortality: time to death due to any cause). We use a logistic regression model to calculate an OR (recap: a Cox proportional hazards regression model is used to calculate an HR). AE, adverse event

  24. How do I interpret an odds ratio? If OR=1, an event (e.g. AE) is equally likely to happen in each treatment arm. Usually, the OR (experimental treatment vs control) for an adverse outcome is presented, such that if the experimental treatment is better than the control, then the OR is <1, and if it is worse than the control, then the OR is >1. To claim superiority (experimental treatment is better than the control) or inferiority (experimental treatment is worse than the control), the 95% CI of the OR must not cross the line of unity (OR=1 in this example).
     [Figure: OR axis (experimental treatment over control regimen) marked at 0.75, 1 and 1.25; values below 1 favour the experimental treatment and values above 1 favour the control treatment]
     AE, adverse event; OR, odds ratio

  25. Example of interpreting an odds ratio: An OR (experimental vs control) of 0.75 for experiencing an AE means that those in the experimental arm have 25% lower odds (1 - 0.75 = 0.25) of having the AE than those in the control arm. Important note: odds is not the same as risk. One may intuitively tend to think in terms of risk (e.g. the number of participants who had an AE divided by the total population), not odds. A common mistake is to mix up odds and risk. AE, adverse event

  26. How are the odds calculated? Odds = p/(1-p), where p = probability of the event occurring.
     Example: Experienced an AE on Tx E = 20%; experienced an AE on Tx C = 25%.
     Odds of experiencing an AE on Tx E = 0.20/(1-0.20) = 0.20/0.80 = 0.25
     Odds of experiencing an AE on Tx C = 0.25/(1-0.25) = 0.25/0.75 = 0.33
     Odds ratio (E vs C) = 0.25/0.33 = 0.75
     The proportions can be compared using Fisher's exact test too!
     AE, adverse event; Tx E, experimental treatment arm; Tx C, control treatment arm
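
The worked example above translates directly into code. The function names are illustrative; the probabilities are the ones given on the slide.

```python
def odds(p):
    """Convert a probability into odds: p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_ratio(p_experimental, p_control):
    """Odds ratio for experimental vs control."""
    return odds(p_experimental) / odds(p_control)


print(round(odds(0.20), 2))              # odds of an AE on Tx E
print(round(odds(0.25), 2))              # odds of an AE on Tx C
print(round(odds_ratio(0.20, 0.25), 2))  # odds ratio (E vs C)
```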

  27. Risk difference vs relative risk vs odds ratio (less common outcome)
     Treatment | Had AE | Did not have AE
     Experimental treatment | 7 | 93
     Control treatment | 12 | 88
     Proportion experiencing an AE on Tx E = 7/(7+93) = 7/100 = 0.07 (7%)
     Proportion experiencing an AE on Tx C = 12/(12+88) = 12/100 = 0.12 (12%)
     Risk difference (or absolute risk reduction) = 0.12-0.07 = 0.05 (5%)
     RR (also called risk ratio) = 0.07/0.12 = 0.58
     Relative risk reduction = 1-RR = 1-0.58 = 0.42 (42%)
     Odds of experiencing an AE on Tx E = 7/93 = 0.08
     Odds of experiencing an AE on Tx C = 12/88 = 0.14
     OR = 0.08/0.14 = 0.55*
     Relative odds reduction = 1-OR = 1-0.55 = 0.45
     The RR and OR are similar.
     *Calculated on raw numbers, not rounded numbers
     AE, adverse event; RR, relative risk; Tx E, experimental treatment arm; Tx C, control treatment arm

  28. Risk difference vs relative risk vs odds ratio (common outcome)
     Treatment | Had AE | Did not have AE
     Experimental treatment | 55 | 45
     Control treatment | 60 | 40
     Proportion experiencing an AE on Tx E = 55/(55+45) = 55/100 = 0.55 (55%)
     Proportion experiencing an AE on Tx C = 60/(60+40) = 60/100 = 0.60 (60%)
     Risk difference (or absolute risk reduction) = 60%-55% = 5%
     RR (also called risk ratio) = 55%/60% = 0.92
     Relative risk reduction = 1-RR = 1-0.92 = 0.08 (8%)
     Odds of experiencing an AE on Tx E = 55/45 = 1.22
     Odds of experiencing an AE on Tx C = 60/40 = 1.50
     OR = 1.22/1.50 = 0.81
     Relative odds reduction = 1-OR = 1-0.81 = 0.19
     The RR and OR are not similar.
     AE, adverse event; RR, relative risk; Tx E, experimental treatment arm; Tx C, control treatment arm
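
Both 2x2 examples can be recomputed with one helper, which makes the contrast between a less common and a common outcome easy to see. The function name and the dictionary keys are illustrative assumptions.

```python
def two_by_two_measures(exp_events, exp_no_events, ctrl_events, ctrl_no_events):
    """Risk difference, relative risk and odds ratio for a 2x2 table."""
    risk_exp = exp_events / (exp_events + exp_no_events)
    risk_ctrl = ctrl_events / (ctrl_events + ctrl_no_events)
    return {
        "risk_difference": risk_ctrl - risk_exp,  # absolute risk reduction
        "relative_risk": risk_exp / risk_ctrl,
        "odds_ratio": (exp_events / exp_no_events) / (ctrl_events / ctrl_no_events),
    }


less_common = two_by_two_measures(7, 93, 12, 88)    # 7% vs 12%
more_common = two_by_two_measures(55, 45, 60, 40)   # 55% vs 60%

for name, result in [("less common", less_common), ("common", more_common)]:
    print(name, {key: round(value, 2) for key, value in result.items()})
```

The risk difference is 5% in both tables, yet the RR and OR are close to each other only when the outcome is less common.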

  29. Number needed to treat: The NNT is the number of people who would need to be treated with the experimental treatment for one fewer participant to experience the outcome of interest than if those same people had been treated with the control treatment. For binary endpoints, it is calculated as the reciprocal of the ARR between the treatment groups:
     Proportion experiencing an AE on Tx E = 55/100 = 0.55 (55%)
     Proportion experiencing an AE on Tx C = 60/100 = 0.60 (60%)
     Risk difference (or ARR) = 0.60-0.55 = 0.05 (5%)
     NNT for a binary endpoint = 1/ARR = 1/0.05 = 20
     AE, adverse event; ARR, absolute risk reduction; NNT, number needed to treat; Tx E, experimental treatment arm; Tx C, control treatment arm
     Sedgwick P. BMJ 2013;347:f4605
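
A short sketch of the NNT calculation follows. Exact fractions are used so that 0.60 - 0.55 does not pick up floating-point error; rounding the result up to a whole person is a common convention assumed here, not something stated on the slide, and the function name is illustrative.

```python
import math
from fractions import Fraction


def number_needed_to_treat(events_exp, n_exp, events_ctrl, n_ctrl):
    """NNT = 1 / ARR, rounded up to a whole number of people (assumed convention)."""
    arr = Fraction(events_ctrl, n_ctrl) - Fraction(events_exp, n_exp)
    if arr <= 0:
        raise ValueError("no absolute risk reduction, so the NNT is not defined")
    return math.ceil(1 / arr)


# Slide example: 55/100 with an AE on Tx E vs 60/100 on Tx C.
print(number_needed_to_treat(55, 100, 60, 100))
```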

  30. Some handy hints: The absolute risk describes the risk of an outcome in a single group, whereas a relative risk estimates the risk of the specified outcome in the experimental group versus the control group. The HR is also a relative measure but should not be confused with an RR: an HR is used to analyse TTE data, whereas an RR is used to analyse binary data. The OR and RR give similar results when the specified outcome is not common (say <20-30% of participants) but diverge for more common specified outcomes. Hence, always make sure you know what you are looking at! Keeping it simple and reporting proportions with a p-value from Fisher's exact test, and discussing the clinical significance, is a preferable option whenever possible. AE, adverse event; OR, odds ratio; RR, relative risk; TTE, time to event


  32. Analysis populations
     Analysis population | Description
     ITT | Full randomised population analysed according to the treatment arm to which they were randomised
     Modified ITT | Definitions vary from study to study. Analyses follow the ITT principle but may exclude participants who did not take at least one dose of the study treatment or had missing post-baseline measurements
     FAS | ICH E9 defines FAS as "the analysis set which is as complete as possible and as close as possible to the ITT ideal of including all randomised subjects". Analyses the population according to the treatment they were randomised to. May exclude: people who did not meet major inclusion/exclusion criteria; people who did not take at least one dose of study treatment; people with no post-randomisation measurements
     PP | A subset of the population from the ITT/FAS who are more compliant with the protocol. Definitions vary from study to study. Typically, participants are analysed according to the study treatment they received, and major protocol violators are excluded (e.g. non-adherence to treatment, switched groups or missed measurements of primary endpoint)
     Safety | Participants who received at least one dose of study drug are generally analysed according to the drug they actually received. Analysis may be restricted to data obtained while participants were on-treatment
     FAS, full analysis set; ICH, The International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use; ITT, intent-to-treat; PP, per protocol
     European Medicines Agency. ICH E9: statistical principles for clinical trials. Step 5. 1998. https://www.ema.europa.eu/en/ich-e9-statistical-principles-clinical-trials (accessed Jan 2023); Abraha I and Montedori A. BMJ 2010;340:c2697

  33. Which analysis population should I use? ITT tends to make the two treatments look similar, whereas PP is more able to reflect treatment differences [1].
     To analyse the primary (efficacy) endpoint [2,3]:
     Superiority study: ITT/FAS as the primary analysis, PP as supportive.
     NI study: Conduct BOTH the ITT/FAS and PP analyses of the primary endpoint (from which NI will be claimed or not). Whichever one is specified as the main analysis, specify the other as a supporting analysis.
     In a well-conducted study, the ITT and PP results should be very similar and not change the interpretation. Any major differences warrant further examination. It is best practice to report both ITT and PP results in publications.
     To analyse safety endpoints: Use the safety analysis set.
     FAS, full analysis set; ITT, intent-to-treat; NI, non-inferiority; PP, per protocol
     1. D'Agostino RB et al. Statist Med 2003;22:169; 2. European Medicines Agency. Switching between superiority and non-inferiority. 2000. https://www.ema.europa.eu/en/switching-between-superiority-non-inferiority; 3. US Food and Drug Administration. Non-inferiority clinical trials. 2016. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/non-inferiority-clinical-trials (all websites accessed Jan 2023)

  34. Estimands: In 2017, the International Council for Harmonisation (ICH) Steering Committee published a draft guideline: Estimands and sensitivity analysis of clinical trials (ICH E9 [R1]) [1,2]. During 2019, the results of several T2D clinical trials were published that adopted the use of estimands [2]. An estimand is a precise description of the treatment effect reflecting the clinical question posed by a given clinical trial objective. It summarises at a population level what the outcomes would be in the same people under different treatment conditions being compared [1].
     1. European Medicines Agency. ICH E9 (R1) addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials. 2020. https://www.ema.europa.eu/en/ich-e9-statistical-principles-clinical-trials (accessed Jan 2023); 2. Min T and Bain SC. Lancet Diab Endocrinol 2020;8:181

  35. Which attributes define estimands?
     1. Population of interest for the scientific question. Example: adults with T2D
     2. Endpoint of interest. Example: change from baseline in HbA1c at Week 26
     3. Way to handle intercurrent events*. Example: regardless of whether or not the person received rescue medication and/or discontinued study treatment early
     4. Population level summary. Example: mean difference between treatment groups
     *An intercurrent event is an event occurring after treatment initiation that affects either the interpretation or the existence of the measurements associated with the clinical question of interest. It is necessary to address intercurrent events when describing the clinical question of interest in order to precisely define the treatment effect that is to be estimated.
     1. European Medicines Agency. ICH E9 (R1) addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials. 2020. https://www.ema.europa.eu/en/ich-e9-statistical-principles-clinical-trials (accessed Jan 2023); 2. Min T and Bain SC. Lancet Diab Endocrinol 2020;8:181
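
One way to keep the four attributes together when documenting an analysis is a small record type. The class and field names below are illustrative assumptions; the example values are the ones given on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimand:
    """The four attributes listed on the slide (field names are hypothetical)."""

    population: str
    endpoint: str
    intercurrent_event_handling: str
    population_level_summary: str


example = Estimand(
    population="adults with T2D",
    endpoint="change from baseline in HbA1c at Week 26",
    intercurrent_event_handling=(
        "regardless of whether or not the person received rescue medication "
        "and/or discontinued study treatment early"
    ),
    population_level_summary="mean difference between treatment groups",
)

print(example)
```

Freezing the record reflects the idea that an estimand is fixed in advance rather than adjusted after the data are seen.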

  36. Different strategies result in different estimands
     Strategy | Description for handling of intercurrent event
     Treatment policy | The occurrence of the intercurrent event is considered irrelevant in defining the treatment effect of interest: the value for the variable of interest is used regardless of whether or not the intercurrent event occurs [1]
     Hypothetical | A scenario is imagined in which the intercurrent event would not occur: the value of the variable to reflect the clinical question of interest is the value which the variable would have taken in the hypothetical scenario defined [1]
     Composite variable | This relates to the variable of interest [1]. The intercurrent event is integrated with one or more measures of clinical outcome as a combined variable of interest [2]
     Principal stratum | This relates to the population of interest [1]. Only the subset of the trial population who did (or who did not) have the intercurrent event may be subjected to the analysis [2]
     While-on-treatment | Describes the treatment effect before any intercurrent events occurred. The outcomes until the time of the intercurrent event are analysed [2]
     1. European Medicines Agency. ICH E9 (R1) addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials. 2020. https://www.ema.europa.eu/en/ich-e9-statistical-principles-clinical-trials (accessed Jan 2023); 2. Min T and Bain SC. Lancet Diab Endocrinol 2020;8:181

  37. Handy hints: Which strategy or estimand should be chosen? Not just a decision for statisticians; depends on what is of interest (may need multiple estimands); should be prespecified in the protocol; regulators may have different preferences. When reviewing results, it is important to understand what the estimand is. ITT, intent-to-treat; PP, per protocol. Min T and Bain SC. Lancet Diab Endocrinol 2020;8:181
