class: center, middle, inverse, title-slide

.title[
# Randomized controlled trials
]
.subtitle[
## Analysis, interpretation, intended and unintended effects
]
.author[
### Elena Dudukina, PhD, Department of Clinical Epidemiology, AU & AUH
]
.date[
### 2023-02-08
]

---

# What is an RCT?

- An RCT is an interventional study
- The exposure (treatment) status of the participants is randomly assigned, and the allocation is concealed
- Participants are followed over time to determine their outcomes (longitudinal study)

![:scale 80%](Algorithm-for-classification-of-types-of-clinical-research.png)

.footnote[D.A. Grimes, K.F. Schulz: An overview of clinical research: the lay of the land, The Lancet, Vol. 359 (9300), pp. 57-61 (2002)]

---

# Why do we need RCTs?

- An experimental design with random allocation of treatment is the only design that, when conducted properly, ensures exchangeability of the groups (no measured or unmeasured confounding) and, therefore, supports causal conclusions
- The observed risks of the outcome in the treated and untreated in an RCT are the closest we can get to the ideal counterfactual risks of the outcome in the treated and untreated

![:scale 70%](exchangeability.png)

.footnote[Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC]

---

# What is exchangeability?

- When the exposed and unexposed groups are exchangeable, the risk of the outcome in the treated group would have been the same as the risk of the outcome in the untreated group had the treatment been switched between the groups (M. Hernán)
- “Exchangeability means people who are treated and untreated have the same probability of the outcome happening due to all mechanisms other than treatment” (E. Murray)
- The RCT is the only study design in which exchangeability (the no-confounding assumption) is expected to hold by design (unlike in observational research)
- Exchangeable: 🍎🍎🍎🍎🍎🍎🍎🍎🍎🍎 vs 🍎🍎🍎🍎🍎🍎🍎🍎🍎🍎
- Not exchangeable: 🍎🍎🍎🍎🍎🍎🍎🍎🍎🍎 vs 🐿️🐿️🐿️🐿️🐿️🐿️🐿️🐿️🐿️🐿️

---

# Are these fruits exchangeable?
![:scale 45%](other-red-apples.jpg) vs ![:scale 30%](red-apples.jpg)

--

- Yes (both groups are red apples)
- Fair comparison

---

# Follow-up in RCTs

![:scale 90%](fup.png)

---

# Mock RCT: Apples vs apples

- RCT in the population of apples

![:scale 25%](apples.jpg)

- Recruit apples using a set of inclusion and exclusion criteria
  - Fresh (harvested in the most recent summer season)
  - Red
  - Sweet
  - Medium size (120 g)
  - Grown locally
- N=200 (100 apples per treatment arm)

---

# Design of the RCT

- Exposure: different washing regimens
- Outcome: time to spoiling (maximum follow-up of 30 days)
- Randomly assign apples to the intervention arm (washing with water + soap) or the standard-treatment arm (washing with water only)
  - Algorithm-based randomization with a 1:1 allocation ratio
- The people who wash the apples cannot be blinded, but the people who store the apples and follow them up for the outcome are blinded to how the apples were washed
- Follow the apples up for the outcome
- Is loss to follow-up possible?

--

- The people who follow the apples up for the outcome are students
- They do this during their lunch break
- Some apples will be eaten before they have a chance to spoil (loss to follow-up)

---

# Apples vs non-apples

- In non-interventional settings, epidemiologic studies are especially susceptible to bias, including confounding
- Non-exchangeability → the study compares apples with non-apples and erroneously attributes the observed association to the effect of the exposure (washing with water alone vs with water + soap) on the outcome (time to spoiling), when confounding can explain the observed association

![:scale 23.5%](other-red-apples.jpg) vs ![](bananas.jfif)

--

- Not a fair comparison

---

# RCT features

.center[![:scale 10%](dice.png) ![:scale 10%](concealment.png) ![:scale 10%](blinding.png)]

---

# RCT features

- Random allocation to intervention groups
- Both groups (active vs control) are treated identically in all respects except for the intervention
- Allocation concealment: the personnel who enroll patients and carry out the randomization cannot know (or predict) a patient's treatment allocation in advance
- Blinding: patients and investigators remain unaware of which treatment was given until the study is completed (double-blind)
  - Double-blind studies are not always feasible or appropriate
  - Triple-blind studies: investigators, participants, and data analysts have no knowledge of the treatment allocation
- Patients are often analysed according to the group to which they were allocated, irrespective of whether they actually received the intended intervention (intention-to-treat analysis, ITT)

---

# Ethical and practical aspects of RCTs

- Is it ethical to randomize?
- Is it ethical to withhold treatment and randomize to placebo?
- Is it possible to randomize lifestyle, behavior, weight or body mass index, gestational age (for infants), or having certain conditions?

--

- Costly
- Time-consuming
- Highly selected population → will the findings from the RCT generalize to the heterogeneous real-world (patient) population?

---

# Why are allocation concealment & blinding important?
<blockquote class="tiktok-embed" cite="https://www.tiktok.com/@lolabird98/video/7054678349204999430" data-video-id="7054678349204999430" style="max-width: 605px;min-width: 325px;" > <section> <a target="_blank" title="@lolabird98" href="https://www.tiktok.com/@lolabird98">@lolabird98</a> <a title="fyp" target="_blank" href="https://www.tiktok.com/tag/fyp">#fyp</a> <a title="trial" target="_blank" href="https://www.tiktok.com/tag/trial">#trial</a> <a title="work" target="_blank" href="https://www.tiktok.com/tag/work">#work</a> <a title="office" target="_blank" href="https://www.tiktok.com/tag/office">#office</a> <a title="capitalism" target="_blank" href="https://www.tiktok.com/tag/capitalism">#capitalism</a> <a target="_blank" title="♬ original sound - lolabird98" href="https://www.tiktok.com/music/original-sound-7054678306515897094">♬ original sound - lolabird98</a> </section> </blockquote> <script async src="https://www.tiktok.com/embed.js"></script>

---

# Why are allocation concealment & blinding important?

- Directed acyclic graphs (DAGs) for RCTs

![:scale 100%](rct-blinded.png)
![:scale 100%](rct-unblinded.png)

---

- The effect of Z on the outcome Y is the effect of randomization (the assigned treatment)
  - When evaluating the effect of Z, we evaluate the intention-to-treat effect
  - Patients are analysed within the group to which they were allocated, irrespective of whether they actually received the intended intervention
- The path A → Y is the effect of the treatment actually received on the outcome
  - Per-protocol effect
  - A is determined after randomization, so comparisons by A are observational
  - Confounding by L: L is associated with A and with Y and is not on the causal pathway from A to Y

![:scale 40%](rct-dag.png)

---

# Analyzing RCTs

- Did someone find an error in the video lecture?

| Total population, N | Random treatment assignment (Z), n | 28-day mortality (Y=1), n |
|---------------------|------------------------------------|---------------------------|
| 1000                | Z=1 (treated), n=482               | 262                       |
|                     | Z=0 (comparator/placebo), n=518    | 290                       |

- Risk difference (RD): `\(risk(treated) - risk(comparator)\)` = `\(\frac{262}{482} - \frac{290}{518}\)` = `\(0.544 - 0.560\)` = `\(-0.016\)`
- Risk ratio (RR): `\(risk(treated) / risk(comparator)\)` = `\(\frac{262}{482} / \frac{290}{518}\)` = `\(0.544 / 0.560\)` = `\(0.97\)`
- The effect is small in magnitude

---

# Interpretation of the effects on the absolute and relative scales

- Percent relative effect: (1 − 0.97) × 100% = 3% decrease in relative risk
- Risk difference
  - Among those assigned to receive treatment (Z=1), ~2 lives per 100 (RD = −0.016, i.e., 1.6 per 100) were saved compared with those assigned no treatment (Z=0)
- Risk ratio
  - Those assigned to receive treatment (Z=1) had *0.97 times the risk* of dying compared with those assigned no treatment (Z=0)
  - Those assigned to receive treatment (Z=1) had a 3% lower relative risk of dying compared with those assigned no treatment (Z=0)

---

# Features of the intention-to-treat (ITT) effect

- Effect of randomization (Z)
- Has a causal interpretation, since no confounding is expected
- In an ideal world with perfect adherence to treatment, the ITT effect equals the per-protocol effect

![:scale 50%](itt-vs-pp.png)

---

# Per-protocol effect features

![:scale 90%](itt-table.png)

---

# Per-protocol effect features

- The effect of treatment (A) on the outcome (Y)
- Distribution of the outcome in the actually treated (A=1) vs the actually untreated (A=0) → "as-treated"
- Treatment received no longer depends only on randomization; it can also depend on patients' characteristics (L)
- Patients' outcomes also depend on their characteristics (L)
- Patients' characteristics (L) are therefore confounders
  - Associated with the exposure
  - Associated with the outcome
  - Not on the causal pathway between the exposure and the outcome
- When estimating the per-protocol effect, we need to adjust for L

---

# Loss to follow-up

- A source of selection bias in RCTs

--

- Washing apples with water and soap keeps them fresh longer than washing with water alone
- Now and then, the students who assess the apples' outcomes eat the freshest-looking apples
  - The students are hungry, so fresher apples are more likely to be lost to follow-up
  - By doing so, they predominantly eat apples from the "water + soap" group (without knowing it)
- How would this affect the final results?

---

# Loss to follow-up

- At the end of follow-up
  - Observed, without accounting for loss to follow-up

![:scale 70%](1.png)

- RR (fresh=1) = 0.5/0.6 ≈ 0.83
- Apples washed with water and soap have a ~17% lower relative risk (10 percentage points lower absolute risk) of being fresh than apples washed with water alone

---

# Loss to follow-up

- At the end of follow-up
  - Accounting for loss to follow-up

![:scale 60%](2.png)

- RR (fresh=1) = 0.8/0.7 ≈ 1.14
- Apples washed with water and soap have a 14% higher relative risk (10 percentage points higher absolute risk) of being fresh than apples washed with water alone

---

# Loss to follow-up

- Differential loss to follow-up (informative censoring)
  - Can occur in any follow-up study, including RCTs
  - When present, estimating the effect only among those not lost to follow-up → conditioning on continued participation → selection bias

![](loss-to-fup.png)

---

# Recap

- Randomization ensures exchangeability between the treatment arms
- The intention-to-treat (ITT) effect is the effect of randomization on the outcome
- Per-protocol analysis aims to estimate the effect of the treatment actually received on the outcome
- If adherence to treatment is not perfect, the intention-to-treat effect may differ from the per-protocol effect
- Loss to follow-up introduces selection bias into RCT results and needs to be addressed in the analysis

---

# Break

![:scale 50%](break.jpg)

---

# Interpreting randomized controlled trials

- Objective: a valid and precise estimate of the effect of an exposure
on the occurrence of an outcome

![](obj.png)

- Exchangeability between the treatment and control groups (no confounding)

![](rct-dag-2.png)

---

# Descriptive table (“Table 1”)

![](tab1.png)

---

# Accuracy of the effect estimate

- The overall goal: precise + valid
- Simulation example

![:scale 80%](accuracy.png)

---

# Sources of error in epidemiologic research

- Random error
  - Sampling (random) variability
  - Chance
- Systematic error (bias)
  - Selection bias
  - Measurement (information) bias
  - Confounding (not expected in RCTs)
    - Random non-exchangeability (chance imbalance)

---

# Precision

- Lack of random error
- Reflected in the confidence interval width

![:scale 80%](ci.png)

---

# Confidence interval (CI)

- An estimate of the uncertainty due to random error alone
- Interpretation of a 95% CI
  - The 95% refers to how often 95% confidence intervals computed from many studies would contain the true effect size
  - Provided all the assumptions used to compute the intervals are correct

![:scale 30%](ci-meaning.jpg)

---

# 95% CI

- Over many repetitions of a study, 95% of the 95% CIs would contain the true effect parameter

<a href="https://rpsychologist.com/d3/ci/">Watch 95% CI simulation</a>

---

# Validity

- Lack of systematic error
- Simulation example

![:scale 80%](validity.png)

---

# Validity in studies

- Internal validity
  - Validity of the inferences about the source population
- External validity (= generalizability)
  - Validity of the inferences about people outside the source population
- Internal validity ≠ external validity

---

# Null hypothesis significance testing

- Not applicable to interpreting the results of observational studies
- H0: null hypothesis (hypothesis of no association)
  - The risk difference is zero or the risk ratio is 1
  - A hypothesis of no association between two variables in a _superpopulation_
  - Assumes the groups we compare were sampled in a _random fashion_ from a _superpopulation_
  - It is not about the observed study groups

---

# Null hypothesis significance testing

- "The difference is not statistically significant" means:
  - One cannot reject the null hypothesis of no difference between the superpopulation groups
  - It does not mean that the two observed groups are the same

.footnote[Rothman, K. J., Greenland, S., & Lash, T. L. (2008). Modern Epidemiology, 3rd Edition. Philadelphia, PA: Lippincott Williams & Wilkins.
https://www.nature.com/articles/d41586-019-00857-9 ]

---

# Type I and II errors

![:scale 60%](type-1-and-2-errors.png)

---

# Type I and II errors

![](type-1-and-2-errors-2.png)

---

# Type I and II errors

- Type I error: a false positive (the null hypothesis is rejected although it is true)
- Type II error: a false negative (the null hypothesis is not rejected although it is false)

![](pregnant-meme.jpg)

---

# What a p-value is not

![](p-value-meme-scheme.png)

---

# What is a p-value?

![](p-value-statistics-meme-afraid-to-ask.jpg)

---

# What is a p-value?

- The probability that a test statistic (computed from the data) would be greater than or equal to its observed value, assuming that the test hypothesis is correct and all other assumptions hold
- Results are called “significant” or “not significant” according to whether the p-value is below or above an arbitrary cutoff, usually 0.05, called the alpha level of the test

---

# P-value

- An observed difference can be statistically significant
  - When the model used to compute it is wrong
    - Bias
  - Due to chance
    - An alpha level of 0.05 → a statistically significant difference 5% of the time when the null hypothesis is correct
- The alpha cutoff is arbitrary
- Dichotomizing study results based on p-values is harmful
  - Cherry-picking of “significant” results

---

# How to think about a p-value?

- A small p-value → the data would be unusual if all the assumptions used to compute the test statistic (including the null hypothesis) were correct
- When a study is large, very minor effects or small assumption violations can lead to "statistically significant" results of null hypothesis tests

.footnote[Rothman, K. J., Greenland, S., & Lash, T. L. (2008). Modern Epidemiology, 3rd Edition. Philadelphia, PA: Lippincott Williams & Wilkins.
https://www.nature.com/articles/d41586-019-00857-9 ]

---

# Example

![:scale 45%](m_joi190001va.png) ![:scale 45%](jama-exmaple.png)

- All-cause 28-day mortality was lower in patients receiving treatment #1 (35%) than in those receiving treatment #2 (43%)
- Risk difference: −8.5% (95% CI, −18.2% to 1.2%)
- Hazard ratio (approximates the incidence rate ratio): 0.75

--

- 95% CI of the hazard ratio: 0.55-1.02
- p-value: 0.06

--

- Why did the authors conclude that treatment #1 had no beneficial effect vs treatment #2?
- What do you conclude?

---

# Recap

- Table 1: balance is desirable but may not be perfect
- Accuracy = validity + precision
  - Precision = lack of random error
  - Validity = lack of systematic error
    - Selection bias + information bias
- Internal validity ≠ external validity
- Interpret RCT results by focusing on validity and precision, not on p-values

---

# Break

![:scale 50%](break.jpg)

---

# Intended and unintended effects in RCTs

## Intended effects

- Efficacy
- Beneficial effects of the interventions/treatments
- Primary outcomes (end points)
  - Should be explicitly stated in the report of an RCT

---

# Intended and unintended effects in RCTs

## Unintended effects

- Harms
- Adverse events
- Undesired effects of interventions/treatments

---

# Unintended effects

- “Harms should always be viewed as important whether they are labelled primary or secondary” (CONSORT statement)
- Existence and nature of adverse effects
  - Withdrawal of participants due to an adverse event → loss to follow-up (selection bias)
  - Often unexpected and unpredictable
  - Harms-related stopping of an RCT

---

# Reporting

- CONSORT (Consolidated Standards of Reporting Trials)
- Report anticipated and unexpected adverse events separately
  - For each study arm
  - The absolute risk of each adverse event, including recurrent events
  - The number of participants withdrawn due to harms
- Interpretation
  - Results, balancing benefits and harms
  - Considering other evidence
    - Systematic reviews and meta-analyses
    - Benefit-risk balance
- Not all reported adverse events are necessarily caused by the intervention

---

# Example

![](adverse-example.png)

---

# After the RCT

- Completely unpredictable adverse effects → hard to study in an RCT
- Post-authorization safety study (PASS) (non-interventional)
  - Carried out after a medicine has been authorized, to obtain data on safety

---

# Recap

- Intended effects in RCTs are beneficial effects of the interventions/treatments
  - Intended effect → efficacy
- Unintended effects in RCTs are undesired effects, or harms
  - Expected/predictable
  - Unexpected/unpredictable
- Adverse effects can lead to loss to follow-up (selection bias) in RCTs
- Importance of reporting
- Unpredictable adverse effects → post-authorization safety study (PASS) (non-interventional)

---

![:scale 30%](end.jpg)
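---

# Appendix: reproducing the calculations

This backup slide offers a minimal Python sketch, not part of the original course material, that redoes the arithmetic from earlier slides: the risk difference and risk ratio from the 28-day mortality table, the observed loss-to-follow-up risk ratio of 0.5/0.6, and an approximate p-value back-calculated from the hazard ratio 0.75 (95% CI 0.55-1.02) in the example. The back-calculation assumes a Wald-type CI that is symmetric on the log scale, which may not be how the trial authors computed it. The last function simulates the repeated-sampling meaning of a 95% CI. The function names `risk_measures`, `p_from_ratio_ci` and `ci_coverage`, and the simulation settings (true risk 0.3, 500 subjects per study, 1000 studies), are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist


def risk_measures(events_1, n_1, events_0, n_0):
    """Return (risk difference, risk ratio) for group 1 vs group 0."""
    risk_1 = events_1 / n_1
    risk_0 = events_0 / n_0
    return risk_1 - risk_0, risk_1 / risk_0


def p_from_ratio_ci(estimate, lower, upper, level=0.95):
    """Approximate two-sided p-value for a ratio measure (RR, HR) from its CI.

    Assumption: a Wald-type CI, estimate * exp(+/- z * SE) on the log scale.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se_log
    return 2 * (1 - NormalDist().cdf(abs(z)))


def ci_coverage(true_risk=0.3, n=500, n_studies=1000, seed=1):
    """Share of simulated studies whose Wald 95% CI for a risk contains the true risk.

    Hypothetical settings chosen only to illustrate the repeated-sampling idea.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)
    hits = 0
    for _ in range(n_studies):
        # One simulated study: count events among n independent subjects
        events = sum(rng.random() < true_risk for _ in range(n))
        p_hat = events / n
        half_width = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= true_risk <= p_hat + half_width:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    # 28-day mortality table: Z=1 (262/482) vs Z=0 (290/518)
    rd, rr = risk_measures(262, 482, 290, 518)
    print(f"RD = {rd:.3f}, RR = {rr:.2f}")  # RD = -0.016, RR = 0.97

    # Loss-to-follow-up slide: observed risks of being fresh, 0.5 vs 0.6
    print(f"RR (observed) = {0.5 / 0.6:.2f}")  # RR (observed) = 0.83

    # Hazard ratio example: HR 0.75 (95% CI 0.55-1.02)
    print(f"p ~ {p_from_ratio_ci(0.75, 0.55, 1.02):.3f}")

    # Repeated-sampling meaning of a 95% CI: coverage should be close to 0.95
    print(f"CI coverage = {ci_coverage():.3f}")
```

The back-calculated p-value (about 0.07) is close to the reported 0.06; exact agreement should not be expected, because the published CI limits are rounded and the authors may have used a different test.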