USMLE Biostats Review: Insights on Sensitivity and Specificity
In this biostatistics review episode, key concepts of sensitivity and specificity in diagnostic tests are discussed using clinical scenarios. Through practical examples and explanations, viewers gain a solid understanding of interpreting sensitivity and specificity of tests in diagnosing medical conditions.
Presentation Transcript
Divine Intervention Episode 143 (USMLE Biostats Review) Some Resident
Q1 A new serum test is created to screen for peripheral arterial disease. The sensitivity of the test is 80%. The most accurate interpretation of this statement is? a. Patients with positive test results have an 80% chance of having the disease. b. In patients with negative test results, 80% do not have the disease. c. In patients who have the disease, 20% will have a negative test result. d. Patients with negative test results have an 80% chance of not having the disease.
Q1 Key -The best answer here is C. -In my experience, answering NBME questions rarely depends solely on doing math. Understanding is the way to go! -Sensitivity essentially answers the Q: Of all the population with a given disease, what % have +ve test results? That's it! -The other % that you don't detect that TRULY have disease are the false negatives. The 2nd word is negative but the word in front of it is false, so you know that they are in fact +ve. I use this 2nd-1st word mantra to keep things straight. Highly seNsitive tests have a low fNr.
Q2 A study is done on 1000 patients with a history of glioblastoma (GBM). A new serum test (ST) is done to screen for recurrent GBM. 100 patients have a positive ST test and 900 have a negative ST test. Brain imaging with biopsy is done on all these patients and 30 recurrences of GBM are found. 10 patients with positive ST tests have GBM and 20 patients with negative ST tests have GBM. Which of the following best represents the sensitivity of ST tests? a. 92% b. 35% c. 75% d. 50%
Q2 Key -The best answer here is B. The sensitivity is 33%, and 35% is the closest choice. The NBME occasionally plays this trick where inexact answers are posted. When this occurs, pick the answer that is closest to your math. -The Q here sounds nebulous but simple math based on understanding will save the day. -Sensitivity essentially answers the Q: Of all the population with a given disease, what % have +ve test results? -The total diseased population is 30 people. The # with +ve test results was 10. So sensitivity = 10/30 = 33%. You're welcome to check this with a 2 by 2 table.
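The Q2 arithmetic can be checked with a short Python sketch. The function names are illustrative (not from the slides), and the false positive and true negative counts are derived from the stem: 100 +ve tests minus 10 true +ves leaves 90 false +ves, and 900 -ve tests minus 20 false -ves leaves 880 true -ves.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Of everyone WITH disease, the fraction who test +ve."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Of everyone WITHOUT disease, the fraction who test -ve."""
    return true_neg / (true_neg + false_pos)


# Q2 stem: 30 recurrences, 10 of them test +ve and 20 test -ve.
q2_sensitivity = sensitivity(true_pos=10, false_neg=20)
# Derived from the stem: 970 patients without recurrence, 880 test -ve.
q2_specificity = specificity(true_neg=880, false_pos=90)

print(f"Sensitivity = {q2_sensitivity:.0%}")  # 33%
print(f"Specificity = {q2_specificity:.1%}")  # 90.7%
```

The specificity is not asked for in Q2; it is shown only to illustrate that the same 2 by 2 table yields both quantities.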
Q3 A new serum test for glioblastoma (GBM) has a specificity of 90%. The most accurate interpretation of this statement is? a. 90% of patients with GBM have positive test results. b. 10% of patients with GBM are missed by this test. c. 10% of patients without GBM have positive test results. d. 90% of patients without GBM have positive test results.
Q3 Key -The best answer here is C. -Again, simple math + understanding = clutch on this Q. -Specificity essentially answers the Q: Of all the population without a given disease, what % have -ve test results? That's it! -The specificity of this test is 90%. So of the people w/o GBM, 90% test -ve. So 10% that should have tested -ve ultimately end up testing +ve (aka false +ves). -A highly sPecific test has a low fPr.
Sidebar 1-SPin and SNout principle -If a test is highly sensitive, people with disease should have a +ve test result. -If the test is -ve, then disease should be absent (aka a low FNR). A -ve test should rule OUT disease. -If a test is highly specific, people w/o disease should have a -ve test result. -If the test is +ve, then disease should be present (aka a low FPR). A +ve test should rule IN disease.
Sidebar 2-Screening and Confirmatory Tests -In tests with high sensitivity, people with disease should have +ve test results. -High sensitivity tests make good screening tests so you don't inadvertently miss out on people with disease. For example, you'd hate to miss out on people with HIV. This is why you use the ELISA test. -In tests with high specificity, people w/o disease should have -ve test results. -High specificity tests make good confirmatory tests so you don't inadvertently label people w/o disease as having a disease. Tests that are highly specific are very good at correctly labeling people w/o disease, so if the test is +ve (and by definition, high specificity tests have a low FPR), you very likely have disease. This is why Western Blots are undertaken after a +ve ELISA, so you don't tell a patient they have HIV based on a +ve ELISA when they don't! -Note, however, that the WB is no longer done in most places as a confirmatory test.
Q4 Which of the following points best represents the region of the graph with the highest positive predictive value (PPV) for the detection of Type 2 Diabetes Mellitus (T2DM)?
Q4 Key -The best answer here is C. -These Qs have a high tendency to be annoying. To beat them, remember the following: The highest PPV region on a graph corresponds to the region with the highest sPecificity, which corresponds to the region that DOES NOT miss anyone w/o disease. If you remember this, you're golden. -Said another way, the highest PPV is achieved if the test, when +ve, only includes people that have the disease. -PPV simply means the % of people with +ve tests who have disease.
Sidebar-Do not mix this up! Sensitivity of a test represents the % of people with disease who have +ve test results. PPV of a test represents the % of people with +ve test results who have disease. DO NOT MIX THIS UP! If you switch the words before and after "who have", you should be able to keep things straight. Learn one side and remember that the other one is the other one.
Q5 Which of the following points best represents the region of the graph with the highest negative predictive value (NPV) for the detection of Type 2 Diabetes Mellitus (T2DM)?
Q5 Key -The best answer here is B. -These Qs have a high tendency to be annoying. To beat them, remember the following: The highest NPV region on a graph corresponds to the region with the highest seNsitivity, which corresponds to the region that DOES NOT miss anyone with disease. If you remember this, you're golden. -Said another way, the highest NPV is achieved if the test, when -ve, only includes people that don't have the disease. -NPV simply means the % of people with -ve tests who don't have disease.
Sidebar-Do not mix this up! Specificity of a test represents the % of people w/o disease who have -ve test results. NPV of a test represents the % of people with -ve test results who don't have disease. DO NOT MIX THIS UP! If you switch the words before and after "who have", you should be able to keep things straight. Learn one side and remember that the other one is the other one.
Q6 A clinical trial is conducted to measure the effectiveness of the IM test as a screening tool for the detection of testicular cancer. 500 IM tests are obtained. 20 men have positive IM tests and are found by testicular biopsy to have testicular cancer. 180 men have positive IM tests and are negative for testicular cancer by biopsy. 290 men have negative IM tests and are negative for testicular cancer by biopsy. 10 men have negative IM tests and are found to be testicular cancer positive by biopsy. What is the NPV of this test for the detection of testicular cancer? a. 97% b. 10% c. 33% d. 40% e. 90%
Q6 Key -The best answer here is A. No need to panic on these questions with tons of numbers. Simply define the qty that is being tested AND then abstract the #s you need. Many times the #s given are not useful. -NPV of a test represents the % of people with -ve test results who don't have disease. -There are 300 people with -ve IM test results. Of these people, 290 DO NOT have testicular cancer. So the NPV is basically 290/300, which is 97%.
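As a rough sketch, the Q6 numbers can be dropped into a small 2 by 2 table helper in Python. The class and method names are made up for illustration; only the four counts come from the question stem.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # test +ve, disease present
    fp: int  # test +ve, disease absent
    fn: int  # test -ve, disease present
    tn: int  # test -ve, disease absent

    def ppv(self) -> float:
        """Of everyone who tests +ve, the fraction who have disease."""
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        """Of everyone who tests -ve, the fraction who do not have disease."""
        return self.tn / (self.tn + self.fn)


# Counts taken directly from the Q6 stem.
q6 = TwoByTwo(tp=20, fp=180, fn=10, tn=290)

print(f"NPV = {q6.npv():.0%}")  # 97%
print(f"PPV = {q6.ppv():.0%}")  # 10%
```

Note that 10% (the PPV) also appears among the answer choices, which is why defining the quantity first matters.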
Q7 If the cutoff for a positive IM test result for the detection of testicular cancer (TC) is 5, which of the following best represents the outcome of adjusting the test cutoff value to 1? a. PPV would increase but NPV would decrease. b. Specificity would decrease but sensitivity would increase. c. PPV and NPV would both increase. d. Sensitivity and specificity would both increase.
Q7 Key -The best answer here is B. -The name of the game with biostats Qs is to first define what is being tested (doing your analysis first) before picking out an answer. When you look at the answers first, your mind is swayed in -ve directions. -The prior cutoff is 5 (above 5, you have TC). If you bring it down to 1, you vastly increase your chances of catching every single person with TC. In other words, you don't miss anyone. -This increases the sensitivity of a test. Whenever seNsitivity goes up, Npv goes up. sPecificity and Ppv also go in the same direction as each other.
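To see the cutoff trade-off with numbers, here is a minimal Python sketch. The IM scores below are invented purely for illustration (the slides give no raw data), and a result is treated as +ve when the score is above the cutoff, as the key describes.

```python
# Hypothetical IM scores, NOT from the slides.
DISEASED = [2, 4, 6, 7, 9]  # men who truly have TC
HEALTHY = [0, 1, 2, 3, 6]   # men who truly do not have TC


def sens_spec(diseased, healthy, cutoff):
    """Return (sensitivity, specificity) when score > cutoff counts as +ve."""
    true_pos = sum(score > cutoff for score in diseased)
    true_neg = sum(score <= cutoff for score in healthy)
    return true_pos / len(diseased), true_neg / len(healthy)


for cutoff in (5, 1):
    sens, spec = sens_spec(DISEASED, HEALTHY, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
# cutoff 5: sensitivity 60%, specificity 80%
# cutoff 1: sensitivity 100%, specificity 40%
```

With these made-up scores, lowering the cutoff catches every diseased patient but labels more healthy patients +ve, which matches answer B.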
Q8 A medical student at Johns Hopkins invents a drug that improves survival in patients with Glioblastoma Multiforme (GBM) by 7 years. Which of the following changes would be seen a few years after drug FDA approval? a. The sensitivity of screening tests for detecting GBM would decrease. b. The prevalence of GBM would increase in the population. c. The PPV of GBM detection tests would decrease. d. The incidence of GBM would increase in the population. e. The specificity of screening tests for detecting GBM would increase. f. The NPV of GBM detection tests would increase
Q8 Key -The best answer here is B. -By having this awesome Hopkins invented drug, we would keep more people who have already been diagnosed with GBM alive, which is great, so the # of people with GBM in the population would increase. -Therefore, prevalence increases. As Prevalence goes up, Ppv should increase, hence C is wrong. NPV would decrease, so F is wrong (look at next slide). -Changes in prevalence do nothing to test sensitivity and specificity so A and E are wrong. The only things that change these qties are changes in the actual test (like modifying the cutoff values). -We will likely still be diagnosing GBM at the same rate, so incidence stays the same.
Sidebar 1-Why does PPV increase with prevalence? Think of this, if a person comes to the ED in December with fevers, rhinorrhea, and myalgias, they likely have the flu. If you got a -ve flu swab result, would you believe this? The prevalence of the flu goes up in December so NPV goes down, but PPV goes up. You are less likely to believe the results of a -ve test during this high prevalence period. Stated another way, you are a lot more likely to believe the results of a +ve test if the disease is common!
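The effect of prevalence on PPV and NPV can be sketched with Bayes' rule in Python. The 90% sensitivity and 90% specificity below are assumed values chosen for illustration; they are not from the slides.

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """P(disease | +ve test) = true +ves / all +ves."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv(sens: float, spec: float, prev: float) -> float:
    """P(no disease | -ve test) = true -ves / all -ves."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


SENS, SPEC = 0.90, 0.90  # assumed test characteristics, held fixed

for prev in (0.01, 0.10, 0.50):
    print(f"prevalence {prev:.0%}: "
          f"PPV {ppv(SENS, SPEC, prev):.1%}, NPV {npv(SENS, SPEC, prev):.1%}")
```

Sensitivity and specificity stay fixed throughout; only prevalence changes, and PPV rises while NPV falls.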
Sidebar 2-Incidence vs Prevalence -Incidence represents the # of new cases of a disease that have been diagnosed within a specific time period. -Prevalence is the # of people that are alive WITH the disease AT a given point in time (new + existing cases).
Q9 An M2 (2nd year med student) researcher at The Gifted Medical Students Institute plans to study the effects of consuming high amounts of kale on the development of pheochromocytoma. He plans to publish the results of his study prior to graduation. Which of the following study designs presents the most appropriate means of completing the study? a. Randomized control trial. b. Prospective cohort study. c. Crossover study. d. Case-control study. e. Case report.
Q9 Key -The best answer here is D. -The phenomenon the researcher is trying to measure here is exceedingly rare and he has a limited time frame. -Approaching this by way of a prospective cohort study/RCT would literally take as much time as a 60+ year medical career. -To study rare phenomena, case-control studies are typically the best option on NBME exams. -Results generated from the CCS can then be used to formulate research Qs that can be examined in a cohort study/RCT.
Sidebar-Case-Control Studies -In a CCS, you need 2 groups of people with similar characteristics. -Group 1 have the disease in Q (pheo), Group 2 do not have the disease in Q (no pheo). -You then ask about exposures they may have had back in the day. You should already imagine that recall bias may be a prominent issue with CCS. -It is HY to know that CCSs give rise to data pertaining to odds ratios.
Q10 A professor and 2 medical students undertake a case control study over the course of a year and publish their results in a high impact journal. Which of the following best represents an example of a possible conclusion from their study? a. Duloxetine decreases pain scores in patients with fibromyalgia. b. A combination of Sofosbuvir and Ledipasvir cures hepatitis C with high fidelity. c. Asbestos exposure causes mesothelioma. d. Ursodiol administration improves survival in patients with primary biliary cholangitis.
Q10 Key -The best answer here is C. -In option C, the researchers essentially looked at people with mesothelioma and compared them to people w/o mesothelioma. They likely determined that a good # of people with mesothelioma had prior exposure to asbestos. -Options A, B, and D are wrong because they involve interventions, which are things you'd ordinarily do in an RCT. -As is evident with this Q, you can't just memorize facts and do well on these USMLE exams. You actually need to understand concepts. This is the central principle behind doing well regardless of Q difficulty on these exams. -CCS/Cohort studies deal with exposures, RCTs deal with interventions. DETOUR
Q11 The average normal CD4 count is 1000 per mm3 of blood with a standard deviation of 100/mm3. Which of the following best represents the normal percentage of individuals who would be measured to have a CD4 count > 1200/mm3 of blood? a. 2.51% b. 95% c. 5% d. 16% e. 68.2%
Q11 Key -The best answer here is A. -The key principle to realize here is that 95% of a normally distributed population will fall within 2 SDs (2*100 = 200) of the mean, i.e., from 800 to 1200. -So 5% must fall outside this range on either side. Either side here means < 800 or > 1200. -Therefore, half of this 5% must have a CD4 count that is < 800/mm3 and the other half must have a CD4 count that is > 1200/mm3. -Half of 5% is 2.5%, so the best (closest) answer is 2.51%. Make sure you know this for the USMLEs!
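For anyone curious how the 2.5% shortcut compares with the exact normal curve, here is a small Python sketch using the standard library's `statistics.NormalDist`. It assumes CD4 counts are normally distributed with the mean and SD from the stem.

```python
from statistics import NormalDist

# Mean and SD from the Q11 stem; normality is an assumption.
cd4 = NormalDist(mu=1000, sigma=100)

# Shortcut from the slide: 95% within 2 SDs, so 2.5% in each tail.
shortcut_tail = (1 - 0.95) / 2

# Exact upper tail beyond 2 SDs under a normal curve.
exact_tail = 1 - cd4.cdf(1200)

print(f"Shortcut: {shortcut_tail:.2%}")  # 2.50%
print(f"Exact:    {exact_tail:.2%}")     # 2.28%
```

The exact tail is slightly smaller because 95% of a normal population actually lies within 1.96 SDs, not exactly 2. The 68/95/99.7 rule is what the exam expects.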
Sidebar-P Values (Statistical Significance) -P values are used to express the probability that the results of a study occurred from chance events. -The lower the number, the more confident we are in the results of that test. In other words, a P value of 0.05 (5% probability of obtaining results by chance, or 1 in 20) is worse than a P value of 0.01 (1% probability of obtaining results by chance, or 1 in 100). -Unless you're told otherwise, use a P value cutoff of 0.05 for significance in every NBME question.
Q12 4 separate drug trials are conducted to test the relative effectiveness of 4 different 3-beta hydroxysteroid dehydrogenase agonists in raising libido. The mean libido levels in the study (with confidence intervals) are graphed below. Which of the following statements are true?
Q12 contd. (multiple answers may be correct) a. Drug 1 is more effective than Drug 2. b. Drugs 3 and 4 are similar in effectiveness. c. Drug 4 is more effective than Drug 2. d. Drugs 1 and 4 show similar effectiveness.
Q12 Key -Statements A, B, and D are all true. -The general principle is that when 2 confidence intervals cross each other (lines overlap), there is no statistically significant difference b/w those treatments. -These scenarios are unfortunately very common on the USMLEs. -Another critical way this can be tested is to give you confidence intervals (CI) of epidemiological quantities that are ratios or differences: A ratio driven qty (like relative risk) will have non-significant results if the CI crosses 1. A difference driven qty (like absolute risk reduction) will have non-significant results if the CI crosses 0. Why??? Because a ratio of 1 or a difference of 0 means the 2 groups are the same.
Q13 A study is done to assess the relationship between vaping in college and the future need for lung transplant. The study yielded a relative risk of 3.5 with a p value < 0.05. Which of the following represents a possible 95% confidence interval from this study? a. 0.5-3.5 b. 2-4.5 c. 3.5-6.0 d. 3.9-7.1 e. 0.71-3.68
Q13 Key -The best answer here is B. -A and E are wrong b/c the CI includes 1 but this study is measuring a relative risk (which is a ratio), so you cannot have significant results and have the CI cross 1. -A and C are wrong b/c the RR derived from the study either begins or ends the CI. This is not possible. Results obtained from a study have to be WITHIN the CI, they cannot BEGIN or END the CI. -D is wrong b/c it does not include the value obtained from the study. -Pls be absolutely sure you understand this.
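The elimination logic in the Q13 key can be written out as a quick Python check. The function name is made up, and the rules encoded are simply the two heuristics from the key: a significant ratio's CI cannot include 1, and the study estimate must sit inside the CI, not on its edge.

```python
def plausible_ci(low: float, high: float, estimate: float,
                 null_value: float = 1.0) -> bool:
    """Apply the slide's two rules for a statistically significant ratio."""
    excludes_null = not (low <= null_value <= high)
    strictly_inside = low < estimate < high
    return excludes_null and strictly_inside


# Answer choices from the Q13 stem; the study's RR is 3.5.
choices = {
    "a": (0.5, 3.5),
    "b": (2.0, 4.5),
    "c": (3.5, 6.0),
    "d": (3.9, 7.1),
    "e": (0.71, 3.68),
}

plausible = [letter for letter, (low, high) in choices.items()
             if plausible_ci(low, high, estimate=3.5)]
print(plausible)  # ['b']
```

For a difference driven qty (like absolute risk reduction), the same check would apply with `null_value=0.0`.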
Q14 A study is done to assess the effectiveness of a new drug (D) for the treatment of GBM. All patients enrolled in the study received the current standard of care (SOC). In addition to receiving SOC, Group A received drug D; Group B received SOC and a sham drug (Y). Of the 40 patients receiving D, 8 die over the course of the study. Of the 40 patients receiving Y, 20 die over the course of the study. What is the NNT for drug D? a. 2.7 b. 3.3 c. 13.3 d. 5.0
Q14 Key -The best answer here is B. -To calculate the NNT, you need to find the difference in risk b/w patients exposed to D and the patients exposed to Y (placebo). You then divide the answer obtained into 1. That's it! -Stated another way, NNT is 1/Absolute Risk Reduction. -40 people got D, 8 died (20%). 40 people got Y, 20 died (50%). The difference here is 30% (or 0.3). -Dividing this into 1 gives (1/0.3), which yields 3.3. -The NNH is a qty that has a similar calculation but follows the mantra that the rate of harm in the exposed/treatment group exceeds that in the placebo group. -To make things even easier (and only remember 1 formula), take 1/the difference in risk b/w any 2 groups given. Just always write the higher risk # first in the difference.
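The NNT calculation can be sketched in a few lines of Python. The function name is illustrative, and the numbers are the ones given in the Q14 stem.

```python
def number_needed_to_treat(events_control: int, n_control: int,
                           events_treated: int, n_treated: int) -> float:
    """NNT = 1 / absolute risk reduction (control risk minus treated risk)."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    absolute_risk_reduction = risk_control - risk_treated
    return 1 / absolute_risk_reduction


# Q14: 20 of 40 die on sham drug Y, 8 of 40 die on drug D.
q14_nnt = number_needed_to_treat(events_control=20, n_control=40,
                                 events_treated=8, n_treated=40)
print(f"NNT = {q14_nnt:.1f}")  # 3.3
```

In practice the NNT is often rounded up to the next whole patient (4 here), but the answer choice follows the unrounded value.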
Sidebar-Relative Risk -To calculate relative risk, take the risk in the exposed population and divide it by the risk in the unexposed population. -For example, say a cohort study comparing smokers and non-smokers finds that 100 of 500 people in the smoking group develop lung cancer and only 50 of 500 people in the non-smoking group develop lung cancer. The RR is 20%/10% (risk of LC in smokers/risk of LC in non-smokers), which is 2. -The smokers have a 2-fold increased risk of LC compared to non-smokers.
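The smoking example works out as follows in a minimal Python sketch (the function name is illustrative; the counts are from the sidebar).

```python
def relative_risk(events_exposed: int, n_exposed: int,
                  events_unexposed: int, n_unexposed: int) -> float:
    """RR = risk in the exposed group / risk in the unexposed group."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    return risk_exposed / risk_unexposed


# Sidebar example: 100 of 500 smokers vs 50 of 500 non-smokers develop LC.
smoking_rr = relative_risk(100, 500, 50, 500)
print(f"RR = {smoking_rr:.1f}")  # 2.0
```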
Q15 If the presence of dysmorphic erythrocytes in the urine has a sensitivity of 90% and a specificity of 45% for the detection of IgA nephropathy, what is the likelihood ratio of having IgA nephropathy if the patient has dysmorphic erythrocytes detected on urinalysis? a. 1.35 b. 0.45 c. 4.55 d. 2.33 e. 1.67
Q15 Key -The best answer here is E. -Likelihood ratios occasionally pop up on the USMLEs. The classic worry of many students is deciding when to use the +ve LR formula, sensitivity/(1 - specificity), vs the -ve LR formula, (1 - sensitivity)/specificity. Here's the rule: If the patient has a +ve test result, use the +ve LR formula. If the patient has a -ve test result, use the -ve LR formula. In this Q, we need the +ve LR: 0.9/(1 - 0.45) = 0.9/0.55 = 1.64. The closest answer choice is 1.67.
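Both likelihood ratio formulas are easy to get wrong because of the missing parentheses, so here is a short Python sketch with them written out explicitly. The function names are illustrative; the sensitivity and specificity are from the Q15 stem.

```python
def positive_lr(sens: float, spec: float) -> float:
    """+ve LR = sensitivity / (1 - specificity)."""
    return sens / (1 - spec)


def negative_lr(sens: float, spec: float) -> float:
    """-ve LR = (1 - sensitivity) / specificity."""
    return (1 - sens) / spec


# Q15: sensitivity 90%, specificity 45%.
q15_lr_pos = positive_lr(0.90, 0.45)
q15_lr_neg = negative_lr(0.90, 0.45)

print(f"+ve LR = {q15_lr_pos:.2f}")  # 1.64
print(f"-ve LR = {q15_lr_neg:.2f}")  # 0.22
```

The computed +ve LR is about 1.64, so choice E (1.67) is the closest option, in the same spirit as the inexact answer in Q2.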
Sidebar-Likelihood Ratios -When calculated, +ve LRs tell you how much more likely a phenomenon is given a +ve test result. -When calculated, -ve LRs tell you how much less likely a phenomenon is given a -ve test result.
Q16 In a study examining the relationship b/w exposure to ketamine and the subsequent development of neutropenia, medical records of 300 children were reviewed. 100 children who were exposed to ketamine were found to have neutropenia, 50 children who were exposed to ketamine were found to not have neutropenia, 80 children who were not exposed to ketamine were found to not have neutropenia, and 70 children who were not exposed to ketamine were found to have neutropenia. What is the odds ratio for this study? a. 3.29 b. 2.29 c. 5.67 d. 2.23 e. 7.16
Q16 Key -The best answer here is B. -Odds ratios compare the odds of a person with disease being exposed to a risk factor (RF) to the odds of controls being exposed to the same RF. -To calculate OR, take the logical people product (LGP)/weird people product (WPP). -LGP = (exposed and affected) * (unexposed and unaffected). WPP = (exposed and unaffected) * (unexposed and affected). -In this case our OR = (100 * 80)/(50 * 70) = 8000/3500 = 2.29.
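The "logical people product over weird people product" shortcut is just the cross-product of the 2 by 2 table, as this small Python sketch shows. The function and parameter names are illustrative; the counts are from the Q16 stem.

```python
def odds_ratio(exposed_affected: int, exposed_unaffected: int,
               unexposed_affected: int, unexposed_unaffected: int) -> float:
    """Cross-product OR: (a * d) / (b * c) from a 2 by 2 table."""
    logical_product = exposed_affected * unexposed_unaffected
    weird_product = exposed_unaffected * unexposed_affected
    return logical_product / weird_product


# Q16: ketamine exposure vs neutropenia.
q16_or = odds_ratio(exposed_affected=100, exposed_unaffected=50,
                    unexposed_affected=70, unexposed_unaffected=80)
print(f"OR = {q16_or:.2f}")  # 2.29
```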
Q17 The mean blood glucose level of a group of 81 medical students was 170 mg/dL with a SD of 15 mg/dL. Calculate the 95% CI and in words interpret your results.
Q17 Key Mean = 170 mg/dL. Std error of the mean = 15/sq.rt of 81 = 15/9 = 1.67 mg/dL. Z-score for the 95% CI = 2 (1.96 is more accurate but doesn't matter). Therefore, CI = 170 +/- (2*1.67) = 170 +/- 3.34 = 166.66-173.34 mg/dL. You can say with 95% confidence that the real mean blood glucose of the medical student population falls between 166.66 and 173.34 mg/dL. Alternatively, you can say that if the same experiment is repeated on multiple occasions with randomly selected groups of 81 medical students, about 95% of the CIs calculated this way will contain the real mean. HY to know the calculation and the interpretation in words!
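Here is a minimal Python sketch of the Q17 calculation, run with both z = 2 (the slide's shortcut) and z = 1.96. The function name is illustrative; the mean, SD, and sample size are from the stem.

```python
import math


def mean_confidence_interval(mean: float, sd: float, n: int,
                             z: float = 1.96) -> tuple[float, float]:
    """CI for a mean: mean +/- z * (sd / sqrt(n))."""
    standard_error = sd / math.sqrt(n)
    margin = z * standard_error
    return mean - margin, mean + margin


# Q17: mean 170 mg/dL, SD 15 mg/dL, n = 81.
low_2, high_2 = mean_confidence_interval(170, 15, 81, z=2)
low_196, high_196 = mean_confidence_interval(170, 15, 81)

print(f"z = 2:    {low_2:.2f}-{high_2:.2f} mg/dL")      # 166.67-173.33
print(f"z = 1.96: {low_196:.2f}-{high_196:.2f} mg/dL")  # 166.73-173.27
```

The small difference from the slide's 166.66-173.34 comes from rounding the standard error to 1.67 before multiplying.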
Other HY Concepts -For ROC curves, the best test (highest combined sensitivity and specificity) lies at the upper left corner of the graph. -Cohort studies essentially involve looking at 2 groups of people with differential exposures and following them into the future for the development of some outcome. They could be prospective or retrospective. -68%, 95%, and 99.7% of a normal population lie within 1, 2, and 3 SDs of the mean respectively.
Other HY Concepts contd. -To compare means of 2 groups, use the T test. For > 2 groups, use the ANOVA (or F) test. -When you incorrectly reject the null, you are committing a Type 1 error (alpha error). When you incorrectly fail to reject (accept) the null, you're committing a Type 2 error (beta error). Remember that power = 1 - beta. -Tighter CIs tell you that a study is more precise. However, for the same data, a narrower CI comes with a lower confidence level (e.g., a 90% CI is narrower than a 95% CI), so you should be less confident that it contains the true value (less room for error).
Other HY Concepts contd-Increasing power To increase the power of a study: -Recruit more people for a study (more closely approximates the population). -Have a large difference b/w 2 qties you're trying to measure (aka larger effect size). A study comparing people with test scores of 99 and 100 as a means of comparing intelligence has less power than one comparing test scores of 25 and 100. -Have a lot of your data for a measured qty cluster around 1 value (low variability). Increasing the precision of your measurements also increases the power of a study. -Note that the significance cutoff matters too: all else being equal, a study that uses a P value cutoff of 0.05 has more power than one using a cutoff of 0.01 (at the cost of more Type 1 errors).
Other HY Concepts contd. -The fact that something is statistically significant does not mean that it is clinically significant. A BP drug that lowers BP by 1 mm Hg from baseline, even at a p value < 0.01, is a useless drug. -Mean is the average. Median represents the middle # (if you have an odd # set of data) OR the mean of the 2 middle #s (if you have an even # set of data). Mode represents the most frequent qty in the data set. Arrange the data in order before making these determinations. The mean is affected by extreme values.
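A quick Python sketch with the standard `statistics` module illustrates mean, median, and mode, including how one extreme value drags the mean. The data sets below are made up for illustration.

```python
import statistics

# Hypothetical data: odd-sized set with one extreme value (100).
odd_set = [1, 2, 2, 3, 100]
# Hypothetical data: even-sized set, so the median averages the 2 middle #s.
even_set = [1, 2, 3, 4]

odd_mean = statistics.mean(odd_set)       # pulled up by the outlier
odd_median = statistics.median(odd_set)   # middle value after sorting
odd_mode = statistics.mode(odd_set)       # most frequent value
even_median = statistics.median(even_set)

print(f"mean = {odd_mean}, median = {odd_median}, mode = {odd_mode}")
# mean = 21.6, median = 2, mode = 2
print(f"even-set median = {even_median}")  # 2.5
```

The median and mode stay at 2 while the outlier pulls the mean to 21.6, which is why the median is preferred for skewed data.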