Basic medical statistics exercises

Categories: Quarto, R, Academia, Medical Statistics, Exercises

Published: September 10, 2025

Hello everybody and welcome back to my blog. Since I have just come back from my summer break, I have many things to catch up on, which means that today’s post will be a bit shorter than usual. In my last post I introduced the topic of medical statistics and some popular statistical measures in the field, so here I will take advantage of that and discuss some numeric exercises about those measures. You can think of these exercises as a way to test your knowledge and see whether the computation of the measures is clear to you. If you get lost at any point, I refer you back to my previous post for a brief review of the theoretical concepts! Without further ado, let’s dive into these exercises. If you are interested in these concepts, first try to answer the problems without looking at the solutions and then check whether you got them right. This way you will know whether you fully understand the topic!

Exercises

Problem 1

As a hypothetical example, consider a group of \(30,000\) initially disease-free individuals observed for \(10\) years, after which \(50\) were diseased. The information is summarised in Table 1.

Table 1
              Exposed - Yes Exposed - No Total
Disease - Yes            30           20    50
Disease - No           9970        19980 29950
Total                 10000        20000 30000
  1. Calculate the overall risk of disease in this group of subjects

  2. Calculate the risk ratio for exposed vs. unexposed individuals with its \(95\%\) confidence interval

  3. Calculate the odds ratio for exposed vs. unexposed individuals with its \(95\%\) confidence interval

  4. Interpret the results

  5. What additional information do you need in order to calculate the disease rates in each group?

Problem 2

In a group of \(50000\) individuals, \(20\%\) are exposed to a hazardous agent. Data on disease status obtained for exposed and unexposed individuals are shown in Table 2.

Table 2
              Exposed - Yes Exposed - No Total
Disease - Yes          3000         5000  8000
Disease - No           7000        35000 42000
Total                 10000        40000 50000

The risks of disease in the exposed and unexposed groups are \(0.3\) and \(0.125\), respectively. The overall risk of the disease is \(0.16\), the risk ratio is \(2.4\) (\(95\%\) CI: \(2.31\); \(2.50\)), and the odds ratio is \(3\) (\(95\%\) CI: \(2.85\); \(3.16\)).

  1. How good is the agreement between the risk ratio and the odds ratio compared to that in Problem 1?

  2. Calculate the risk difference with \(95\%\) CI

  3. Interpret the results

Problem 3

A group of \(15\) patients suffering from brain cancer was followed for a period of \(500\) days to observe whether they suffered a relapse or not. Three patients did not suffer a relapse during the follow-up time. Data on whether a patient suffered a relapse or not and the follow-up time for each patient are shown in Table 3.

Table 3
Subject Relapse Follow-up time (days)
      1     Yes                   355
      2      No                   500
      3      No                   500
      4     Yes                   368
      5     Yes                   228
      6      No                   500
      7     Yes                   302
      8     Yes                    62
      9     Yes                   271
     10     Yes                    28
     11     Yes                   183
     12     Yes                   314
     13     Yes                    40
     14     Yes                   151
     15     Yes                   105
  1. Calculate the rate of relapse

  2. Interpret the results

Solutions

Problem 1

  1. The overall risk is computed as: \(\text{Risk}=\frac{50}{30000}\approx 0.0017\), i.e. about \(0.17\%\)

  2. The risk ratio is computed as: \(\text{RR}=\frac{30}{10000}/\frac{20}{20000}=3\), while the corresponding \(95\%\) CI is calculated by taking the exponential of the limits \(\log \text{RR}\pm 1.96 \times \text{SE}(\log \text{RR})\), where \(\text{SE}(\log \text{RR})=\sqrt{1/30-1/10000+1/20-1/20000}=0.2884\). Note that, unlike for the odds ratio, the reciprocals of the group totals enter with a minus sign. Generating the limits and taking their exponential, we have: \(95\% \text{ CI}=[1.70;5.28]\)

  3. The odds ratio is computed as: \(\text{OR}=\frac{30}{9970}/\frac{20}{19980}=3.01\), while the corresponding \(95\%\) CI is calculated by taking the exponential of the limits \(\log \text{OR}\pm 1.96 \times \text{SE}(\log \text{OR})\), where \(\text{SE}(\log \text{OR})=\sqrt{1/30+1/20+1/9970+1/19980}=0.2889\). Generating the limits and taking their exponential, we have: \(95\% \text{ CI}=[1.71;5.30]\)

  4. Both the odds and the risk of disease are about three times higher in the exposed group than in the unexposed group. Neither CI includes \(1\), suggesting that the risk is significantly greater in the exposed group (at the \(5\%\) level)

  5. To calculate disease rates we need the individual follow-up times, i.e. the person-time at risk in each group
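For readers who want to check the Problem 1 arithmetic by computer, here is a minimal sketch in plain Python using the counts from Table 1. The helper `ratio_ci` and all variable names are my own illustrative choices, and the intervals are the usual large-sample (Wald) ones built on the log scale with \(z=1.96\).

```python
import math

def ratio_ci(estimate, se_log, z=1.96):
    """Wald CI for a ratio measure: build the limits on the log scale, then exponentiate."""
    log_est = math.log(estimate)
    return math.exp(log_est - z * se_log), math.exp(log_est + z * se_log)

# Counts from Table 1 (hypothetical names):
# a, b = diseased / non-diseased among the exposed
# c, d = diseased / non-diseased among the unexposed
a, b = 30, 9970
c, d = 20, 19980
n1, n0 = a + b, c + d  # group totals: 10000 exposed, 20000 unexposed

overall_risk = (a + c) / (n1 + n0)

# Risk ratio; SE of log RR subtracts the reciprocals of the group totals
rr = (a / n1) / (c / n0)
se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
rr_ci = ratio_ci(rr, se_log_rr)

# Odds ratio; SE of log OR adds the reciprocals of all four cells
odds_ratio = (a / b) / (c / d)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
or_ci = ratio_ci(odds_ratio, se_log_or)

print(f"Overall risk: {overall_risk:.4f}")
print(f"RR: {rr:.2f}, 95% CI: [{rr_ci[0]:.2f}; {rr_ci[1]:.2f}]")
print(f"OR: {odds_ratio:.2f}, 95% CI: [{or_ci[0]:.2f}; {or_ci[1]:.2f}]")
```

Swapping in the counts of another 2x2 table is enough to reuse the same steps, as long as the cell counts are large enough for the normal approximation to be reasonable.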

Problem 2

  1. The agreement between the OR and the RR is better in Problem 1 (\(3.01\) vs. \(3\)) than here (\(3\) vs. \(2.4\)) because the disease is rarer there, i.e. an overall risk of \(0.17\%\) vs. \(16\%\)

  2. The risk difference is computed as: \(\text{RD}=0.3-0.125=0.175\) while the corresponding \(95\%\) CI is calculated as: \(\text{RD}\pm 1.96 \times \text{SE RD}\), where \(\text{SE RD}=\sqrt{0.3\times (1-0.3)/10000 + 0.125\times (1-0.125)/40000}=0.00487\). Thus we have: \(95\% \text{CI}=[0.165;0.185]\)

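  3. The risk difference corresponds to \(17.5\) excess events per \(100\) people in the exposed group compared to \(100\) people in the unexposed group. The CI does not include \(0\), suggesting that the excess risk is statistically significant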
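The Problem 2 answers can be checked in the same spirit. The sketch below, again in plain Python with hypothetical function names of my own, computes the risk difference with its Wald \(95\%\) CI from Table 2 and then compares the RR and the OR for the counts of both tables, to illustrate why the two measures agree only when the disease is rare.

```python
import math

def risk_difference_ci(a, n1, c, n0, z=1.96):
    """Risk difference (exposed minus unexposed), its SE and Wald CI.

    a, c   = diseased among exposed / unexposed
    n1, n0 = total exposed / unexposed
    """
    p1, p0 = a / n1, c / n0
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return rd, se, (rd - z * se, rd + z * se)

def rr_and_or(a, n1, c, n0):
    """Risk ratio and odds ratio from the same 2x2 counts."""
    rr = (a / n1) / (c / n0)
    odds_ratio = (a / (n1 - a)) / (c / (n0 - c))
    return rr, odds_ratio

# Table 2: 3000 of 10000 exposed and 5000 of 40000 unexposed are diseased
rd, se_rd, rd_ci = risk_difference_ci(3000, 10000, 5000, 40000)
print(f"RD: {rd:.3f}, SE: {se_rd:.5f}, 95% CI: [{rd_ci[0]:.3f}; {rd_ci[1]:.3f}]")

# Rare disease (Table 1) vs. common disease (Table 2)
rr1, or1 = rr_and_or(30, 10000, 20, 20000)
rr2, or2 = rr_and_or(3000, 10000, 5000, 40000)
print(f"Table 1: RR = {rr1:.2f}, OR = {or1:.2f}")
print(f"Table 2: RR = {rr2:.2f}, OR = {or2:.2f}")
```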

Problem 3

  1. Given that we have a total of \(12\) subjects who experienced the event (relapse) and a total follow-up time of \(3907\) person-days, we can compute the rate of relapse as: \(\text{Rate}=\frac{12}{3907}=0.0031\) events per person-day

  2. Multiplying the daily rate by \(365\), we expect about \(1.12\) relapses per person-year of follow-up
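As a quick check of the Problem 3 numbers, the following sketch adds up the follow-up times from Table 3 and divides the number of relapses by the total person-time. The variable names are illustrative, and the conversion to a yearly rate assumes \(365\) days per year.

```python
# Data from Table 3, in subject order (1 to 15)
follow_up_days = [355, 500, 500, 368, 228, 500, 302, 62,
                  271, 28, 183, 314, 40, 151, 105]
relapsed = [True, False, False, True, True, False, True, True,
            True, True, True, True, True, True, True]

events = sum(relapsed)             # number of subjects who relapsed
person_days = sum(follow_up_days)  # total follow-up time in days

rate_per_day = events / person_days
rate_per_year = rate_per_day * 365  # assumption: 365 days per year

print(f"Events: {events}, person-days: {person_days}")
print(f"Rate: {rate_per_day:.4f} per person-day")
print(f"Rate: {rate_per_year:.2f} per person-year")
```

Note that subjects who did not relapse still contribute their full \(500\) days to the denominator, which is why the rate differs from the simple proportion of patients who relapsed.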

So, what do you think? Were you able to answer the problems without looking at the solutions? If yes, then you should have a really good understanding of the topic!

Before concluding the post (and going back to preparing my educational material), I wanted to mention that I will attend Bayes Conference 2025, which this year will be held in Leiden (NL). To be honest, I think I have only attended this conference once, as a PhD student, so I am not really sure what to expect. I know that the conference is a bit industry-oriented, with many representatives of pharmaceutical or consultancy companies in the field. However, I also know of a few academics attending it and I look forward to giving a presentation about my most recent work (hopefully published soon!). The conference will be held next month, from October 22 to 24, is preceded by a workshop on Bayesian statistics in prediction modelling, and is hosted in a “nearby” Dutch city. This makes it a bit easier for me to attend, since it takes place in the middle of the teaching period: I will only need to take a couple of days of leave and will not need to ask any colleagues to replace me in my teaching. I am looking forward to attending the conference and perhaps meeting a couple of people I still remember, which is always the best part of any conference.

Well, that’s all from me for today and, as it is common to say, I will see you next time!