Sample Size Calculations - Examples
Hello folks and welcome back to my regular blog update. Today I would like to finish off the series of introductory posts on sample size calculation approaches by showing some simple examples and illustrating how to use the formulae described in previous posts to address these problems. I apologise for the lack of theoretical discussion in today’s post but, unfortunately, it is a very busy period, which prevents me from opening up the discussion of a new topic. I promise this will be the focus of my next post. In the meantime, I hope you can still find these exercises useful! Enjoy!
Examples
Example 1
Let us consider a new trial which is planned to investigate the performance of a new weight loss programme, where participants are randomised either to the new programme or to a standard of care group. The primary outcome of the trial is weight (Kg) at six months follow-up and will be analysed using a t-test at \(5\%\) significance level. From pilot work, the researchers estimate the standard deviation of weight to be \(6\) Kg and hope to show that the programme results in an average weight loss of \(2\) Kg.
- How many participants must be recruited to ensure a power of \(80\%\) ?
Given that the primary outcome (weight in Kg) is continuous and average differences between the two intervention groups in this outcome at one time point will be analysed using a t-test, we can use the standard sample size calculation formula to estimate the required sample size in each group \(n\):
\[ n = \frac{2\times \sigma^2(z_{1-\alpha}+z_{1-\beta})^2}{(\mu_2-\mu_1)^2} = \frac{2 \times 6^2 \times (1.64 + 0.84)^2}{2^2} = 110.7072 \approx 111, \tag{1}\]
which results in a total sample size of \(N=111\times 2 = 222\).
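For those who prefer to check the arithmetic with code, here is a minimal Python sketch of Equation 1. The function name `n_per_group_means` is just my own label, and I pass in the rounded quantiles \(1.64\) and \(0.84\) used above, so plugging in exact normal quantiles would give a slightly different number.

```python
def n_per_group_means(sigma, delta, z_alpha, z_beta):
    """Per-group sample size for comparing two means (Equation 1)."""
    return 2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2


# Example 1 inputs: sd = 6 Kg, target difference = 2 Kg,
# rounded quantiles 1.64 (5% significance) and 0.84 (80% power).
n = n_per_group_means(sigma=6, delta=2, z_alpha=1.64, z_beta=0.84)
n_rounded = round(n)       # rounded to the nearest integer, as in the text
n_total = 2 * n_rounded    # two equally sized groups

print(n, n_rounded, n_total)
```

Keeping the quantiles as arguments makes it easy to rerun the calculation for a different significance level or power.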
- It is anticipated that \(25\%\) of subjects will drop out of the study, and that \(20\%\) of the remaining subjects (on the active arm) will not follow the weight loss programme. Adjust the required sample size obtained before.
To adjust the sample size calculation for the expected dropout and non-adherence proportions, we need to multiply the final estimate by the appropriate inflation factors. For dropout, the inflation factor is \(IF_d=\frac{1}{1-d}=\frac{1}{1-0.25}\), which leads to an adjusted total sample size of \(N^{\star}=N \times \frac{1}{1-d}=221.4144 \times \frac{1}{1-0.25}=295.2192 \approx 295\). For non-adherence, the inflation factor is \(IF_{nc}=1/(1-NC_1-NC_2)^2=1/(1-0.2-0)^2\), which leads to a final adjusted total sample size of \(N^{\star\star}=N^{\star} \times 1/(1-NC_1-NC_2)^2=295.2192 \times 1/(1-0.2)^2=461.28 \approx 461\).
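The two inflation steps can be sketched in Python as follows. The helper names are hypothetical, and I start from the unrounded total \(221.4144\), as done in the calculation above.

```python
def inflate_for_dropout(n, dropout):
    """Inflate a sample size by 1 / (1 - d) for a dropout proportion d."""
    return n / (1 - dropout)


def inflate_for_nonadherence(n, nc_active, nc_control=0.0):
    """Inflate a sample size by 1 / (1 - NC1 - NC2)^2 for non-adherence."""
    return n / (1 - nc_active - nc_control) ** 2


n_total = 221.4144                                   # unrounded total from Example 1
n_dropout = inflate_for_dropout(n_total, 0.25)       # 25% dropout
n_final = inflate_for_nonadherence(n_dropout, 0.20)  # 20% non-adherence, active arm only

print(n_dropout, n_final)
```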
- Suppose that the new programme can only recruit up to \(200\) participants. What power will the study have for this number of subjects (ignoring dropout and non-adherence)?
If the total number of subjects is fixed at \(N=200\) (so that each group has \(n=100\)), then we can simply plug this value into Equation 1 and solve for \(z_{1-\beta}\), and hence for \(1-\beta\): \(z_{1-\beta}=\sqrt{\frac{n(\mu_2-\mu_1)^2}{2\sigma^2}}-z_{1-\alpha}=\sqrt{\frac{100\times 2^2}{2\times 6^2}}-1.64 \approx 0.72\). In our case, the calculation should give \(1-\beta \approx 0.76\).
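Solving Equation 1 for the power can also be done numerically. Below is a small sketch, assuming the same rounded quantile \(1.64\) as before and using the standard normal distribution from Python's `statistics` module; the function name is my own.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(n, sigma, delta, z_alpha):
    """Power of the two-group comparison for a fixed per-group size n.

    Rearranges Equation 1 to obtain z_{1-beta}, then maps it to a
    probability with the standard normal CDF.
    """
    z_beta = sqrt(n * delta**2 / (2 * sigma**2)) - z_alpha
    return NormalDist().cdf(z_beta)


# N = 200 in total, so n = 100 per group.
power = power_two_means(n=100, sigma=6, delta=2, z_alpha=1.64)
print(round(power, 2))
```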
- To reduce the costs of the study the researchers plan to randomise using a \(1:3\) ratio (intervention:control). Adjust the required sample size accordingly.
To adjust the sample size using the proposed ratio \(r=3\), we first need to derive the adjusted sample size in the intervention group as \(n_1=\frac{(r+1)n}{2r}=\frac{(3+1)\times 110.7072}{2\times 3} \approx 73.8048\), and then obtain the sample size in the control group as \(n_2=r \times n_1 = 3 \times 73.8048 \approx 221.4144\).
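A quick sketch of the unequal allocation adjustment, assuming we start from the unrounded per-group size \(110.7072\) obtained under equal allocation (the function name is purely illustrative):

```python
def unequal_allocation(n_equal, r):
    """Group sizes under 1:r allocation, from the equal-allocation size n.

    n1 = (r + 1) * n / (2 * r) is the smaller group, n2 = r * n1 the larger.
    """
    n1 = (r + 1) * n_equal / (2 * r)
    n2 = r * n1
    return n1, n2


n1, n2 = unequal_allocation(n_equal=110.7072, r=3)
print(n1, n2, n1 + n2)
```

Note that the total \(n_1+n_2\) is larger than under equal allocation, which is the price paid for the unbalanced design.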
Example 2
A multi-national clinical trial investigates the value of gradually increasing dose schedule of a beta blocker in the treatment of severe heart failure. The trial will be randomised, double blind and placebo controlled, where each patient is followed for \(2\) years and the main treatment comparison is for all cause mortality. Previous studies suggest a \(2\) year mortality of around \(36\%\), and the researchers believe that a one third reduction in mortality would be important to detect at a significance level of \(5\%\), power of \(90\%\) and an equal number of patients in each group. The trial is to be analysed via a comparison of proportions.
- How many patients should be recruited?
Given that the outcome of interest (all cause mortality) is binary and proportion differences between the two intervention groups in this outcome at one time point will be analysed, we can use the dedicated sample size calculation formula to estimate the required sample size in each group \(n\):
\[ n = \frac{(z_{1-\alpha}\sqrt{2\bar{\pi}(1-\bar{\pi})}+z_{1-\beta}\sqrt{\pi_1(1-\pi_1)+\pi_2(1-\pi_2)})^2}{(\pi_2-\pi_1)^2} = \frac{(1.64\sqrt{2\times 0.42(1-0.42)}+1.28\sqrt{0.36(1-0.36)+0.48(1-0.48)})^2}{(0.48-0.36)^2} \approx 287, \tag{2}\]
which results in a total sample size of \(N=287\times 2 = 574\). Alternatively, the following approximation formula may be used instead:
\[ n = \frac{2\times(z_{1-\alpha}+z_{1-\beta})^2}{\Delta^2} = \frac{2\times(1.64+1.28)^2}{0.2431323^2} \approx 288, \] where \(\Delta = \frac{(\pi_2-\pi_1)}{\sqrt{(\bar{\pi}(1-\bar{\pi}))}}\).
- How much does the sample size change if the power is reduced to \(80\%\)?
We just need to update Equation 2 using \(1-\beta=0.8\) (and so \(z_{1-\beta}=0.84\)), which should give \(n^{\star}=207\) and a total sample size of \(N=n^{\star} \times 2 = 414\).
- How much does the sample size change if the expected reduction in mortality is \(6\%\) (i.e. \(36\%\) vs \(30\%\))?
Again, we update Equation 2 using \(\pi_1=0.30\) and \(\pi_2=0.36\) (and so \(\bar{\pi}=0.33\)), which should give \(n^{\star\star}=1045\) and a total sample size of \(N=n^{\star\star} \times 2 = 2090\).
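The three scenarios of this example can be reproduced with a short Python sketch of Equation 2 and of its approximation. The function names are my own, and the inputs are the proportions and rounded quantiles plugged into the calculations above.

```python
from math import sqrt


def n_per_group_props(p1, p2, z_alpha, z_beta):
    """Per-group sample size for comparing two proportions (Equation 2)."""
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return numerator / (p2 - p1) ** 2


def n_per_group_props_approx(p1, p2, z_alpha, z_beta):
    """Approximation based on the standardised difference Delta."""
    p_bar = (p1 + p2) / 2
    delta = (p2 - p1) / sqrt(p_bar * (1 - p_bar))
    return 2 * (z_alpha + z_beta) ** 2 / delta**2


n_90 = n_per_group_props(0.36, 0.48, z_alpha=1.64, z_beta=1.28)             # 90% power
n_90_approx = n_per_group_props_approx(0.36, 0.48, z_alpha=1.64, z_beta=1.28)
n_80 = n_per_group_props(0.36, 0.48, z_alpha=1.64, z_beta=0.84)             # 80% power
n_small = n_per_group_props(0.30, 0.36, z_alpha=1.64, z_beta=1.28)          # 6% difference

print(round(n_90), round(n_90_approx), round(n_80), round(n_small))
```

The last scenario shows how quickly the required sample size grows when the difference to detect shrinks, since the difference enters the denominator squared.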
Example 3
A trial involving patients with bedsores is planned under the hypothesis that ultrasound treatment will halve the healing time of these sores with respect to standard treatment. The investigators would like to test this difference using a \(5\%\) significance level and a \(90\%\) power.
- How many participants should be recruited assuming that all people will be followed until they are healed?
Given that the outcome of interest (time to heal) is a time-to-event (survival) outcome and differences between the two intervention groups in this outcome will be analysed, we can use the dedicated sample size calculation formula to estimate the required sample size in each group, expressed as the number of events \(n_e\) (i.e. failures):
\[ n_e=(z_{1-\alpha}+z_{1-\beta})^2\times \Bigg( \frac{\text{HR}_{\text{exp}}+1}{\text{HR}_{\text{exp}}-1} \Bigg)^2 = (1.64 + 1.28)^2 \times \Bigg(\frac{2+1}{2-1}\Bigg)^2 \approx 77 , \tag{3}\]
which corresponds to the sample size in each group assuming no censoring with an anticipated hazard ratio of \(\text{HR}_{\text{exp}}=2\).
- If participants can only be followed up to \(21\) days, with an expected healing rate of \(70\%\) within this time, then how does this affect the required sample size?
If censoring occurs, and the survival proportions in the two groups can be estimated (here \(\pi_1=0.15\) and \(\pi_2=0.30\)), then we can use these to update the estimate for \(\text{HR}_{\text{exp}}=\frac{\log \pi_1}{\log \pi_2}=\frac{-1.89712}{-1.2039728} \approx 1.58\) (the unrounded value \(1.5757\) is carried through the calculations that follow), which can then be used in Equation 3 to derive the expected number of events in each group:
\[ n_e= (1.64 + 1.28)^2 \times \Bigg(\frac{1.58+1}{1.58-1}\Bigg)^2 \approx 171, \]
as well as an estimate of the sample size in each group (assuming censoring) via:
\[ n=\frac{(z_{1-\alpha}+z_{1-\beta})^2}{(2-\pi_1-\pi_2)}\times \Bigg( \frac{\text{HR}_{\text{exp}}+1}{\text{HR}_{\text{exp}}-1} \Bigg)^2 = \frac{(1.64 + 1.28)^2}{(2-0.15 - 0.3)} \times \Bigg(\frac{1.58+1}{1.58-1}\Bigg)^2 \approx 110, \]
for a total sample size of \(N=n\times 2=220\).
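Here is a minimal Python sketch of the calculations in this example, with and without censoring. The function names are hypothetical, the quantiles are the rounded values \(1.64\) and \(1.28\) used above, and the hazard ratio under censoring is kept unrounded.

```python
from math import log


def events_required(hr, z_alpha, z_beta):
    """Number of events from Equation 3 for an anticipated hazard ratio."""
    return (z_alpha + z_beta) ** 2 * ((hr + 1) / (hr - 1)) ** 2


def n_per_group_censored(hr, pi1, pi2, z_alpha, z_beta):
    """Per-group sample size when a proportion of subjects is censored."""
    return events_required(hr, z_alpha, z_beta) / (2 - pi1 - pi2)


# No censoring: healing time is halved, so the hazard ratio is 2.
events_no_censoring = events_required(2, z_alpha=1.64, z_beta=1.28)

# Censoring at 21 days: hazard ratio derived from the survival proportions.
pi1, pi2 = 0.15, 0.30
hr = log(pi1) / log(pi2)
events_censoring = events_required(hr, z_alpha=1.64, z_beta=1.28)
n_censoring = n_per_group_censored(hr, pi1, pi2, z_alpha=1.64, z_beta=1.28)

print(round(events_no_censoring), round(hr, 2),
      round(events_censoring), round(n_censoring))
```

Because the hazard ratio appears in both the numerator and denominator of the squared term, rounding it to \(1.58\) before plugging it in changes the result by a couple of events.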
Example 4
A study is carried out to estimate the mean latency of the auditory P300 in schizophrenic patients. A \(95\%\) confidence interval of width less than \(20\) ms is required and a previous study suggests a standard deviation of the measurements of \(27\) ms.
- How many patients are required?
Given that the primary outcome (auditory P300 in ms) is continuous and a desired value for the width of a \(95\%\) confidence interval around the population mean is known, we can use the precision-based sample size calculation formula to estimate the required total sample size:
\[ N = 4 \times z^2_{1-\alpha/2} \times \frac{\sigma^2_{\text{exp}}}{w^2_{\text{exp}}} = 4 \times 1.96^2 \times \frac{27^2}{20^2} \approx 28. \]
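The precision-based formula is a one-liner in Python. This is only a sketch, with a function name of my choosing and the rounded quantile \(1.96\) used above.

```python
def n_for_ci_width(sigma, width, z):
    """Total sample size so that a confidence interval for the mean has the given width."""
    return 4 * z**2 * sigma**2 / width**2


# Example 4 inputs: sd = 27 ms, desired width = 20 ms, z = 1.96 for a 95% interval.
n_ci = n_for_ci_width(sigma=27, width=20, z=1.96)
print(n_ci, round(n_ci))
```

Since the width is squared in the denominator, halving the desired width quadruples the required sample size.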
Example 5
Researchers plan to conduct a case-control study to investigate the effectiveness of an existing flu vaccine in preventing swine flu, where cases are defined as subjects admitted to hospital with a diagnosis of swine flu and controls as subjects admitted with a respiratory disease other than influenza. They estimate that \(30\%\) of the controls would have had the flu vaccine prior to admission, with the aim of the study being to detect an odds ratio of \(0.5\) with \(80\%\) power assuming a \(5\%\) significance level. It is hypothesised that the odds of a person contracting swine flu are \(50\%\) lower if they have had a flu vaccine, and that the odds of a case having had the vaccine should be \(50\%\) lower than those of a control.
- What percentage of cases is estimated to have had the vaccine prior to admission if the odds ratio is \(0.5\)?
Given that we are dealing with a case-control study, and that information on the proportion of controls who had the vaccine prior to admission and assumed odds ratio is available (\(\pi_0=0.3\) and \(\text{OR}=0.5\)), then we can estimate the corresponding proportion of cases who had the vaccine prior to admission as:
\[ \pi_1=\frac{\pi_0 \times OR}{1+\pi_0\times(OR-1)} = \frac{0.3 \times 0.5}{1+0.3\times(0.5-1)} \approx 0.18. \]
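The conversion from an odds ratio to the exposure proportion among cases can be checked with a few lines of Python. The function name is illustrative only, and the last step simply verifies that the implied odds ratio is recovered.

```python
def exposure_in_cases(p0, odds_ratio):
    """Proportion of exposed cases implied by the control proportion and the odds ratio."""
    return p0 * odds_ratio / (1 + p0 * (odds_ratio - 1))


p0 = 0.3                           # proportion of vaccinated controls
p1 = exposure_in_cases(p0, 0.5)    # proportion of vaccinated cases

# Sanity check: the odds ratio computed from p1 and p0 should be 0.5 again.
or_check = (p1 / (1 - p1)) / (p0 / (1 - p0))
print(round(p1, 7), or_check)
```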
- What sample size is required for the study?
Using the estimate of \(\pi_1\) obtained before, we can now use the standard sample size calculation formula for two proportions to derive an estimate of the sample size in each group:
\[ n = \frac{(z_{1-\alpha/2}\times \sqrt{2\times\bar{\pi}\times(1-\bar{\pi})} + z_{1-\beta} \times \sqrt{\pi_1\times (1-\pi_1)+\pi_0\times(1-\pi_0)})^2}{(\pi_1-\pi_0)^2} = \frac{(1.96\times \sqrt{2\times0.2382353\times(1-0.2382353)} + 0.84 \times \sqrt{0.1764706\times (1-0.1764706)+0.3\times(1-0.3)})^2}{(0.1764706-0.3)^2} \approx 185, \]
where \(z_{1-\beta}=0.84\) corresponds to the required \(80\%\) power, which leads to a total sample size of \(N=n\times 2= 370\).
Conclusion
This concludes the topic of sample size calculation, at least for these few introductory examples and formulae. I hope to resume this topic later on (perhaps with calculations specifically related to cost-effectiveness analysis) but, for now, let me stop here. I hope some of the information I shared was useful and perhaps interesting.
See you at the next update!