---
title: "Home Exercises 2"
author: "Your Name"
date: "29.9.2025"
output:
  pdf_document: default
  html_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

Write your name at the beginning of the file as "author:".

1. Return your answers to Moodle by **9.00 am, Mon 29.9.** (to section "BEFORE").
2. Watch the exercise session video available in Moodle by **10.00 am, Mon 29.9.**
3. If you notice during the exercise session that your answers need correction, 
return a corrected version to Moodle (to section "AFTER") by **9.00 am, Mon 6.10.**


#### Problem 1.
Suppose that you treat $n=97$ patients with a new treatment
and 70 patients benefit. 
You know that the old treatment helped about 70% of the patients.
Your null hypothesis is that the new treatment also has a success probability of 70%.

(a) Plot the null distribution Bin(97, 0.70) using
a continuous line that connects the probability values over the range of 0 to 97
successes. Mark the observed value $x=70$ with a red line.
(See Example 2.3 in the lecture notes.)
Simply by looking at the plot, what do you conclude about how consistent the observation is with the null hypothesis?
Would you expect a small *P*-value for these data?
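
As a minimal sketch of how such a plot can be drawn, the following uses a toy null distribution Bin(20, 0.5) and a made-up observed value of 14. None of these numbers come from the problem; the names `n_toy`, `p_toy` and `x_toy` are illustrative assumptions.

```r
# Toy example (not the numbers of Problem 1): null distribution Bin(20, 0.5)
n_toy <- 20   # hypothetical number of trials
p_toy <- 0.5  # hypothetical success probability under the null
x_toy <- 14   # hypothetical observed number of successes

successes <- 0:n_toy
probs <- dbinom(successes, size = n_toy, prob = p_toy)  # P(X = k) for k = 0..n_toy

# Continuous line connecting the probability values
plot(successes, probs, type = "l",
     xlab = "number of successes", ylab = "probability",
     main = "Null distribution Bin(20, 0.5)")
# Mark the observed value with a red vertical line
abline(v = x_toy, col = "red")
```

The further the red line lies from the bulk of the distribution, the less consistent the observation looks with the null hypothesis.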


(b) Compute the right-hand tail probability of observing at least 70 
successes under the null hypothesis.
How could you use this value to approximate the **two-sided** *P*-value?
(See Example 2.3 in the lecture notes.) 


(c) Compute the exact two-sided 
*P*-value under the null hypothesis for your observation using `binom.test()`.
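
One detail that is easy to get wrong is that `pbinom(q, ...)` returns $P(X \le q)$, so a probability of "at least $x$" needs $x - 1$ as the argument. The sketch below illustrates this, together with `binom.test()`, on a toy example of 8 successes in 10 trials under Bin(10, 0.5). These numbers and the variable names are assumptions chosen for illustration, not the values of this problem.

```r
# Toy example (not the numbers of Problem 1): 8 successes out of 10 under Bin(10, 0.5)
n_toy <- 10
p_toy <- 0.5
x_toy <- 8

# pbinom(q, ...) returns P(X <= q), so P(X >= x) = 1 - P(X <= x - 1)
right_tail <- 1 - pbinom(x_toy - 1, size = n_toy, prob = p_toy)

# Same probability obtained by summing the point probabilities directly
right_tail_check <- sum(dbinom(x_toy:n_toy, size = n_toy, prob = p_toy))

# Exact two-sided binomial test; the P-value is stored in $p.value
test_toy <- binom.test(x_toy, n_toy, p = p_toy)
two_sided_p <- test_toy$p.value

right_tail
two_sided_p
```

Because Bin(10, 0.5) is symmetric, doubling the right tail reproduces the exact two-sided value in this toy case; for a non-symmetric null distribution the two can differ.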


(d) Do you think that the new treatment may be more effective than the old one?


#### Problem 2.

Let's move back in time to the beginning of the COVID-19 pandemic.
In January 2020, The Lancet published the following article:

Huang et al. (2020) Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395: 497-506.
<https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)30183-5/fulltext>

It described 41 confirmed COVID-19 cases from Wuhan, China.
This small study provides
good examples for practising Fisher's exact test, since the
counts are too small for many other tests.
In the following questions, we compare properties of 
patients admitted to the intensive care unit (ICU) with those of patients outside the ICU. 


(a) 
Among ICU patients, 11 were men and 2 women, while among non-ICU patients, 
19 were men and 9 were women. 
Use Fisher's test to evaluate whether there is a statistical 
association between ICU and sex. What is the *P*-value?


(b) Among the 13 patients in ICU, 11 had acute respiratory distress syndrome, while
among the 28 non-ICU patients, 2 had it. Is there an association
between ICU and respiratory syndrome at significance level 0.001?
(NOTE: Be careful to put the correct numbers into the data matrix. They are not all
directly given above.)
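
The following sketch shows one way to build a 2x2 data matrix when only the group sizes and the counts with a feature are given, so that the counts without the feature have to be derived by subtraction. The counts and labels are hypothetical and are not taken from Huang et al.

```r
# Hypothetical counts (not from the article): two groups and a yes/no feature
#   group A: 10 individuals, 7 with the feature -> 10 - 7 = 3 without
#   group B: 12 individuals, 3 with the feature -> 12 - 3 = 9 without
counts <- matrix(c(7, 3,
                   3, 9),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("A", "B"),
                                 feature = c("yes", "no")))

# Sanity check: row sums should reproduce the group sizes (10 and 12)
rowSums(counts)

# Fisher's exact test on the 2x2 table; the two-sided P-value is in $p.value
fisher_toy <- fisher.test(counts)
fisher_toy$p.value
```

Checking the row sums against the stated group sizes is a quick way to catch a matrix that was filled with the wrong numbers.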


(c) Among the 13 patients in ICU, 5 died, while 
among the 28 non-ICU patients, 1 died. Is there an association
between ICU and death at significance level 0.01?


#### Problem 3. 

In Finland, on average, 8.1% of babies are born in April.
Suppose that we have checked the month of birth for $N = 5000$ MS-disease patients
and observed that $A = 485$ of them were born in April.
Our interest is whether being born in April is a risk factor for MS-disease.

(a) Is the proportion of MS-patients who were born in April larger or smaller than
the population frequency of being born in April?


(b) Suppose as a null hypothesis that MS-disease patients are just 
as likely to have been born in April as the general population. 
Then the value $A$ that we have observed would come from the
distribution Bin(5000, 0.081) (5000 trials, with success probability 0.081 for each trial).
Generate 10000 experiments, each of which draws one value 
from the binomial distribution with `size=5000` and `prob=0.081`. Plot a histogram of the results.
(See the use of `rbinom()` in Lecture 1.)
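
As a reminder of how the arguments of `rbinom()` map onto such a simulation, here is a minimal sketch with toy values (1000 experiments from Bin(100, 0.3)). The toy values and the seed are arbitrary assumptions and are not the numbers of this problem.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Toy example (not the numbers of Problem 3): 1000 experiments from Bin(100, 0.3)
# In rbinom(), n is the number of experiments (values returned),
# size is the number of trials per experiment, and prob is the success probability.
sims <- rbinom(n = 1000, size = 100, prob = 0.3)

# Histogram of the simulated numbers of successes
hist(sims, breaks = 30, xlab = "number of successes",
     main = "1000 draws from Bin(100, 0.3)")

# The sample mean should be close to size * prob = 30
mean(sims)
```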


(c) Look visually where the observed value $A=485$ would lie in the histogram of part (b).
Does it seem a plausible value according to Bin(5000, 0.081) distribution?
Or does it seem to be smaller or larger than one might expect if Bin(5000, 0.081)
held true for MS-patients?


(d) Compute the probability that a value from Bin(5000, 0.081) is 
at least 485 using `pbinom()`.


(e) Do a binomial test to compute a (two-sided) 
*P*-value for the observation of 485 successes out of 5000 trials
with a success probability of 0.081.


#### Problem 4.
Let's try to find an intuitive way to quantify how "surprising" 
any given *P*-value is by utilizing coin flipping experiments.

Let's think about an experiment of flipping $n$ coins. The null hypothesis is
that the coins are fair (the probability of heads is 50%).
Suppose that we observed all $n$ coins landing heads up.
If there is only one coin, then observing "heads" (rather than "tails") is not surprising at all 
(probability 1/2 = 0.5).
If there are two coins and both land heads up, 
that's not very surprising either (probability 1/4 = 0.25). 
But if ten coins all land heads up, most of us would start
to doubt the null hypothesis of fair coins, since the probability
of this event is only 1/1024 or 0.00098. 
Turning this around, we can interpret 
*P*-values of 0.5, 0.25 or 0.00098 as corresponding to the same amount of surprise
under the null hypothesis as observing 1, 2 or 10 coins all landing heads up when we
assumed that they were all fair coins.

The probability of observing only heads from $n$ fair coins is $1/2^n$. If we
want to match our observed *P*-value $P$ to the number of coins $n$ that would
correspond to the same amount of surprise when they all have landed heads up,
we get the equation $P = 1/2^n$ or, equivalently, $n = -\log_2(P)$.
Thus, for any *P*-value `P`, the corresponding number of coins landing heads up 
can be computed in R by the command `-log2(P)`.
This quantity is also called an *S*-value or surprise value.
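
The conversion can be checked against the three coin examples above. The helper name `s_value` is a hypothetical name introduced here for illustration; it simply wraps `-log2(P)`.

```r
# Hypothetical helper: S-value (surprise value) of a P-value,
# measured as the number of fair coins all landing heads up
s_value <- function(p) {
  stopifnot(all(p > 0 & p <= 1))  # a P-value must lie in (0, 1]
  -log2(p)
}

# The three coin examples from the text: 1, 2 and 10 coins all landing heads up
p_coins <- c(1/2, 1/4, 1/1024)
s_value(p_coins)  # gives 1, 2 and 10
```

In general the result is not a whole number, and it can be read as a fractional number of coins.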


(a) The most common significance threshold used in medical literature is 0.05.
What is the *S*-value corresponding to the *P*-value 0.05?
Based on the *S*-value, do you feel that a *P*-value of 0.05 is very surprising 
under the null hypothesis?


(b) In genome-wide scans of genetic risk variants, 
a significance threshold of 5e-8 is used. What is the corresponding
*S*-value, and how surprising does such an observation feel to you?


(c) In Problem 2, we encountered a *P*-value of 1.694e-06. What is the
corresponding *S*-value, and how surprising does this *P*-value feel to you?