# Statistical Methods in Medical Research
# Laaketieteellisen tutkimuksen tilastolliset menetelmat
# At University of Helsinki
# 16.8.2023
# Matti Pirinen
###
### 5. Learn R: For-loop, packages, random number generator
###
#******************************
#
# 5.1 FOR-loop
#
#******************************
# For-loop runs through a set of values and repeats a given task for each value, one by one.
# For example, if we want that a variable x gets values 1,2,3,...,10 sequentially,
# we can make that happen in R by stating 'for(x in 1:10)' and then add a code block
# that will be run for each value of 'x' within symbols '{ }'
for(x in 1:10){ #x is a variable that runs through the sequence 1,...,10, one value at a time
txt = paste("index =",x, "and its square is",x^2)
print(txt)
}
# If we use paste in for-loop, we need to explicitly print() within the for-loop
# since otherwise the output of paste( ) doesn't appear on screen.
# The values in the vector to run through can be also something else than integers
vals = c(1, 10.2, 100.99, 1000.12) # make a vector of values
for(ii in vals){ # index can have any name. ii, jj, kk are common names, above it was 'x'.
if(ii <= 100) {print(paste("Index is still small:",ii))}
if(ii > 100) {print(paste("index =",ii, "and its square is",ii^2))}
}
# Note that after the for-loop, the index variable keeps its last value.
ii
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
# Test Yourself 5.1. (Answers are at the end of this file.)
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
#(1) Write a for-loop that writes 1/x on screen where
# x goes through values from 2 to 5.
#
#(2) Generate a vector 'x' of length 10 that is full of value 'NA' initially.
# Write a for-loop that fills 'x' by setting x[i] = i+1 for all i = 1,...,10.
# Show the value of x after the loop has finished.
#*****************************
#
# 5.2 Install Packages in R
#
#*****************************
# To install package "pwr" for power calculations in Lecture 5, type
install.packages("pwr")
# and choose any R-site from the menu to download the package.
# It should install it automatically but it may ask you the directory where to put it.
# You can accept the default location.
# Sometimes the install.package() command doesn't go through because there is some other package
# that is needed first and R doesn't install it by default.
# In that case, you can add 'dependencies = TRUE' to install all the dependencies
install.packages("pwr", dependencies = TRUE) #likely not needed for pwr, but here used as an example
#*******************************
#
# 5.3 Random number generation
#
#*******************************
# We have used functions rbinom() and rnorm() to generate samples from binomial and Normal distributions.
# But how is "randomness" generated with a computer?
# Our "random" samples are actually not truly random but completely deterministically
# generated values. They just behave so much as true random samples from these distributions
# that we can't tell a difference between those values and true randomness.
# The random number generation happens so that we fix a seed number for the random number generator (RNG)
# by 'set.seed()' command
set.seed(85) # this can be any value, here it is 85
# When we now ask for a new random value from, say Uniform distribution, R applies certain deterministic
# formula to the current state of the random number generator (here 85) and the generator gets a new state
runif(1) # this produces a "random" value from uniform and also changes the state of RNG
# next application of runif() will then produce a different value and again change the state of RNG and so on
runif(2) #Let's ask for 2 more uniform values
#If we ask for 3 more values, they look again "random" compared to previous ones
runif(3)
# But if we would now change the seed back to 85, we would get exactly the same 6 "random"
# values we have seen above and in the same order.
set.seed(85)
runif(6)
# Thus, if you set the seed when you start working on some problem
# you can reproduce your results exactly, even when they involve "random" numbers.
# This is a useful property when trying to isolate the problems in the code,
# or when needing to produce reproducible results with methods that involve random sampling.
# In the lecture notes of this course, there is a set.seed() command in the beginning of each lecture
# so that the results will look the same every time the document is compiled.
# Otherwise, e.g., the means and sds of the sampled data sets varied between runs
# and it would become more complicated to write about the results in the lecture notes.
# If you open up R but don't set the seed yourself, then
# a new one is created from the current time and the process ID of your R session.
# Hence, by default, different sessions will give different simulation results
# and the variation between the results from different R sessions
# mimics true randomness (as far as we can tell).
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
# Test Yourself 5.3. (Answers are at the end of this file.)
#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#
#(1) Set the seed of the random number generator to value '14'.
#(2) Generate 50 random values from the standard Normal distribution N(0,1)
# and compute the standard deviation of this sample.
#(3) Repeat part (2) without setting the seed to a new value.
#(4) Set seed again to value '14' and repeat part (2).
# Compare the three estimates of SD from a sample of n = 50.
#
##
### ANSWERS
##
#
#*#*#*#*#*#*#*#*#*#*#
# Test Yourself D5.1.
#*#*#*#*#*#*#*#*#*#*#
#(1) Write a for-loop that writes 1/x on screen where
# x goes through values from 2 to 5.
for(x in 2:5){
print(1/x) #print is enough here, no need for paste, although it wouldn't harm either
}
#(2) Generate a vector 'x' of length 10 that is full of value 'NA' initially.
# Write a for loop that fills 'x' by setting x[i] = i+1 for all i = 1,...,10.
x = rep(NA, 10)
for(i in 1:10){
x[i] = i+1
}
x #show x after the for loop
#*#*#*#*#*#*#*#*#*#*#
# Test Yourself D5.3.
#*#*#*#*#*#*#*#*#*#*#
#(1) Set the seed of the random number generator to value '14'.
set.seed(14)
#(2) Generate 50 random values from the standard Normal distribution N(0,1)
# and compute the standard deviation of this sample.
sd(rnorm(50,0,1))
#(3) Repeat part (2) without setting the seed to a new value.
sd(rnorm(50,0,1))
#(4) Set seed again to value '14' and repeat part (2).
# Compare the three estimates of SD from a sample of n = 50.
set.seed(14)
sd(rnorm(50,0,1))
# We see that the SD estimate from the first data set is the same as from the 3rd data set
# but different from the data set 2. This is because the 1st and the 3rd data sets were
# generated starting from the same state of the random number generator and hence
# the same rnorm() command produced exactly the same data sets in both cases.
# The 2nd data set was generated from a different state of the random number generator
# and hence it was a different data set from the other two sets.