Intro to QK2E

Topics in Scientific and Statistical Computing

Robin Elahi

Course overview

1 or 2 presenters per week

Expectations (everyone)

  • Read the chapter

  • Work through the suggested examples in R

Expectations (presenter)

  • lead a discussion of the reading

  • prepare a group / break-out activity

  • be creative and focus on what interests you, in the context of the chapter

  • upload relevant materials to Canvas / Gdrive

QK2E

Sign up to lead a week

Link to Google sign up is on Canvas


Programming challenge (should you choose to accept it):

If you are going to use slides, create them in Quarto (.qmd) or Rmarkdown (.Rmd)

Totally optional. In any case, please share your slides, R code, and other relevant materials with the group in the Gdrive.

Prerequisites

You have some familiarity with R computing and statistics.

Do you know what all this means?

x <- c(2, 4, 3, 6)
y <- c(5, 12, 4, 10, 2)
t.test(x, y)

    Welch Two Sample t-test

data:  x and y
t = -1.3761, df = 5.4988, p-value = 0.2222
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -8.031728  2.331728
sample estimates:
mean of x mean of y 
     3.75      6.60 

Some core statistical concepts

Statistics vs parameters

A statistic is

a numerical description of a sample

A parameter is

a numerical attribute of a population

Often, statistics are used to estimate parameters.
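
To make the distinction concrete, here is a minimal R sketch (the population values are invented for illustration): we set a population mean ourselves, draw one sample, and watch the sample mean estimate it.

```r
# The population parameter (mu) is fixed; in real life we never observe it.
set.seed(42)                            # reproducibility
mu <- 10                                # population mean: a parameter
x  <- rnorm(n = 50, mean = mu, sd = 2)  # one sample from that population
xbar <- mean(x)                         # sample mean: a statistic
xbar                                    # close to mu, but not exactly mu
```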

The two heads of classical statistics

  • estimating parameters, with uncertainty (confidence intervals)

  • evaluating (in-)consistency with a particular situation (\(p\)-values)

  • What do these data tell us about the world?

  • How strongly do we believe it?

Lurking, behind everything:

is uncertainty, thanks to:

  • actual differences of biological interest (process uncertainty)
  • uninteresting differences due to sampling variation and measurement error (observation uncertainty)

How do we understand uncertainty, concretely and quantitatively?

  • with models.
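
One way to see the two kinds of uncertainty is to simulate them separately (hypothetical numbers, loosely in the spirit of the CO2 example later in these notes):

```r
# Process uncertainty: real animal-to-animal differences in blood CO2.
# Observation uncertainty: measurement error layered on top.
set.seed(1)
n <- 23
true_co2 <- rnorm(n, mean = 60, sd = 15)            # real biological variation
measured <- true_co2 + rnorm(n, mean = 0, sd = 3)   # plus measurement error
# The data we record (measured) mix both sources of variation;
# a model is what lets us try to separate them.
head(round(measured, 1))
```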

Break

Stand up! Stretch! Get a drink, use the restroom.

Then, with a partner (or two), go to a board and discuss the following:

  • What is hypothesis testing?

  • What is a p-value?

Data story

Low et al. (2016) examined the effects of two different anesthetics on aspects of mouse physiology. Twelve mice were anesthetized with isoflurane and eleven with alpha chloralose, and blood CO2 levels were recorded after 120 minutes. The null hypothesis (H0) was that there was no difference between the anesthetics in mean blood CO2 level. This is an independent comparison because each mouse received only one of the two anesthetics.

R

library(tidyverse)
library(car)

theme_set(theme_bw(base_size = 16) + 
            theme(panel.grid.minor = element_blank(), 
                  strip.background = element_blank()))

The data

Describe what is happening in these lines of code.

low <- read.csv("data/lowco2.csv")

 

names(low)
[1] "anesth" "co2"   


dim(low)
[1] 23  2


str(low)
'data.frame':   23 obs. of  2 variables:
 $ anesth: chr  "iso" "iso" "iso" "iso" ...
 $ co2   : int  43 35 50 39 56 54 39 51 49 54 ...

Visualize data

low %>% 
  ggplot(aes(anesth, co2)) + 
  geom_point(alpha = 0.5, size = 5) + 
  labs(x = "Anesthetic", y = "CO2") + 
  theme_bw(base_size = 24)

Summarizing data: point estimates and variability

low %>%  
  group_by(anesth) %>% 
  summarise(n = n(), 
            mean = mean(co2),
            median = median(co2),
            sd = sd(co2), 
            variance = var(co2), 
            se = sd / sqrt(n)
            )
# A tibble: 2 × 7
  anesth     n  mean median    sd variance    se
  <chr>  <int> <dbl>  <dbl> <dbl>    <dbl> <dbl>
1 ac        11  70.9   64    20.2     408.  6.09
2 iso       12  50     50.5  11.4     130.  3.29

Confidence intervals

low %>%  
  group_by(anesth) %>% 
  summarise(n = n(), 
            mean = mean(co2),
            sd = sd(co2), 
            se = sd / sqrt(n), 
            CI_upper = mean + se * qt(p = 0.975, df = n-1), 
            CI_lower = mean + se * qt(p = 0.025, df = n-1), 
            CI = se * qt(p = 0.975, df = n-1), 
            upper = mean + CI, 
            lower = mean - CI
            )
# A tibble: 2 × 10
  anesth     n  mean    sd    se CI_upper CI_lower    CI upper lower
  <chr>  <int> <dbl> <dbl> <dbl>    <dbl>    <dbl> <dbl> <dbl> <dbl>
1 ac        11  70.9  20.2  6.09     84.5     57.3 13.6   84.5  57.3
2 iso       12  50    11.4  3.29     57.2     42.8  7.24  57.2  42.8

In a frequentist world, parameters are fixed (but unknowable). Interpret the CI in this context.
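
That interpretation can be checked by simulation (a sketch with assumed population values, not the mouse data): the population mean stays fixed, the interval changes from sample to sample, and roughly 95% of the intervals capture the mean.

```r
set.seed(2024)
mu <- 50; sigma <- 10; n <- 12       # assumed "true" population values
covered <- replicate(10000, {
  x  <- rnorm(n, mu, sigma)
  ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  ci[1] < mu && mu < ci[2]           # did this interval capture mu?
})
mean(covered)                        # coverage: should be near 0.95
```

It is the interval that is random here, not the parameter: a single computed CI either contains mu or it does not.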

Hypothesis testing

A. Construct a null hypothesis (H0)

B. Derive a test statistic from the data

C. Compare the obtained test statistic to the distribution of test statistics expected under H0.

A \(p\)-value is

the probability of seeing a result at least as surprising as what was observed in the data, if the null hypothesis is true.

Usually, this means

  • a result - numerical value of a statistic
  • surprising - big
  • null hypothesis - the model we use to calculate the \(p\)-value

which can all be defined to suit the situation.
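
One way to make this definition concrete is a permutation test, sketched here on the small x/y example from the start of these notes: if H0 says the group labels do not matter, we can shuffle the labels and ask how often the shuffled difference in means is at least as extreme as the observed one.

```r
set.seed(7)
x <- c(2, 4, 3, 6)
y <- c(5, 12, 4, 10, 2)
obs <- mean(x) - mean(y)              # observed statistic
pooled <- c(x, y)
perm <- replicate(10000, {
  s <- sample(pooled)                 # shuffle group labels under H0
  mean(s[1:4]) - mean(s[5:9])
})
p_val <- mean(abs(perm) >= abs(obs))  # two-sided permutation p-value
p_val                                 # roughly 0.3 here
```

Like the Welch test's p = 0.22 on the same data, this gives no strong evidence against H0; the permutation approach just swaps the t distribution for a "null hypothesis" we simulate directly.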

What does a small \(p\)-value mean?

If the null hypothesis were true, then you’d be really unlikely to see something like what you actually did.


So, either the “null hypothesis” is not a good description of reality or something surprising happened.


How useful this is depends on the null hypothesis.

T-test: equal variances

t.test(co2 ~ anesth, var.equal = TRUE, data = low)

    Two Sample t-test

data:  co2 by anesth
t = 3.0927, df = 21, p-value = 0.005515
alternative hypothesis: true difference in means between group ac and group iso is not equal to 0
95 percent confidence interval:
  6.849172 34.969010
sample estimates:
 mean in group ac mean in group iso 
         70.90909          50.00000 


Interpret this result.
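
As a cross-check, the pooled t statistic can be computed by hand from the summary table earlier (using that table's rounded values, so the result matches t.test() only to a couple of decimals):

```r
n1 <- 11; m1 <- 70.909; v1 <- 408    # ac  (rounded, from the summary table)
n2 <- 12; m2 <- 50;     v2 <- 130    # iso
sp2 <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
t_stat <- (m1 - m2) / sqrt(sp2 * (1/n1 + 1/n2))
df <- n1 + n2 - 2
round(t_stat, 3)                     # ≈ 3.09, matching t = 3.0927 above
2 * pt(-abs(t_stat), df)             # two-sided p-value ≈ 0.0055
```

Note that the summary table showed rather different group variances (408 vs 130); R's default, t.test(co2 ~ anesth, data = low), would instead run Welch's unequal-variance test, since var.equal is FALSE unless you say otherwise.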

Your turn

With a partner, go to the website, check out the source code, and work through the Chapter 2 notes together.