Hypothesis testing

Exploring data and setting up for an analysis

Author

Joseph Mhango

Published

2024-09-13

Objectives

-Question formulation

-Summarize: Weighing the Pig

-Variables and graphing

-“Analysis” versus “EDA”

-Statistical Analysis Plan: the concept

Question formulation and hypothesis testing

-“population of interest”

-samples and sampling

-test statistics

-null hypothesis

-Let’s talk about the P-value

Hypotheses: Null vs Alternative

  • Null Hypothesis (H₀):
    The assumption that there is no effect or no difference.
    Example: “There is no difference in test scores between two groups.”

  • Alternative Hypothesis (H₁ or Ha):
    The assumption that there is an effect or a difference.
    Example: “There is a difference in test scores between two groups.”

P-value: Definition and Interpretation

  • P-value:
    The probability of obtaining results at least as extreme as the ones observed, under the assumption that the null hypothesis is true.

  • Interpretation:
    A small p-value (typically < 0.05, the conventional significance level α) suggests that the observed data would be unlikely if the null hypothesis were true, leading to rejection of H₀.
    A large p-value suggests that the data are consistent with the null hypothesis; it does not show that H₀ is true.
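
The definition above can be made concrete with a permutation test, which estimates how often a difference "at least as extreme" as the observed one arises when group labels are shuffled, that is, when H₀ holds by construction. This is a minimal sketch rather than the method used in the course: the function name `permutation_p_value`, the number of permutations, and the test scores are all assumptions invented for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))  # the test statistic
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are exchangeable, so shuffle them
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical test scores for two groups (invented, not real data)
scores_a = [72, 85, 78, 90, 66, 81, 77, 88]
scores_b = [70, 65, 74, 68, 80, 62, 71, 69]

p_value = permutation_p_value(scores_a, scores_b)
print(f"Estimated p-value: {p_value:.4f}")
```

The returned value is the proportion of shuffles whose difference in means is at least as large as the observed one, which mirrors the wording of the definition directly.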

Benefits of NHST

-Familiar and acceptable to researchers

-Typically robust to modest violations of assumptions

-Strong framework for evidence

-The basic idea is simple

Criticism of NHST

-Often misinterpreted

-Validation of analysis often neglected

-Education often deficient

-Practitioners often unaware of subtleties

Summarize: Weighing the Pig

Chick weight dataset

The hypothesis voices “how you think the world works” or “what you predict to be true”

coding
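
One way to read "summarize" here is computing descriptive statistics per group before any testing. The sketch below assumes that reading. The weights are hypothetical values invented for illustration and are not taken from the chick weight dataset; the names `weights_by_diet` and `summarize` are likewise assumptions.

```python
from statistics import mean, median, stdev

# Hypothetical chick weights in grams, grouped by diet.
# Invented for illustration; NOT values from the chick weight dataset.
weights_by_diet = {
    "diet_1": [42, 51, 59, 64, 76, 93],
    "diet_2": [40, 49, 58, 72, 84, 103],
}


def summarize(values):
    """Return basic descriptive statistics for one group."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summaries = {diet: summarize(w) for diet, w in weights_by_diet.items()}

for diet, s in summaries.items():
    print(
        f"{diet}: n={s['n']} mean={s['mean']:.1f} median={s['median']:.1f} "
        f"sd={s['sd']:.1f} range={s['min']}-{s['max']}"
    )
```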

Variables and graphing

-Must convey relevant information

-Consistent in aesthetics

-Self-contained

-Reflect hypothesis (unless descriptive)

-Appropriate to data

Concept of “layering” in building graphs

coding

“Analysis” versus “EDA”

EDA:

-Informal, haphazard

-Gain data understanding

-Test assumptions

-Usually not for “others”

-Usually occurs before analysis

Analysis:

-Designed to fit hypothesis

-For presentation to others

-Creation of EVIDENCE to support CLAIMS

-Reproducible

Statistical Analysis Plan: the concept

Prior to data collection

-Formally state hypothesis

-State specific statistical model(s)

-Specify data and data collection

-State and justify sample size
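
As an illustration of "state and justify sample size", the sketch below uses the standard normal-approximation formula for comparing two group means: n per group = 2 × ((z₁₋α/₂ + z_power) × σ / δ)². It assumes a two-sided test, equal group sizes, and a common standard deviation. The function name `sample_size_per_group` and the example effect size and standard deviation are hypothetical; a real plan would justify these inputs from prior data or a pilot study.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation, so it slightly underestimates the n
    that an exact t-test calculation would give.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for target power
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return ceil(n)  # round up to a whole participant


# Hypothetical inputs: detect a 5-unit difference when sd is 10 units
n = sample_size_per_group(effect=5.0, sd=10.0)
print(f"Required sample size per group: {n}")
```

Halving the detectable effect roughly quadruples the required sample size, which is why the smallest effect of interest is best fixed before data collection.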

Taught to children

Best practice

Practice Exercises