---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  name: python3
---

```{code-cell} python
:tags: [remove-input]
import matplotlib
matplotlib.use("Agg")
import plotly.io as pio
pio.renderers.default = "png"
```

# Hypothesis testing

Hypothesis tests follow the same grammar as confidence intervals, with one added
step: `hypothesize()` defines the null world, and `generate()` then simulates
from it.

## Two groups: a permutation test

Are tracks more likely to be popular in *metal* than in *deep house*? Compare the
two genres' "popular" rates, then permute the genre labels to build the null.

```{code-cell} python
import moderndive as md
from moderndive import specify, observe, get_p_value, visualize, shade_p_value

spotify = md.load_spotify_metal_deephouse()

# Observed difference in "popular" proportions, metal − deep-house
obs = observe(
    spotify, formula="popular_or_not ~ track_genre", success="popular",
    stat="diff in props", order=("metal", "deep-house"),
)

# Null: genre is independent of popularity → permute the labels
null = (
    spotify.specify(formula="popular_or_not ~ track_genre", success="popular")
    .hypothesize(null="independence")
    .generate(reps=1000, type="permute", seed=76)
    .calculate(stat="diff in props", order=("metal", "deep-house"))
)

get_p_value(null, obs_stat=obs, direction="right")
```
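
To see what the permutation step does, here is a minimal standard-library sketch of the same idea. It is not moderndive's implementation: the helpers `diff_in_props` and `permutation_null` are hypothetical names, and the toy data is made up for illustration rather than drawn from the Spotify sample.

```python
import random

def diff_in_props(outcomes, groups, success, order):
    """Proportion of `success` in order[0] minus the proportion in order[1]."""
    props = []
    for g in order:
        in_group = [o for o, lab in zip(outcomes, groups) if lab == g]
        props.append(sum(o == success for o in in_group) / len(in_group))
    return props[0] - props[1]

def permutation_null(outcomes, groups, success, order, reps=1000, seed=0):
    """Shuffle the group labels `reps` times, recomputing the statistic each time."""
    rng = random.Random(seed)
    labels = list(groups)  # copy so the caller's labels are left untouched
    null = []
    for _ in range(reps):
        rng.shuffle(labels)  # breaks any link between genre and popularity
        null.append(diff_in_props(outcomes, labels, success, order))
    return null

# Toy data (invented): 30/50 popular in metal, 20/50 popular in deep-house
outcomes = ["popular"] * 30 + ["not popular"] * 20 + ["popular"] * 20 + ["not popular"] * 30
groups = ["metal"] * 50 + ["deep-house"] * 50
order = ("metal", "deep-house")

obs_diff = diff_in_props(outcomes, groups, "popular", order)
null_diffs = permutation_null(outcomes, groups, "popular", order, reps=500, seed=76)

# Right-tailed p-value: share of permuted differences at least as large as observed
p_right = sum(d >= obs_diff for d in null_diffs) / len(null_diffs)
```

Shuffling the labels keeps both the group sizes and the overall "popular" rate fixed, so the only thing the null distribution varies is which tracks are assigned to which genre.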

## Shade the p-value

```{code-cell} python
visualize(null) + shade_p_value(obs_stat=obs, direction="right")
```

`direction` is one of `"right"`/`"greater"`, `"left"`/`"less"`, or `"two-sided"`.
The two-sided shading mirrors the observed statistic about 0.
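
To make the three directions concrete, here is a minimal sketch of reading a p-value off a null distribution. The helper `p_value_sketch` is hypothetical and is not moderndive's `get_p_value`; in particular, the two-sided rule shown (double the smaller tail, capped at 1) is one common convention assumed for illustration, and the library may compute it differently.

```python
def p_value_sketch(null_stats, obs_stat, direction):
    """Share of null statistics at least as extreme as `obs_stat`."""
    n = len(null_stats)
    right = sum(s >= obs_stat for s in null_stats) / n
    left = sum(s <= obs_stat for s in null_stats) / n
    if direction in ("right", "greater"):
        return right
    if direction in ("left", "less"):
        return left
    if direction == "two-sided":
        # Assumed convention: twice the smaller tail, never above 1
        return min(1.0, 2 * min(left, right))
    raise ValueError(f"unknown direction: {direction!r}")

# Tiny made-up null distribution
toy_null = [-2, -1, 0, 1, 2]
p_right = p_value_sketch(toy_null, 1, "right")
p_left = p_value_sketch(toy_null, 1, "left")
p_two = p_value_sketch(toy_null, 1, "two-sided")
```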

## One mean / one proportion (point null)

Use a `"point"` null with bootstrap resampling, supplying the hypothesized value:

```{code-cell} python
age = md.load_age_at_marriage()

obs_t = observe(age, response="age", stat="t", null="point", mu=23)

null_t = (
    age.specify(response="age")
    .hypothesize(null="point", mu=23)
    .generate(reps=1000, type="bootstrap", seed=1)
    .calculate(stat="t")
)
get_p_value(null_t, obs_stat=obs_t, direction="two-sided")
```
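
A bootstrap can only simulate the null if the resampled data actually satisfies it. One standard way to arrange that for a mean, assumed here rather than confirmed for moderndive, is to shift the sample so its mean equals the hypothesized `mu` before resampling. The sketch below uses hypothetical helpers (`recenter`, `t_stat`, `bootstrap_point_null`) and invented ages.

```python
import math
import random
import statistics

def recenter(sample, mu):
    """Shift every value so the sample mean equals the hypothesized mu."""
    shift = mu - statistics.mean(sample)
    return [x + shift for x in sample]

def t_stat(sample, mu):
    """One-sample t statistic: (mean - mu) / (sd / sqrt(n))."""
    n = len(sample)
    return (statistics.mean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))

def bootstrap_point_null(sample, mu, reps=1000, seed=0):
    """Resample with replacement from the recentered data, computing t each time."""
    rng = random.Random(seed)
    shifted = recenter(sample, mu)
    n = len(shifted)
    return [t_stat(rng.choices(shifted, k=n), mu) for _ in range(reps)]

# Invented ages for illustration (not the age_at_marriage data)
ages = [19, 21, 22, 22, 23, 23, 24, 24, 25, 25, 26, 27, 28, 30, 31]

obs_t_sketch = t_stat(ages, mu=23)
null_t_sketch = bootstrap_point_null(ages, mu=23, reps=1000, seed=1)

# Share of null t statistics at least as far from 0 as the observed one
share_beyond = sum(abs(t) >= abs(obs_t_sketch) for t in null_t_sketch) / len(null_t_sketch)
```

Recentering changes only the location of the data, so the spread and shape that drive the bootstrap variability stay those of the original sample.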

For a one-proportion test you can also *simulate* draws directly:

```{code-cell} python
import polars as pl

coins = pl.DataFrame({"flip": ["heads"] * 30 + ["tails"] * 70})

null_p = (
    coins.specify(response="flip", success="heads")
    .hypothesize(null="point", p=0.5)
    .generate(reps=1000, type="draw", seed=1)   # "simulate" is an alias
    .calculate(stat="prop")
)
```
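
Conceptually, drawing from a point null on a proportion means flipping a coin with the hypothesized `p` as many times as there are observations, then recording the share of successes. A minimal standard-library sketch, with the hypothetical helper `simulate_props` standing in for the draw step (it is not the library's code):

```python
import random

def simulate_props(n, p, reps=1000, seed=0):
    """Simulate `reps` samples of `n` Bernoulli(p) draws; return each sample's proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

# Same shape as the coins example: 100 flips under a null of p = 0.5
null_props = simulate_props(n=100, p=0.5, reps=1000, seed=1)

# Observed proportion of heads in the coins data: 30 out of 100
obs_prop = 30 / 100

# Left-tailed p-value: share of simulated proportions at or below the observed one
p_left = sum(x <= obs_prop for x in null_props) / len(null_props)
```

Unlike the bootstrap, this never resamples the observed flips; only the sample size and the hypothesized `p` enter the simulation.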

## Available statistics

`calculate(stat=...)` supports the full infer vocabulary: `"mean"`, `"median"`,
`"sum"`, `"sd"`, `"prop"`, `"count"`, `"diff in means"`, `"diff in medians"`,
`"diff in props"`, `"ratio of means"`, `"ratio of props"`, `"odds ratio"`,
`"slope"`, `"correlation"`, `"t"`, `"z"`, `"F"`, `"Chisq"`.

## Custom test statistics

Beyond those strings, `stat=` accepts **any function** that takes the response
(and explanatory) arrays and returns a single number, so you can run inference on
a statistic that isn't built in. Here we bootstrap the interquartile range of
almond weights and read off a 95% interval:

```{code-cell} python
import numpy as np
from moderndive import get_confidence_interval

def iqr(response, explanatory):
    return float(np.percentile(response, 75) - np.percentile(response, 25))

boot_iqr = (
    md.load_almonds_sample_100()
    .specify(response="weight")
    .generate(reps=1000, type="bootstrap", seed=1)
    .calculate(stat=iqr)
)
get_confidence_interval(boot_iqr, level=0.95, type="percentile")
```

The function receives `(response, explanatory)` as numpy arrays (`explanatory` is
`None` for a single-variable `specify`), and must return a scalar.

```{seealso}
Prefer a one-line classical test? See the tidy wrappers in {doc}`theory-based`
(`t_test`, `prop_test`, `chisq_test`).
```