Sampling
Sampling activities (the “bowl” of balls and the tactile samples) are how ModernDive
builds intuition for sampling variation. rep_slice_sample draws repeated samples
and stacks them into a single data frame, with a replicate column recording which
sample each row came from. It is the analog of R's
moderndive::rep_slice_sample() / infer::rep_sample_n().
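The stacking idea can be sketched in plain Python. The function rep_slice_sample_sketch below is hypothetical: it only mirrors the behavior described above (repeated samples of size n, stacked with a 1-based replicate column, with or without replacement). It is not moderndive's implementation, and its seeding will not reproduce moderndive's draws. The 30-ball toy bowl is also made up.

```python
import random


def rep_slice_sample_sketch(rows, n, reps=1, replace=False, seed=None):
    """Hypothetical stand-in for rep_slice_sample, for illustration only.

    Draws `reps` samples of size `n` from `rows` (a list of dicts) and stacks
    them into one list, tagging each record with a 1-based "replicate" number.
    """
    rng = random.Random(seed)
    stacked = []
    for rep in range(1, reps + 1):
        if replace:
            draw = rng.choices(rows, k=n)  # with replacement: a row may repeat
        else:
            draw = rng.sample(rows, n)  # without replacement: each row at most once
        stacked.extend({"replicate": rep, **row} for row in draw)
    return stacked


# Made-up toy bowl of 30 balls; every third ball is red.
bowl = [{"ball_ID": i, "color": "red" if i % 3 == 0 else "white"} for i in range(1, 31)]

out = rep_slice_sample_sketch(bowl, n=5, reps=3, seed=1)
print(len(out))  # 15: 3 replicates x 5 balls
print(sorted({r["replicate"] for r in out}))  # [1, 2, 3]
```

Returning one long table instead of a list of separate samples is what lets a single group-by on replicate summarize every sample at once.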
The bowl
```python
import moderndive as md
import polars as pl

bowl = md.load_bowl()  # 2400 red/white balls
bowl.head()
```
| ball_ID | color |
|---|---|
| i64 | str |
| 1 | "white" |
| 2 | "white" |
| 3 | "white" |
| 4 | "red" |
| 5 | "white" |
One virtual sample
```python
from moderndive import rep_slice_sample

sample = rep_slice_sample(bowl, n=50, seed=1)
# proportion red in this sample
sample.select((pl.col("color") == "red").mean().alias("prop_red"))
```
| prop_red |
|---|
| f64 |
| 0.38 |
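The expression (pl.col("color") == "red").mean() yields a proportion because the comparison produces a Boolean column, and averaging Booleans counts True as 1 and False as 0. The mean is therefore the number of red balls divided by the sample size, so the 0.38 above corresponds to 19 red balls out of 50. The same arithmetic in plain Python, on a made-up five-ball sample (not the sample above):

```python
# Made-up five-ball sample, for illustration only.
colors = ["red", "white", "white", "red", "white"]

is_red = [c == "red" for c in colors]  # Boolean indicator per ball
prop_as_mean = sum(is_red) / len(is_red)  # mean of the indicator (True -> 1, False -> 0)
prop_as_count = colors.count("red") / len(colors)  # number red / sample size

print(prop_as_mean, prop_as_count)  # 0.4 0.4
```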
Many samples → a sampling distribution
Take 1000 samples of size 50 and compute the proportion red in each:
```python
samples = rep_slice_sample(bowl, n=50, reps=1000, seed=1)
prop_red = (
    samples
    .group_by("replicate")
    .agg((pl.col("color") == "red").mean().alias("prop_red"))
    # group_by does not guarantee output order, so sort by replicate
    .sort("replicate")
)
prop_red.head()
```
| replicate | prop_red |
|---|---|
| i64 | f64 |
| 1 | 0.38 |
| 2 | 0.42 |
| 3 | 0.42 |
| 4 | 0.42 |
| 5 | 0.46 |
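Conceptually, group_by("replicate").agg(...) splits the stacked rows into one group per replicate and reduces each group to a single proportion. A plain-Python sketch of that split-apply-combine step, run on a few hypothetical stacked rows rather than the real samples:

```python
from collections import defaultdict

# Hypothetical stacked rows: (replicate, color) pairs from two samples of 4 balls.
stacked = [
    (1, "red"), (1, "white"), (1, "white"), (1, "red"),
    (2, "white"), (2, "white"), (2, "red"), (2, "white"),
]

counts = defaultdict(lambda: [0, 0])  # replicate -> [red count, total count]
for rep, color in stacked:
    counts[rep][0] += color == "red"  # True adds 1, False adds 0
    counts[rep][1] += 1

# One proportion per replicate, in replicate order.
prop_red = {rep: red / total for rep, (red, total) in sorted(counts.items())}
print(prop_red)  # {1: 0.5, 2: 0.25}
```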
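The prop_red column approximates the sampling distribution of the sample
proportion: it shows how much the proportion red varies from one sample of 50
balls to the next. Visualize its spread the same way you would any distribution,
for example with a histogram (see Bootstrapping & confidence intervals for building
one with the inference pipeline instead).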
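The spread can also be summarized numerically. The sketch below is a stdlib simulation, not the moderndive API. It assumes a toy bowl of 2400 balls of which 900 are red (an assumed composition standing in for md.load_bowl()), draws 1000 samples of 50 without replacement, and reports the center and standard deviation of the sample proportions. That standard deviation is the standard error of the sample proportion. For comparison it also computes the textbook approximation sqrt(p(1 - p)/n), which ignores the small finite-population correction.

```python
import math
import random
from statistics import mean, stdev

# Toy bowl (assumed composition): 2400 balls, 900 of them red.
bowl = ["red"] * 900 + ["white"] * 1500
p = 900 / 2400  # population proportion red

rng = random.Random(1)
n, reps = 50, 1000

# One sample proportion per replicate, each sample drawn without replacement.
props = [rng.sample(bowl, n).count("red") / n for _ in range(reps)]

center = mean(props)  # should sit close to p
spread = stdev(props)  # simulated standard error of the sample proportion
theory = math.sqrt(p * (1 - p) / n)  # approximation; ignores finite-population correction

print(f"center={center:.3f} spread={spread:.3f} approx SE={theory:.3f}")
```

Because the approximation divides by n under the square root, quadrupling the sample size roughly halves the spread.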
With vs. without replacement
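By default, rep_slice_sample samples without replacement, like dealing from a
deck: no ball appears twice in the same sample. Pass replace=True for
bootstrap-style resampling, where each drawn ball goes back into the bowl and can
appear more than once in a sample: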
```python
rep_slice_sample(bowl, n=50, reps=1000, replace=True, seed=1)
```
| replicate | ball_ID | color |
|---|---|---|
| i64 | i64 | str |
| 1 | 1136 | "white" |
| 1 | 1229 | "red" |
| 1 | 1813 | "white" |
| 1 | 2282 | "red" |
| 1 | 84 | "white" |
| … | … | … |
| 1000 | 439 | "white" |
| 1000 | 1203 | "red" |
| 1000 | 252 | "white" |
| 1000 | 1113 | "white" |
| 1000 | 1277 | "red" |
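The difference between the two modes can be seen with Python's standard library, where random.Random.sample draws without replacement and random.Random.choices draws with replacement. This is an illustration of the two sampling schemes, not of how moderndive implements them; the 10-ball population is hypothetical and kept small so repeats are easy to spot.

```python
import random

# Hypothetical small population of 10 ball IDs, so repeats are easy to spot.
population = list(range(1, 11))
rng = random.Random(1)
n, reps = 5, 1000

# Without replacement: like dealing from a deck, an ID appears at most once per sample.
without = [rng.sample(population, n) for _ in range(reps)]

# With replacement: each drawn ball goes back in the bowl, so an ID can repeat.
with_repl = [rng.choices(population, k=n) for _ in range(reps)]

print(all(len(set(s)) == n for s in without))  # True: no sample ever has a repeat
print(any(len(set(s)) < n for s in with_repl))  # True: some samples contain repeats
```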
Tactile samples
The results of the hand-collected (tactile) version of the activity are bundled too:
```python
md.load_tactile_prop_red()  # 33 groups' samples of 50 balls each
```
| group | replicate | red_balls | prop_red |
|---|---|---|---|
| str | i64 | i64 | f64 |
| "Ilyas, Yohan" | 1 | 21 | 0.42 |
| "Morgan, Terrance" | 2 | 17 | 0.34 |
| "Martin, Thomas" | 3 | 21 | 0.42 |
| "Clark, Frank" | 4 | 21 | 0.42 |
| "Riddhi, Karina" | 5 | 18 | 0.36 |
| … | … | … | … |
| "Julie, Hailin" | 29 | 15 | 0.3 |
| "Katie, Caroline" | 30 | 21 | 0.42 |
| "Mallory, Damani, Melissa" | 31 | 21 | 0.42 |
| "Katie" | 32 | 16 | 0.32 |
| "Francis, Vignesh" | 33 | 19 | 0.38 |
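In the tactile data, prop_red is red_balls divided by the sample size of 50, the same statistic the virtual samples compute, so the two can be compared directly. A quick check against the ten rows displayed above (the elided rows are not reproduced here):

```python
import math

# (group, red_balls, prop_red) for the ten rows displayed above only.
shown = [
    ("Ilyas, Yohan", 21, 0.42),
    ("Morgan, Terrance", 17, 0.34),
    ("Martin, Thomas", 21, 0.42),
    ("Clark, Frank", 21, 0.42),
    ("Riddhi, Karina", 18, 0.36),
    ("Julie, Hailin", 15, 0.3),
    ("Katie, Caroline", 21, 0.42),
    ("Mallory, Damani, Melissa", 21, 0.42),
    ("Katie", 16, 0.32),
    ("Francis, Vignesh", 19, 0.38),
]
SAMPLE_SIZE = 50  # each group drew 50 balls

for group, red_balls, prop_red in shown:
    assert math.isclose(red_balls / SAMPLE_SIZE, prop_red), group

print("prop_red == red_balls / 50 for all", len(shown), "rows shown")
```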
See also
rep_sample_n is an alias of rep_slice_sample (the older infer name). Both are
documented in the API reference.