Sampling

Sampling activities (the “bowl” of balls and the tactile samples) are how ModernDive builds intuition for sampling variation. rep_slice_sample draws repeated samples from a data frame and stacks them into a single frame, adding a replicate column that identifies which sample each row belongs to. It is the analog of R's moderndive::rep_slice_sample() and infer::rep_sample_n().
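
To make the "repeat and stack" idea concrete, here is a minimal standard-library sketch of the same pattern. The function name rep_slice_sample_sketch and the toy bowl are hypothetical, introduced only for illustration; this is not the package's implementation.

```python
import random

def rep_slice_sample_sketch(rows, n, reps=1, replace=False, seed=None):
    """Hypothetical stand-in: draw `reps` samples of size `n` from `rows`
    and stack them, tagging each row with a 1-based replicate number."""
    rng = random.Random(seed)
    stacked = []
    for replicate in range(1, reps + 1):
        if replace:
            drawn = rng.choices(rows, k=n)   # with replacement
        else:
            drawn = rng.sample(rows, n)      # without replacement
        stacked.extend({"replicate": replicate, **row} for row in drawn)
    return stacked

# A toy bowl of 8 balls (made-up data, not the bundled dataset).
toy_bowl = [
    {"ball_ID": i, "color": "red" if i % 3 == 0 else "white"}
    for i in range(1, 9)
]

stacked = rep_slice_sample_sketch(toy_bowl, n=3, reps=4, seed=1)
print(len(stacked))  # 12 rows: 4 replicates x 3 balls each
```

Keeping every replicate in one long table is what makes the later group-by-replicate step possible.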

The bowl

import moderndive as md
import polars as pl

bowl = md.load_bowl()      # 2400 red/white balls
bowl.head()
shape: (5, 2)
ball_ID  color
i64      str
1        "white"
2        "white"
3        "white"
4        "red"
5        "white"

One virtual sample

from moderndive import rep_slice_sample

sample = rep_slice_sample(bowl, n=50, seed=1)
# proportion red in this sample
sample.select((pl.col("color") == "red").mean().alias("prop_red"))
shape: (1, 1)
prop_red
f64
0.38

Many samples → a sampling distribution

Take 1000 samples of size 50 and compute the proportion red in each:

samples = rep_slice_sample(bowl, n=50, reps=1000, seed=1)

prop_red = (
    samples
    .group_by("replicate")
    .agg((pl.col("color") == "red").mean().alias("prop_red"))
    .sort("replicate")   # group_by does not guarantee row order
)
prop_red.head()
shape: (5, 2)
replicate  prop_red
i64        f64
1          0.38
2          0.42
3          0.42
4          0.42
5          0.46

The 1000 values in the prop_red column form a simulated sampling distribution of the sample proportion. Visualize its spread the same way you would any distribution (see Bootstrapping & confidence intervals for building one with the inference pipeline instead).
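
The spread of a sampling distribution can also be summarized numerically. The sketch below is a standard-library simulation, not the package API: it uses a stand-in bowl of 2400 balls in which 900 are assumed to be red (an illustrative count, not read from the dataset), and compares the standard deviation of the simulated proportions with the textbook standard-error formula, including the finite-population correction that applies when drawing without replacement.

```python
import math
import random
import statistics

rng = random.Random(1)

# Stand-in bowl: 2400 balls, 900 assumed red (illustrative assumption).
N_BALLS, N_RED = 2400, 900
bowl_is_red = [True] * N_RED + [False] * (N_BALLS - N_RED)
p = N_RED / N_BALLS

n, reps = 50, 1000
# One proportion per replicate: share of red balls in a draw of n without replacement.
prop_red = [sum(rng.sample(bowl_is_red, n)) / n for _ in range(reps)]

center = statistics.mean(prop_red)
spread = statistics.stdev(prop_red)   # empirical standard error

# Textbook approximation with the finite-population correction.
theory = math.sqrt(p * (1 - p) / n) * math.sqrt((N_BALLS - n) / (N_BALLS - 1))

print(f"mean = {center:.3f}, sd = {spread:.3f}, theory = {theory:.3f}")
```

The simulated mean should sit close to the bowl's true proportion, and the simulated standard deviation close to the formula's value; both gaps shrink as reps grows.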

With vs. without replacement

rep_slice_sample samples without replacement by default (like dealing from a deck), so no row appears twice within a replicate. Pass replace=True for bootstrap-style resampling, where the same row can be drawn more than once:

rep_slice_sample(bowl, n=50, reps=1000, replace=True, seed=1)
shape: (50_000, 3)
replicate  ball_ID  color
i64        i64      str
1          1136     "white"
1          1229     "red"
1          1813     "white"
1          2282     "red"
1          84       "white"
…          …        …
1000       439      "white"
1000       1203     "red"
1000       252      "white"
1000       1113     "white"
1000       1277     "red"
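
The practical difference between the two modes shows up most clearly on a tiny population. This sketch uses Python's random module on ten hypothetical ball IDs rather than the package itself: without replacement every ID appears at most once, while with replacement IDs can repeat and the sample may even be larger than the population.

```python
import random

rng = random.Random(1)
deck = list(range(1, 11))   # 10 hypothetical ball IDs

# Without replacement: every ID appears at most once.
dealt = rng.sample(deck, 10)
print(len(set(dealt)))      # 10, the draw is just a reshuffle of the deck

# With replacement: IDs can repeat, so n may exceed the population size.
resampled = rng.choices(deck, k=20)
print(len(set(resampled)) < len(resampled))   # True, 20 draws from 10 IDs must repeat

# Asking for more rows than exist fails when drawing without replacement.
oversample_failed = False
try:
    rng.sample(deck, 20)
except ValueError:
    oversample_failed = True
print(oversample_failed)    # True
```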

Tactile samples

The hand-collected counterparts are bundled too:

md.load_tactile_prop_red()   # 33 groups' samples of 50 balls
shape: (33, 4)
group                       replicate  red_balls  prop_red
str                         i64        i64        f64
"Ilyas, Yohan"              1          21         0.42
"Morgan, Terrance"          2          17         0.34
"Martin, Thomas"            3          21         0.42
"Clark, Frank"              4          21         0.42
"Riddhi, Karina"            5          18         0.36
…                           …          …          …
"Julie, Hailin"             29         15         0.3
"Katie, Caroline"           30         21         0.42
"Mallory, Damani, Melissa"  31         21         0.42
"Katie"                     32         16         0.32
"Francis, Vignesh"          33         19         0.38

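As a quick check on how the tactile columns relate, prop_red is red_balls divided by the sample size of 50. The sketch below verifies that using only the ten rows displayed above, copied by hand into plain Python rather than loaded through the package.

```python
import math

# (red_balls, prop_red) pairs copied from the ten displayed rows.
shown = [
    (21, 0.42), (17, 0.34), (21, 0.42), (21, 0.42), (18, 0.36),
    (15, 0.30), (21, 0.42), (21, 0.42), (16, 0.32), (19, 0.38),
]

SAMPLE_SIZE = 50   # each group drew 50 balls

recomputed = [red / SAMPLE_SIZE for red, _ in shown]
consistent = all(
    math.isclose(value, prop) for value, (_, prop) in zip(recomputed, shown)
)
print(consistent)  # True
```
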
See also

rep_sample_n is an alias of rep_slice_sample (the older infer name). Both are documented in the API reference.