API reference
Inference grammar
- moderndive.specify(data, *, response=None, explanatory=None, formula=None, success=None)
Specify the response (and optional explanatory) variable(s) for inference.
Two equivalent forms, mirroring R infer:
    specify(df, response="weight")
    specify(df, formula="popular_or_not ~ track_genre", success="popular")
Multi-term formulas ("y ~ a + b") are supported for fit().
- Return type: Specification
- moderndive.observe(data, *, response=None, explanatory=None, formula=None, success=None, stat='mean', order=None, null=None, mu=None, p=None, sigma=None)
Shortcut for specify() |> [hypothesize()] |> calculate(). Mirrors R infer::observe(): compute an observed statistic in one call.
- moderndive.assume(distribution, df=None)
Set a theoretical distribution ("t", "z", "F", or "Chisq"). df is the degrees of freedom: a scalar for t/Chisq, a (df1, df2) tuple for F, and unused for z.
- class moderndive.infer.core.Specification(data, response, explanatory=None, success=None, formula=None)
Result of specify(): the chosen response (plus an optional explanatory variable).
- calculate(stat, *, order=None, mu=None, p=None, sigma=None)
Compute the observed statistic (no resampling).
- class moderndive.infer.core.Hypothesis(spec, null, mu=None, p=None, sigma=None)
- Parameters:
spec (Specification)
null (str)
mu (float | None)
p (float | None)
sigma (float | None)
- class moderndive.infer.core.GeneratedReplicates(spec, type, null, plans, shifted_response=None, hyp_mu=None, hyp_p=None, hyp_sigma=None)
Materialized resampling plan; supports both calculate() and fit(). plans holds one numpy array per replicate:
  - bootstrap: row indices (with replacement) into the data,
  - permute: a permutation of the row positions (used to shuffle a column),
  - draw: a simulated response array under a point null proportion.
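The three kinds of plan can be sketched in plain Python. This is a minimal illustration, not the library's implementation: make_plans is a hypothetical helper, it returns lists instead of numpy arrays, and it encodes a "draw" replicate as True/False success flags, which is an assumption about how the simulated response is represented.

```python
import random


def make_plans(n_rows, reps, kind, p=None, seed=None):
    """Hypothetical helper: build one plan per replicate (lists, not numpy arrays)."""
    rng = random.Random(seed)
    plans = []
    for _ in range(reps):
        if kind == "bootstrap":
            # Row indices drawn with replacement.
            plans.append([rng.randrange(n_rows) for _ in range(n_rows)])
        elif kind == "permute":
            # A permutation of the row positions.
            perm = list(range(n_rows))
            rng.shuffle(perm)
            plans.append(perm)
        elif kind == "draw":
            # Simulated responses under a point null proportion p (True = success).
            plans.append([rng.random() < p for _ in range(n_rows)])
        else:
            raise ValueError(f"unknown plan type: {kind!r}")
    return plans


# A permute plan shuffles one column while the other columns stay in place.
groups = ["a", "a", "b", "b", "b"]
perm = make_plans(len(groups), reps=1, kind="permute", seed=42)[0]
shuffled_groups = [groups[i] for i in perm]
```

Storing only indices (or flags) per replicate keeps the plan small and lets the same plan feed either a statistic or a model fit.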
- class moderndive.infer.core.Distribution(data, stat, null=None, type='bootstrap')
A simulated distribution of statistics (one row per replicate).
- class moderndive.infer.core.FitResult(data, null=None, type='bootstrap')
Regression coefficients: observed (one row per term) or a distribution of them across replicates (columns replicate, term, estimate).
Getters and visualization
- moderndive.get_p_value(distribution, obs_stat, direction)
Return a one-row frame with the simulation-based p_value. direction is one of right/greater, left/less, or two-sided. The two-sided p-value uses infer’s convention: twice the smaller one-sided tail proportion, capped at 1.
- Return type: DataFrame
- Parameters:
distribution (Distribution)
direction (str)
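The two-sided convention is easier to see on a tiny null distribution. This sketch uses a hypothetical p_value function over a plain list of simulated statistics; counting ties with obs_stat toward the tail (>= and <=) is an assumption, not something stated above.

```python
def p_value(null_stats, obs_stat, direction):
    """Simulation-based p-value; ties with obs_stat count toward the tail (assumption)."""
    n = len(null_stats)
    right = sum(s >= obs_stat for s in null_stats) / n
    left = sum(s <= obs_stat for s in null_stats) / n
    if direction in ("right", "greater"):
        return right
    if direction in ("left", "less"):
        return left
    if direction == "two-sided":
        # Twice the smaller one-sided tail proportion, capped at 1.
        return min(1.0, 2 * min(left, right))
    raise ValueError(f"unknown direction: {direction!r}")


null_stats = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(p_value(null_stats, 1.5, "two-sided"))  # 2 * min(0.8, 0.2) = 0.4
print(p_value(null_stats, 0.0, "two-sided"))  # 2 * 0.6 = 1.2, capped at 1.0
```

The cap matters when the observed statistic sits near the center of the null distribution, where both tails exceed one half.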
- moderndive.get_confidence_interval(distribution, level=0.95, type='percentile', *, point_estimate=None)
Return a one-row frame with lower_ci and upper_ci. type="percentile": the (1 - level)/2 and 1 - (1 - level)/2 quantiles of the bootstrap distribution. type="se": point_estimate ± z* · SE, where SE is the SD of the bootstrap distribution (requires point_estimate).
- Return type: DataFrame
- Parameters:
distribution (Distribution)
level (float)
type (str)
point_estimate (float | None)
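Both interval types can be sketched with the standard library. Two details are assumptions rather than statements from the text: the percentile quantiles use linear interpolation between order statistics (R's default quantile rule), and the bootstrap SD uses the n - 1 denominator. The function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def _quantile(sorted_x, p):
    # Linear interpolation between order statistics (R's default, type 7).
    h = (len(sorted_x) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])


def confidence_interval(boot_stats, level=0.95, type="percentile", point_estimate=None):
    """Return (lower_ci, upper_ci) from a list of bootstrap statistics."""
    alpha = 1 - level
    if type == "percentile":
        xs = sorted(boot_stats)
        return _quantile(xs, alpha / 2), _quantile(xs, 1 - alpha / 2)
    if type == "se":
        if point_estimate is None:
            raise ValueError("type='se' requires point_estimate")
        se = statistics.stdev(boot_stats)            # SD of the bootstrap distribution
        z_star = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for level=0.95
        return point_estimate - z_star * se, point_estimate + z_star * se
    raise ValueError(f"unknown type: {type!r}")


print(confidence_interval([float(i) for i in range(101)]))  # about (2.5, 97.5)
```

The percentile method reads the interval straight off the bootstrap distribution, while the SE method assumes that distribution is roughly normal and centred on point_estimate.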
- moderndive.visualize(distribution, bins=20, *, engine='plotly', method='simulation', shade_pvalue=None, shade_ci=None, **kwargs)
Histogram of the simulated statistics, as an InferPlot. method is "simulation" (histogram, the default), "theoretical" (a normal-approximation density curve), or "both" (histogram in density units overlaid with the normal curve), mirroring R infer’s visualize(method=). Pass shade_pvalue= or shade_ci= to shade in one call, or compose with +.
- moderndive.shade_p_value(obs_stat, direction, *, color=None)
A p-value shading spec; add it to a visualize() plot with +. direction ∈ {right/greater, left/less, two-sided}. For a faceted visualize_fit() plot, pass a per-term obs_stat (an observed FitResult, a term-keyed frame, or a dict) to shade each facet.
- moderndive.shade_confidence_interval(endpoints, color=None)
A confidence-interval shading spec; add it to a visualize() plot with +. endpoints is a CI DataFrame (lower_ci/upper_ci) or a (lower, upper) tuple. For a faceted visualize_fit() plot, pass a per-term CI table (with a term column) to shade each facet from its own interval.
- Return type: ShadeSpec
- Parameters:
color (str | None)
Theory-based tests
- moderndive.t_test(data, *, formula=None, response=None, explanatory=None, order=None, alternative='two-sided', mu=0.0, conf_level=0.95)
One-sample (no explanatory) or two-sample (Welch) t-test, tidy output.
- moderndive.t_stat(data, **kwargs)
The t statistic only (see t_test()).
- Parameters:
data (DataFrame)
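For the two-sample case, the Welch statistic and its Welch-Satterthwaite degrees of freedom need only group means, variances, and sizes. This is a hypothetical stdlib sketch: the sign follows x minus y (standing in for order=), and the p-value is omitted because the standard library has no t-distribution CDF. scipy.stats.ttest_ind(x, y, equal_var=False) reports the same statistic.

```python
import math
import statistics


def welch_t(x, y):
    """Welch two-sample t statistic (x minus y) and Welch-Satterthwaite df."""
    vx = statistics.variance(x) / len(x)  # squared standard error of mean(x)
    vy = statistics.variance(y) / len(y)  # squared standard error of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    return t, df


t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
# t = -3 / sqrt(2.5), about -1.897; df = 6.25 / 1.0625, about 5.882
```

Welch's version does not assume equal group variances, which is why its df is usually fractional.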
- moderndive.prop_test(data, *, formula=None, response=None, explanatory=None, success=None, order=None, p=None, alternative='two-sided')
One- or two-proportion z-test (normal approximation), tidy output.
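The normal-approximation statistics behind a proportion z-test can be written out directly. This sketch shows the textbook forms without a continuity correction: the null standard error for one proportion and the pooled proportion for two. Whether prop_test applies a correction is not stated here, and all names are hypothetical.

```python
import math
from statistics import NormalDist


def one_prop_z(successes, n, p0):
    """z statistic for H0: p == p0, using the null standard error."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 == p2, using the pooled proportion."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


z1 = one_prop_z(60, 100, 0.5)    # (0.6 - 0.5) / 0.05 = 2.0
z2 = two_prop_z(30, 50, 20, 50)  # (0.6 - 0.4) / 0.1 = 2.0
print(two_sided_p(z1))           # about 0.0455
```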
- moderndive.chisq_test(data, *, formula=None, response=None, explanatory=None)
Chi-squared test of independence (categorical response ~ categorical explanatory).
- moderndive.chisq_stat(data, **kwargs)
The chi-squared statistic only (see chisq_test()).
- Parameters:
data (DataFrame)
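The statistic compares each observed count with the count expected under independence, E = (row total × column total) / N. The sketch below works on a list-of-lists contingency table and applies no Yates continuity correction (R applies one to 2×2 tables by default); whether chisq_test does is not stated here. The helper name is hypothetical.

```python
def chisq_stat_from_table(table):
    """Pearson chi-squared statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # E = row * col / N
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat, df = chisq_stat_from_table([[10, 20, 30], [20, 20, 20]])
# stat = 16/3, about 5.333; df = (2 - 1) * (3 - 1) = 2
```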
Theory-based inference wrappers (scipy.stats).
The book deliberately teaches simulation-based inference first, then ties results back to the traditional theory-based methods (t-distribution CIs, the two-sample test, normal approximations). These helpers provide those theory-based companions so the chapters can draw the simulation-vs-theory comparison.
All functions return small polars frames with tidy column names.
- moderndive.theory.t_test_one_sample(x, mu=0.0, alternative='two-sided')
One-sample t-test of H0: mean == mu.
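The one-sample statistic divides the gap between the sample mean and mu by the estimated standard error s / sqrt(n), with n - 1 degrees of freedom. A hypothetical stdlib sketch (no p-value, since that needs the t-distribution CDF); scipy.stats.ttest_1samp computes the same statistic.

```python
import math
import statistics


def one_sample_t(x, mu=0.0):
    """t statistic for H0: mean == mu, with n - 1 degrees of freedom."""
    n = len(x)
    se = statistics.stdev(x) / math.sqrt(n)  # estimated standard error of the mean
    return (statistics.mean(x) - mu) / se, n - 1


t, df = one_sample_t([2, 4, 6, 8, 10], mu=5)
# mean 6, sd sqrt(10), se sqrt(2): t = 1 / sqrt(2), about 0.707; df = 4
```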
- moderndive.theory.t_test_two_sample(x, y, alternative='two-sided', equal_var=False)
Two-sample (Welch by default) t-test of equal means.
Regression & summary helpers
- moderndive.get_regression_table(model, digits=3, conf_level=0.95)
Tidy regression table with columns term, estimate, std_error, statistic, p_value, lower_ci, and upper_ci.
model is a fitted statsmodels results object (e.g. from statsmodels.formula.api.ols("y ~ x", data).fit()).
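For a single numeric predictor, the estimate, std_error, and statistic columns follow closed-form least-squares formulas. This stdlib sketch computes them to show where the numbers come from. It skips p_value and the CI columns, which need the t distribution; the function name and the term labels are hypothetical.

```python
import math


def simple_regression_table(x, y):
    """Estimates, standard errors and t statistics for y ~ x (intercept + slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(ssr / (n - 2))  # residual standard error, 2 coefficients
    se_slope = sigma / math.sqrt(sxx)
    se_intercept = sigma * math.sqrt(1 / n + x_bar**2 / sxx)
    return [
        {"term": "intercept", "estimate": intercept, "std_error": se_intercept,
         "statistic": intercept / se_intercept},
        {"term": "x", "estimate": slope, "std_error": se_slope,
         "statistic": slope / se_slope},
    ]


table = simple_regression_table([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
# slope 0.8, intercept 0.6
```

Each statistic is simply estimate / std_error, the t statistic for H0: coefficient == 0.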
- moderndive.get_regression_points(model, digits=3)
Fitted values and residuals per observation (similar to broom::augment). Columns: ID, the response, each explanatory term, <response>_hat, residual.
- Return type: DataFrame
- Parameters:
digits (int)
- moderndive.get_regression_summaries(model, digits=3)
Model-fit summaries as a tidy one-row frame (mirrors moderndive::get_regression_summaries). Columns: r_squared, adj_r_squared, mse, rmse, sigma, statistic (overall F), p_value, df (model degrees of freedom), nobs. model is a fitted statsmodels results object. mse is the mean squared residual with n in the denominator (so rmse = sqrt(mse)), while sigma is the residual standard error with n - p in the denominator, matching the R package.
- Return type: DataFrame
- Parameters:
digits (int)
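The mse versus sigma distinction is easiest to see numerically. This sketch takes observed and fitted values directly and assumes the model has an intercept (so r_squared uses the centred total sum of squares) and that p counts every estimated coefficient, intercept included. The function name is hypothetical; only the output keys follow the columns listed above, and p_value is omitted because it needs the F distribution.

```python
import math


def regression_summaries(y, y_hat, p):
    """Fit summaries; p counts every estimated coefficient, intercept included."""
    n = len(y)
    y_bar = sum(y) / n
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    r_squared = 1 - ssr / sst
    mse = ssr / n                                           # n in the denominator
    return {
        "r_squared": r_squared,
        "adj_r_squared": 1 - (1 - r_squared) * (n - 1) / (n - p),
        "mse": mse,
        "rmse": math.sqrt(mse),
        "sigma": math.sqrt(ssr / (n - p)),                  # n - p in the denominator
        "statistic": ((sst - ssr) / (p - 1)) / (ssr / (n - p)),  # overall F
        "df": p - 1,
        "nobs": n,
    }


# Fitted values of y ~ x for x = 1..5 (slope 0.8, intercept 0.6).
summary = regression_summaries([1, 3, 2, 5, 4], [1.4, 2.2, 3.0, 3.8, 4.6], p=2)
# r_squared 0.64, mse 0.72, sigma = sqrt(1.2), about 1.095
```

With five observations and two coefficients, sigma divides the residual sum of squares by 3 rather than 5, so it is noticeably larger than rmse.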
- moderndive.get_correlation(data, formula=None, *, x=None, y=None)
Pearson correlation as a tidy one-row frame with a cor column. Mirrors moderndive::get_correlation(data, y ~ x). Specify the variable pair either as a formula string ("y ~ x") or via the x= and y= keyword arguments. Rows with a null in either column are dropped.
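Pairwise null dropping means a row is discarded if either of its two values is missing, before any mean is computed. In this hypothetical sketch, plain lists stand in for the two columns and None stands in for a polars null.

```python
import math


def get_correlation_sketch(x, y):
    """Pearson r after dropping rows where either value is None."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    x_bar = sum(a for a, _ in pairs) / n
    y_bar = sum(b for _, b in pairs) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in pairs)
    sxx = sum((a - x_bar) ** 2 for a, _ in pairs)
    syy = sum((b - y_bar) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


r = get_correlation_sketch([1, 2, 3, None, 5], [2, 4, 6, 8, None])
# only the first three rows survive and lie on y = 2x, so r = 1.0
```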
- moderndive.pop_sd(x)
Population standard deviation (divides by n, not n - 1). Mirrors moderndive::pop_sd. Accepts a polars Series, list, numpy array, or any sequence; nulls/NaNs are dropped before computing.
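The standard library exposes both denominators: statistics.pstdev divides by n and statistics.stdev by n - 1. This hypothetical sketch adds the None/NaN filtering described above.

```python
import math
import statistics


def pop_sd_sketch(values):
    """Population SD (divide by n) after dropping None and NaN entries."""
    clean = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    return statistics.pstdev(clean)


data = [2, 4, 4, 4, None, 5, 5, 7, 9, float("nan")]
pop = pop_sd_sketch(data)                             # divides by n = 8: 2.0
sample = statistics.stdev([2, 4, 4, 4, 5, 5, 7, 9])  # divides by n - 1: about 2.138
```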
- moderndive.tidy_summary(data, columns=None, digits=3)
Per-variable summary statistics for the selected columns. Mirrors the R moderndive::tidy_summary column layout: column, n, group, type, min, Q1, mean, median, Q3, max, sd. Numeric columns get the five-number summary plus mean and sd; non-numeric columns report n and type with the numeric fields left null.
- moderndive.count_missing(data, columns=None)
Count missing (null) values in each column. A beginner-friendly alternative to df.select(pl.all().is_null().sum()): it returns a tidy two-column data frame (column, n_missing) with one row per column, sorted from most to fewest missing values so the columns needing attention surface first.
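The output shape can be sketched without polars: a dict of lists stands in for the data frame and None marks a missing value. Keeping tied columns in their original order is an assumption; the text only says most to fewest. The helper name is hypothetical.

```python
def count_missing_sketch(columns):
    """(column, n_missing) rows, most missing first; None marks a missing value."""
    counts = [(name, sum(v is None for v in values)) for name, values in columns.items()]
    # sorted() is stable even with reverse=True, so ties keep their original order.
    return sorted(counts, key=lambda row: row[1], reverse=True)


result = count_missing_sketch({
    "a": [1, None, 3],
    "b": [None, None, 1],
    "c": [1, 2, 3],
    "d": [None, 5, 6],
})
# [("b", 2), ("a", 1), ("d", 1), ("c", 0)]
```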
Sampling and plots
All plotting helpers accept engine="plotly" (default) or engine="plotnine".
- moderndive.rep_slice_sample(data, n, reps=1, replace=False, seed=None)
Take reps samples of size n from data. Returns a polars DataFrame with a leading replicate column identifying which sample each row belongs to. Set replace=True to sample with replacement (e.g. bootstrap-style). Pass seed for reproducibility.
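The replicate-labelling idea can be sketched with random.Random: each replicate is an independent sample, and every drawn row is tagged with its replicate number. In this hypothetical sketch a list stands in for the data frame, and numbering replicates from 1 (as R's moderndive does) is an assumption about this port.

```python
import random


def rep_slice_sample_sketch(rows, n, reps=1, replace=False, seed=None):
    """Return (replicate, row) pairs: reps samples of size n, numbered from 1."""
    rng = random.Random(seed)  # a seeded generator makes the draws reproducible
    out = []
    for rep in range(1, reps + 1):
        # choices() samples with replacement; sample() without.
        sample = rng.choices(rows, k=n) if replace else rng.sample(rows, n)
        out.extend((rep, row) for row in sample)
    return out


samples = rep_slice_sample_sketch(list(range(10)), n=3, reps=4, seed=1)
# 12 (replicate, row) pairs, three per replicate 1..4
```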
- moderndive.rep_sample_n(data, n, reps=1, replace=False, seed=None)
Alias for rep_slice_sample() (the older moderndive name).
- moderndive.pairplot(data, columns=None, hue=None, *, engine='plotly')
Scatterplot matrix of the numeric columns (the analog of GGally::ggpairs). engine="plotly" (default) returns a plotly go.Figure from plotly.express.scatter_matrix. engine="plotnine" (alias "seaborn") returns the matplotlib Figure from seaborn.pairplot; the non-plotly backend here is seaborn-backed because plotnine has no first-class SPLOM (scatterplot matrix). hue colors points by a categorical column.
- moderndive.gg_parallel_slopes(data, response, explanatory, by, *, engine='plotly')
Scatterplot with a parallel-slopes regression model overlaid. Fits response ~ explanatory + C(by) (one common slope, a separate intercept per level of by) and draws one fitted line per group over the data.
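With a separate intercept per group, the least-squares common slope equals the pooled within-group slope (each group centred on its own means), so the fit can be sketched without a formula library. parallel_slopes is a hypothetical helper; each group's fitted line is intercepts[g] + slope * x.

```python
from collections import defaultdict


def parallel_slopes(x, y, by):
    """OLS fit of y ~ x + C(by): one shared slope, one intercept per group."""
    groups = defaultdict(list)
    for xi, yi, g in zip(x, y, by):
        groups[g].append((xi, yi))
    sxx = sxy = 0.0
    means = {}
    for g, pts in groups.items():
        x_bar = sum(p[0] for p in pts) / len(pts)
        y_bar = sum(p[1] for p in pts) / len(pts)
        means[g] = (x_bar, y_bar)
        # Pool the within-group sums of squares and cross-products.
        sxx += sum((p[0] - x_bar) ** 2 for p in pts)
        sxy += sum((p[0] - x_bar) * (p[1] - y_bar) for p in pts)
    slope = sxy / sxx
    intercepts = {g: y_bar - slope * x_bar for g, (x_bar, y_bar) in means.items()}
    return slope, intercepts


slope, intercepts = parallel_slopes(
    x=[0, 1, 2, 0, 1, 2],
    y=[0, 2, 4, 10, 10, 10],
    by=["A", "A", "A", "B", "B", "B"],
)
# group slopes are 2 and 0; the shared slope is 1.0; intercepts A: 1.0, B: 9.0
```

The shared slope is a compromise between the groups' own slopes, weighted by how spread out each group's x values are.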
- moderndive.geom_parallel_slopes(data, response, explanatory, by, color=None)
plotnine layer(s) drawing the parallel-slopes fitted lines. Add to a ggplot with + (plotnine only; for a plotly version, call gg_parallel_slopes() with engine="plotly").
- moderndive.gg_categorical_model(data, response, explanatory, *, engine='plotly')
Regression with one categorical predictor (similar to geom_categorical_model). Fits response ~ C(explanatory); each category’s fitted value is its group mean, drawn as a horizontal marker over the (jittered) data points.
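Because C(explanatory) uses treatment (dummy) coding, the intercept is the baseline group's mean and every other coefficient is that group's mean minus the baseline mean; the fitted values are the group means. In this hypothetical sketch the alphabetically first level is the baseline, which is assumed to match patsy's default ordering for string levels.

```python
from collections import defaultdict


def categorical_model(response, explanatory):
    """Group means (the fitted values) and treatment-coded coefficients."""
    groups = defaultdict(list)
    for yi, g in zip(response, explanatory):
        groups[g].append(yi)
    fitted = {g: sum(v) / len(v) for g, v in groups.items()}
    levels = sorted(fitted)  # baseline = first level in sorted order (assumption)
    baseline = levels[0]
    coefs = {"Intercept": fitted[baseline]}
    for g in levels[1:]:
        coefs[g] = fitted[g] - fitted[baseline]  # offset from the baseline mean
    return fitted, coefs


fitted, coefs = categorical_model([1, 2, 3, 5, 7], ["a", "a", "a", "b", "b"])
# fitted: {"a": 2.0, "b": 6.0}; coefs: {"Intercept": 2.0, "b": 4.0}
```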
Datasets
- moderndive.load_dataset(name)
Load a dataset by name, returning a polars DataFrame.
- Return type: DataFrame
- Parameters:
name (str)
Each dataset also has a convenience loader moderndive.load_<name>() returning a polars DataFrame. Call moderndive.data.available_datasets() for the full list.