API reference
Inference grammar
- moderndive.specify(data, *, response=None, explanatory=None, formula=None, success=None)
Specify the response (and optional explanatory) variable(s) for inference.
Two equivalent forms, mirroring R infer:
specify(df, response="weight")
specify(df, formula="popular_or_not ~ track_genre", success="popular")
Multi-term formulas ("y ~ a + b") are supported for fit().
- Return type:
- Parameters:
- moderndive.observe(data, *, response=None, explanatory=None, formula=None, success=None, stat='mean', order=None, null=None, mu=None, med=None, p=None, sigma=None)
Shortcut for specify() |> [hypothesize()] |> calculate().
Mirrors R infer::observe(): computes an observed statistic in one call.
- moderndive.assume(distribution, df=None)
Set a theoretical distribution ("t", "z", "F", "Chisq"). df is the degrees of freedom: a scalar for t/Chisq, a (df1, df2) tuple for F, and unused for z.
- Return type:
- Parameters:
- class moderndive.infer.core.Specification(data, response, explanatory=None, success=None, formula=None)
Result of specify(): the chosen response (plus optional explanatory).
- Parameters:
- calculate(stat, *, order=None, mu=None, p=None, sigma=None)
Compute the observed statistic (no resampling).
- class moderndive.infer.core.Hypothesis(spec, null, mu=None, med=None, p=None, sigma=None)
- Parameters:
- class moderndive.infer.core.GeneratedReplicates(spec, type, null, plans, shifted_response=None, hyp_mu=None, hyp_p=None, hyp_sigma=None, variables=None)
Materialized resampling plan; supports both calculate() and fit(). plans holds one numpy array per replicate:
  - bootstrap: row indices (with replacement) into the data
  - permute: a permutation of the row positions (used to shuffle a column)
  - draw: a simulated response array under a point null proportion
- Parameters:
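The three plan types described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: make_plans is a hypothetical helper, and it returns lists where the library is documented to hold numpy arrays.

```python
import random

def make_plans(n_rows, reps, kind, p=None, seed=None):
    """Build one plan per replicate (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    plans = []
    for _ in range(reps):
        if kind == "bootstrap":
            # Row indices drawn with replacement from 0..n_rows-1.
            plans.append([rng.randrange(n_rows) for _ in range(n_rows)])
        elif kind == "permute":
            # A shuffled copy of the row positions.
            plans.append(rng.sample(range(n_rows), n_rows))
        elif kind == "draw":
            # Simulated successes (1) and failures (0) under null proportion p.
            plans.append([1 if rng.random() < p else 0 for _ in range(n_rows)])
        else:
            raise ValueError(f"unknown kind: {kind!r}")
    return plans

boot = make_plans(5, reps=3, kind="bootstrap", seed=1)
perm = make_plans(5, reps=3, kind="permute", seed=1)
draw = make_plans(5, reps=3, kind="draw", p=0.5, seed=1)
```

A bootstrap plan may repeat an index, a permute plan contains every row position exactly once, and a draw plan ignores the observed rows entirely.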
- class moderndive.infer.core.Distribution(data, stat, null=None, type='bootstrap')
A simulated distribution of statistics (one row per replicate).
- class moderndive.infer.core.FitResult(data, null=None, type='bootstrap')
Regression coefficients: observed (one row per term) or a distribution of them across replicates (replicate, term, estimate).
Getters and visualization
- moderndive.get_p_value(distribution, obs_stat, direction)
Return a one-row frame with the simulation-based p_value. direction is one of right/greater, left/less, or two-sided. The two-sided p-value uses infer’s convention: twice the smaller one-sided tail proportion, capped at 1.
- Return type:
DataFrame
- Parameters:
distribution (Distribution)
direction (str)
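The two-sided convention can be sketched as follows. This is an illustration only: sim_p_value is a hypothetical name, and the inclusive comparisons (>= and <=) are an assumption about how ties with the observed statistic are counted.

```python
def sim_p_value(stats, obs_stat, direction):
    """Simulation-based p-value from a list of replicate statistics (sketch)."""
    n = len(stats)
    # One-sided tail proportions; ties count toward both tails (assumption).
    right = sum(s >= obs_stat for s in stats) / n
    left = sum(s <= obs_stat for s in stats) / n
    if direction in ("right", "greater"):
        return right
    if direction in ("left", "less"):
        return left
    if direction == "two-sided":
        # Twice the smaller tail, capped at 1.
        return min(2 * min(left, right), 1.0)
    raise ValueError(f"unknown direction: {direction!r}")

null_stats = [-2, -1, 0, 0, 1, 1, 2, 3, 4, 5]
p_right = sim_p_value(null_stats, 3, "right")
p_two = sim_p_value(null_stats, 3, "two-sided")
```

The cap matters when the observed statistic sits near the center of the distribution, where doubling the smaller tail can exceed 1.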
- moderndive.get_confidence_interval(distribution, level=0.95, type='percentile', *, point_estimate=None)
Return a one-row frame with lower_ci and upper_ci.
type="percentile": the (1-level)/2 and 1-(1-level)/2 quantiles of the bootstrap distribution.
type="se": point_estimate ± z* · SE, where SE is the SD of the bootstrap distribution (requires point_estimate).
- Return type:
DataFrame
- Parameters:
distribution (Distribution)
level (float)
type (str)
point_estimate (float | None)
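Both interval types can be sketched with the standard library. The function names are hypothetical, the quantile rule (linear interpolation) and the use of the sample SD (n - 1 denominator) are assumptions, and the library's exact choices may differ.

```python
from statistics import NormalDist, stdev

def quantile(sorted_vals, q):
    """Linear-interpolation quantile (one common definition; an assumption)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def percentile_ci(stats, level=0.95):
    """Cut off (1 - level) / 2 of the replicates in each tail."""
    vals = sorted(stats)
    alpha = (1 - level) / 2
    return quantile(vals, alpha), quantile(vals, 1 - alpha)

def se_ci(stats, point_estimate, level=0.95):
    """point_estimate +/- z* times the SD of the replicate statistics."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = stdev(stats)  # sample SD; the library's denominator is not confirmed
    return point_estimate - z_star * se, point_estimate + z_star * se

boot_stats = [float(v) for v in range(101)]  # toy replicate statistics 0..100
pct = percentile_ci(boot_stats, level=0.90)
se_interval = se_ci(boot_stats, point_estimate=50.0)
```

The percentile interval follows the shape of the bootstrap distribution, while the SE interval is always symmetric around point_estimate.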
- moderndive.visualize(distribution, bins=20, *, engine='plotly', method='simulation', dens_color=None, shade_pvalue=None, shade_ci=None, **kwargs)
Histogram of the simulated statistics, as an InferPlot. method is "simulation" (histogram, default), "theoretical" (a normal-approximation density curve), or "both" (histogram in density units overlaid with the normal curve), mirroring R infer’s visualize(method=). dens_color sets the theoretical-curve color (for "theoretical"/"both"). Pass shade_pvalue= / shade_ci= to shade in one call, or compose with +.
- moderndive.shade_p_value(obs_stat, direction, *, color=None, fill=None)
A p-value shading spec; add it to a visualize() plot with +. direction ∈ {right/greater, left/less, two-sided}. For a faceted visualize_fit() plot, pass a per-term obs_stat (an observed FitResult, a term-keyed frame, or a dict) to shade each facet.
- moderndive.shade_confidence_interval(endpoints, color=None, fill=None)
A confidence-interval shading spec; add it to a visualize() plot with +. endpoints is a CI DataFrame (lower_ci/upper_ci) or a (lower, upper) tuple. For a faceted visualize_fit() plot, pass a per-term CI table (with a term column) to shade each facet from its own interval.
Theory-based tests
- moderndive.t_test(data, *, formula=None, response=None, explanatory=None, order=None, alternative='two-sided', mu=0.0, conf_level=0.95)
One-sample (no explanatory) or two-sample (Welch) t-test, tidy output.
- moderndive.t_stat(data, **kwargs)
The t statistic only (see t_test()).
- Return type:
- Parameters:
data (DataFrame)
- moderndive.prop_test(data, *, formula=None, response=None, explanatory=None, success=None, order=None, p=None, alternative='two-sided', z=False, correct=True, conf_int=True, conf_level=0.95)
Tidy one- or two-proportion test, mirroring R infer::prop_test.
By default reports the chi-square statistic (like R’s prop.test) with a chisq_df column; pass z=True for the signed z statistic instead. correct applies Yates’ continuity correction. With conf_int=True (default) the output includes a conf_level confidence interval: on the proportion (one-sample) or on the difference in proportions (two-sample).
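The relationship between the two statistics can be sketched for the one-sample case. This follows the textbook formulas used by R's prop.test, not this package's source; one_prop_stats is a hypothetical name and the counts are made up for illustration.

```python
import math

def one_prop_stats(x, n, p0, correct=True):
    """Return (z, chisq) for x successes in n trials against H0: p = p0."""
    expected = n * p0
    var = n * p0 * (1 - p0)
    # Signed z statistic (never continuity-corrected in this sketch).
    z = (x - expected) / math.sqrt(var)
    # Yates' correction shrinks |observed - expected| by up to 0.5.
    yates = min(0.5, abs(x - expected)) if correct else 0.0
    chisq = (abs(x - expected) - yates) ** 2 / var
    return z, chisq

z, chisq_raw = one_prop_stats(60, 100, 0.5, correct=False)
_, chisq_yates = one_prop_stats(60, 100, 0.5, correct=True)
```

Without the correction the chi-square statistic equals z squared, so the two report the same evidence; the chi-square form simply discards the sign.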
- moderndive.chisq_test(data, *, formula=None, response=None, explanatory=None, p=None)
Tidy chi-squared test.
With an explanatory variable, this is a test of independence. With only a response and a p={level: probability, ...} mapping, it is a goodness-of-fit test against those hypothesized proportions. Returns statistic, chisq_df, p_value.
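The goodness-of-fit statistic can be sketched with the standard library. gof_statistic is a hypothetical helper and the data are invented; the p-value (an upper-tail chi-square probability) is omitted because the standard library has no chi-square distribution.

```python
from collections import Counter

def gof_statistic(observations, p):
    """Chi-squared goodness-of-fit statistic and degrees of freedom (sketch)."""
    counts = Counter(observations)
    n = len(observations)
    statistic = sum(
        (counts.get(level, 0) - n * prob) ** 2 / (n * prob)
        for level, prob in p.items()
    )
    # One degree of freedom per level, minus one for the fixed total.
    return statistic, len(p) - 1

obs = ["a"] * 60 + ["b"] * 20 + ["c"] * 20
stat, df = gof_statistic(obs, {"a": 0.5, "b": 0.3, "c": 0.2})
```

Each term compares an observed count with the count expected under the hypothesized proportions, so a level that matches its expectation exactly (here "c") contributes nothing.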
- moderndive.chisq_stat(data, **kwargs)
The chi-squared statistic only (see chisq_test()).
- Return type:
- Parameters:
data (DataFrame)
Theory-based inference wrappers (scipy.stats).
The book deliberately teaches simulation-based inference first, then ties results back to the traditional theory-based methods (t-distribution CIs, the two-sample test, normal approximations). These helpers provide those theory-based companions so the chapters can draw the simulation-vs-theory comparison.
All functions return small polars frames with tidy column names.
- moderndive.theory.t_test_one_sample(x, mu=0.0, alternative='two-sided')
One-sample t-test of H0: mean == mu.
- moderndive.theory.t_test_two_sample(x, y, alternative='two-sided', equal_var=False)
Two-sample (Welch by default) t-test of equal means.
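The Welch statistic and its degrees of freedom can be sketched with the standard library. welch_t is a hypothetical helper using the standard Welch-Satterthwaite formula; the library itself is described as wrapping scipy.stats.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom (sketch)."""
    # Squared standard error contributed by each sample.
    vx = variance(x) / len(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Because the two variances are not pooled, the degrees of freedom are generally non-integer and fall below the pooled value of n1 + n2 - 2.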
Regression & summary helpers
- moderndive.get_regression_table(model, digits=3, conf_level=0.95, exponentiate=False, default_categorical_levels=False)
Tidy regression table: term, estimate, std_error, statistic, p_value, lower/upper_ci.
model is a fitted statsmodels results object: either OLS (smf.ols("y ~ x", data).fit()) or GLM (smf.glm(...).fit()).
For GLMs with a log or logit link, pass exponentiate=True to report the coefficient estimate and its confidence interval as rate / odds ratios (std_error, statistic, and p_value stay on the model’s link scale, matching broom::tidy).
By default, categorical-predictor terms are prettified (e.g. income[T.High income] → income: High income). Pass default_categorical_levels=True to keep the raw statsmodels term names.
- moderndive.get_regression_points(model, digits=3, *, newdata=None, ID=None)
Fitted values + residuals per observation (~ broom::augment).
Columns: ID, the outcome, each original predictor, <outcome>_hat, residual. In-formula transformations are handled gracefully: a transformed outcome (np.log(mpg)) is shown on the model’s scale under a sanitized name (log_mpg / log_mpg_hat), and transformed predictors (poly(), scale(), I()) are shown as their original columns rather than leaking basis matrices. For GLMs, fitted values and residuals are on the response scale (e.g. probabilities for logistic regression).
Pass newdata (a polars/pandas frame) to apply the model to new observations: predictions are returned, plus a residual if the outcome is present in newdata. ID names a column to use as the identifier (placed first); without it, ID is 1..n.
- moderndive.get_regression_summaries(model, digits=3)
Model-fit summaries as a tidy 1-row frame (~ broom::glance).
For an OLS model: r_squared, adj_r_squared, mse, rmse, sigma, statistic (overall F), p_value, df, nobs.
For a GLM (no R² applies): mse, rmse, deviance, null_deviance, aic, bic, log_lik, df_residual, df_null, nobs. mse/rmse use response-scale residuals.
mse is the mean squared residual using n in the denominator (so rmse = sqrt(mse)); for OLS, sigma is the residual standard error using n - p, matching the R package.
- Return type:
DataFrame
- Parameters:
digits (int)
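The difference between the two denominators can be sketched directly from a list of residuals. fit_summaries is a hypothetical helper, the residuals are invented, and n_params is assumed to count every estimated coefficient including the intercept.

```python
import math

def fit_summaries(residuals, n_params):
    """mse and rmse divide by n; sigma divides by n - p (sketch)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    mse = rss / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "sigma": math.sqrt(rss / (n - n_params)),
    }

out = fit_summaries([1.0, -1.0, 2.0, -2.0, 0.0], n_params=2)
```

sigma is always at least as large as rmse because its denominator is smaller; the gap narrows as n grows relative to the number of parameters.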
- moderndive.get_correlation(data, formula=None, *, x=None, y=None, method='pearson', na_rm=True, wide=False, quiet=False)
Correlation between an outcome and one or more predictors.
Mirrors moderndive::get_correlation. Give the variables either as a formula ("y ~ x" or "y ~ x1 + x2 + x3") or, for a single predictor, via x= and y=. method is "pearson" (default), "spearman" (rank correlation), or "kendall" (rank concordance). na_rm drops rows with a null in either column before computing (per predictor pair); set na_rm=False to keep them (yielding nan if any are present).
With one predictor the result is a 1-row frame with a cor column. With multiple predictors the result is long by default, with columns predictor and cor (one row each); pass wide=True for one column per predictor.
A short note points to a full pairwise correlation matrix when there are multiple predictors; silence it with quiet=True.
- moderndive.pop_sd(x)
Population standard deviation (divides by n, not n - 1).
Mirrors moderndive::pop_sd. Accepts a polars Series, list, numpy array, or any sequence; nulls/NaNs are dropped before computing.
- Return type:
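The contrast with the sample standard deviation can be sketched using the standard library. pop_sd_sketch is a hypothetical stand-in that handles only plain Python sequences.

```python
import math
from statistics import pstdev, stdev

def pop_sd_sketch(x):
    """Population SD after dropping None and NaN entries (sketch)."""
    clean = [v for v in x if v is not None and not math.isnan(v)]
    return pstdev(clean)  # divides by n

values = [2, 4, 4, 4, None, 5, 5, 7, 9, float("nan")]
population_sd = pop_sd_sketch(values)
sample_sd = stdev([2, 4, 4, 4, 5, 5, 7, 9])  # divides by n - 1
```

For the same data the population value is always the smaller of the two, since it divides the same sum of squares by a larger number.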
- moderndive.tidy_summary(data, columns=None, digits=3)
Per-variable summary statistics for the selected columns.
Mirrors the R moderndive::tidy_summary column layout: column, n, group, type, min, Q1, mean, median, Q3, max, sd. Numeric columns get the five-number summary + mean/sd; non-numeric columns report n and type with the numeric fields left null.
- moderndive.count_missing(data, columns=None)
Count missing (null) values in each column.
A beginner-friendly alternative to df.select(pl.all().is_null().sum()): it returns a tidy two-column data frame with one row per column (column, n_missing), sorted from most to fewest missing values so the columns needing attention surface first.
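The output shape can be sketched with plain Python containers. count_missing_sketch is a hypothetical helper that takes a dict of column lists instead of a data frame and treats None as the missing value.

```python
def count_missing_sketch(columns):
    """Return (column, n_missing) rows, most missing first (sketch)."""
    rows = [
        (name, sum(value is None for value in values))
        for name, values in columns.items()
    ]
    # Sort descending so the columns needing attention come first.
    return sorted(rows, key=lambda row: row[1], reverse=True)

table = count_missing_sketch({
    "a": [1, None, 3],
    "b": [None, None, "x"],
    "c": [1, 2, 3],
})
```

Here column "b" comes first with two missing values and "c" last with none.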
Sampling and plots
All plotting helpers accept engine="plotly" (default) or engine="plotnine".
- moderndive.rep_slice_sample(data, n=None, *, prop=None, reps=1, replace=False, weight_by=None, seed=None)
Take reps samples from data.
Give the sample size as either n (a count) or prop (a fraction of the rows, e.g. prop=0.5). Returns a polars DataFrame with a leading replicate column identifying which sample each row belongs to. Set replace=True for sampling with replacement (bootstrap-style). weight_by gives unequal selection probabilities: a column name or a sequence of weights. Pass seed for reproducibility.
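The replicate layout can be sketched with the standard library. rep_sample is a hypothetical helper over a list of row dicts; truncating n_rows * prop to an integer and numbering replicates from 1 are assumptions, and weighting is left out.

```python
import random

def rep_sample(rows, n=None, prop=None, reps=1, replace=False, seed=None):
    """Stack reps samples, tagging each row with its replicate number (sketch)."""
    if (n is None) == (prop is None):
        raise ValueError("give exactly one of n or prop")
    size = n if n is not None else int(len(rows) * prop)
    rng = random.Random(seed)
    out = []
    for rep in range(1, reps + 1):
        # choices() samples with replacement, sample() without.
        picked = rng.choices(rows, k=size) if replace else rng.sample(rows, size)
        out.extend({"replicate": rep, **row} for row in picked)
    return out

rows = [{"id": i} for i in range(10)]
samples = rep_sample(rows, prop=0.5, reps=3, seed=42)
```

Grouping the result by replicate and computing a statistic per group then gives one value per sample, which is how a sampling distribution is built up.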
- moderndive.rep_sample_n(data, n, *, reps=1, replace=False, prob=None, seed=None)
Take reps samples of size n (older moderndive name).
Like rep_slice_sample(), but the sample size is always the count n and unequal selection weights are passed as prob (a column name or a sequence), matching the R rep_sample_n signature.
- moderndive.pairplot(data, columns=None, hue=None, *, engine='plotly')
Scatterplot matrix of the numeric columns (the analog of GGally::ggpairs).
engine="plotly" (default) returns a plotly go.Figure from plotly.express.scatter_matrix. engine="plotnine" (alias "seaborn") returns the matplotlib Figure from seaborn.pairplot; the non-plotly backend here is seaborn-backed, since plotnine has no first-class SPLOM. hue colors points by a categorical column.
- moderndive.gg_parallel_slopes(data, response, explanatory, by, *, alpha=1.0, engine='plotly')
Scatterplot with a parallel-slopes regression model overlaid.
Fits response ~ explanatory + C(by) (one common slope, a separate intercept per level of by) and draws one fitted line per group over the data. alpha sets the point transparency (0–1), useful when points overlap.
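The parallel-slopes fit can be sketched without a regression library. parallel_slopes is a hypothetical helper that uses the within-group least-squares solution, which yields the same slope and per-group intercepts as fitting response ~ explanatory + C(by) by ordinary least squares; the data are invented.

```python
from collections import defaultdict

def parallel_slopes(x, y, groups):
    """Common slope plus one intercept per group, by least squares (sketch)."""
    by_group = defaultdict(list)
    for xi, yi, g in zip(x, y, groups):
        by_group[g].append((xi, yi))
    # Per-group means of x and y.
    means = {
        g: (sum(a for a, _ in pts) / len(pts), sum(b for _, b in pts) / len(pts))
        for g, pts in by_group.items()
    }
    # Pool the within-group cross-products and sums of squares.
    sxy = sum((a - means[g][0]) * (b - means[g][1])
              for g, pts in by_group.items() for a, b in pts)
    sxx = sum((a - means[g][0]) ** 2
              for g, pts in by_group.items() for a, _ in pts)
    slope = sxy / sxx
    # Each group's line passes through that group's point of means.
    intercepts = {g: my - slope * mx for g, (mx, my) in means.items()}
    return slope, intercepts

x = [0, 1, 2, 0, 1, 2]
y = [1, 3, 5, 5, 7, 9]
groups = ["a", "a", "a", "b", "b", "b"]
slope, intercepts = parallel_slopes(x, y, groups)
```

Centering within each group removes the group offsets, which is why a single slope can be estimated from the pooled deviations.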
- moderndive.geom_parallel_slopes(data, response, explanatory, by, color=None)
plotnine layer(s) drawing the parallel-slopes fitted lines.
Add to a ggplot with + (plotnine-only; for a plotly version call gg_parallel_slopes() with engine="plotly").
- moderndive.gg_categorical_model(data, response, explanatory, *, engine='plotly')
Regression with one categorical predictor (~ geom_categorical_model).
Fits response ~ C(explanatory); each category’s fitted value is its group mean, drawn as a horizontal marker over the (jittered) data points.
- moderndive.geom_categorical_model(data, response, explanatory, *, engine='plotly')
Regression with one categorical predictor (~ geom_categorical_model).
Fits response ~ C(explanatory); each category’s fitted value is its group mean, drawn as a horizontal marker over the (jittered) data points.
- moderndive.plot_3d_regression(data, formula, n=25)
Interactive 3D scatterplot with a fitted regression plane.
Mirrors moderndive::plot_3d_regression. Pass a formula z ~ x + y (one numeric outcome and exactly two numeric predictors) and get a plotly go.Figure with the data points and the fitted lm plane.
In-formula transformations (e.g. log(z) ~ x + y) are not supported, since the plane and the raw points would be on different scales; transform the columns of data first and pass plain names. n sets the plane’s grid resolution per axis.
Viewing data
- moderndive.View(x, title=None)
Display a data frame as an interactive table (search, sort, paginate).
In a notebook / Quarto context this renders an interactive table via the optional itables package (install with pip install "moderndive[view]"). Without itables it returns the data frame so it still displays. Accepts a polars or pandas DataFrame (or anything coercible to one). title is shown as the table caption.
- Parameters:
title (str | None)
Datasets
- moderndive.load_dataset(name)
Load a dataset by name, returning a polars DataFrame.
- Return type:
DataFrame
- Parameters:
name (str)
Each dataset also has a convenience loader moderndive.load_<name>() returning
a polars DataFrame. Call moderndive.data.available_datasets() for the
full list.