Regression

The regression helpers turn a fitted statsmodels model into tidy polars tables. They are the analogs of R moderndive’s get_regression_table(), get_regression_points(), and get_regression_summaries().

Fit models with statsmodels’ formula API, then tidy them:

import statsmodels.formula.api as smf
import moderndive as md
from moderndive import (
    get_regression_table, get_regression_points, get_regression_summaries, get_correlation,
)

houses = md.load_saratoga_houses()
model = smf.ols("price ~ living_area + bedrooms", data=houses.to_pandas()).fit()

Coefficient table (with confidence intervals)

get_regression_table(model)
shape: (3, 7)
term           estimate   std_error  statistic  p_value  lower_ci    upper_ci
str            f64        f64        f64        f64      f64         f64
"intercept"    20986.094  6816.251   3.079      0.002    7611.128    34361.06
"living_area"  93.842     3.109      30.183     0.0      87.741      99.943
"bedrooms"     -7483.095  2783.531   -2.688     0.007    -12944.988  -2021.203

Change the confidence level with conf_level= (e.g. 0.99).
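
To see what conf_level changes, the interval for a term can be approximated by hand as estimate ± critical value × std_error. The sketch below is illustrative, not the library's implementation. It uses the intercept row from the table above and, as a simplifying assumption, a normal critical value from Python's standard library rather than the t quantile that an OLS fit would normally use, so the 95% bounds land close to, but not exactly on, the tabulated ones.

```python
from statistics import NormalDist

# Intercept row copied from the coefficient table above.
estimate = 20986.094
std_error = 6816.251


def approx_ci(estimate, std_error, conf_level=0.95):
    """Normal-approximation interval: estimate +/- z * std_error.

    Assumption: a z critical value stands in for the t quantile,
    which is reasonable here only because the sample is large.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


ci_95 = approx_ci(estimate, std_error, conf_level=0.95)
ci_99 = approx_ci(estimate, std_error, conf_level=0.99)

print("95%:", ci_95)
print("99%:", ci_99)  # wider than the 95% interval
```

Raising the confidence level only enlarges the critical value, so the interval widens around the same estimate.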

Fitted values & residuals

get_regression_points(model).head()
# columns: ID, price, living_area, bedrooms, price_hat, residual
shape: (5, 6)
ID   price   living_area  bedrooms  price_hat   residual
i64  i64     i64          i64       f64         f64
1    142212  1982         3         184531.942  -42319.942
2    134865  1676         3         155816.245  -20951.245
3    118007  1694         3         157505.404  -39498.404
4    138297  1800         2         174935.767  -36638.767
5    129470  2088         3         194479.21   -65009.21
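
The last two columns follow directly from the fitted coefficients: price_hat is the model's prediction for the row, and residual is price minus price_hat. As a rough check, the sketch below recomputes both for the first three rows, using the rounded estimates from the coefficient table. Because those estimates are rounded to three decimals, the results agree with the table only to within about a dollar.

```python
# Rounded coefficient estimates copied from the coefficient table.
INTERCEPT = 20986.094
B_LIVING_AREA = 93.842
B_BEDROOMS = -7483.095

# (ID, price, living_area, bedrooms) for the first three rows shown above.
rows = [
    (1, 142212, 1982, 3),
    (2, 134865, 1676, 3),
    (3, 118007, 1694, 3),
]


def predict(living_area, bedrooms):
    """Fitted value: intercept plus one slope term per predictor."""
    return INTERCEPT + B_LIVING_AREA * living_area + B_BEDROOMS * bedrooms


points = []
for row_id, price, living_area, bedrooms in rows:
    price_hat = predict(living_area, bedrooms)
    residual = price - price_hat  # observed minus fitted
    points.append((row_id, price_hat, residual))
    print(row_id, round(price_hat, 1), round(residual, 1))
```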

Model-fit summaries

get_regression_summaries(model)
# r_squared, adj_r_squared, mse, rmse, sigma, statistic (F), p_value, df, nobs
shape: (1, 9)
r_squared  adj_r_squared  mse       rmse       sigma      statistic  p_value  df   nobs
f64        f64            f64       f64        f64        f64        f64      i64  i64
0.578      0.578          2.5071e9  50071.187  50142.395  723.229    0.0      2    1057
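
The three error columns are closely related. The printed values are consistent with mse being the mean squared residual, rmse its square root, and sigma the residual standard error, which divides by the residual degrees of freedom (nobs - df - 1) instead of nobs. The sketch below checks those relationships against the printed numbers. The formulas are inferred from the output rather than taken from the library's source, so treat them as an assumption.

```python
import math

# Values copied from the summary output above.
mse = 2.5071e9
rmse = 50071.187
sigma = 50142.395
df = 2        # number of predictors
nobs = 1057   # number of observations

# rmse should be the square root of mse (mse is printed to 5 significant digits).
rmse_from_mse = math.sqrt(mse)

# sigma rescales rmse from a divisor of nobs to the residual
# degrees of freedom, nobs - df - 1 (the extra 1 is the intercept).
resid_df = nobs - df - 1
sigma_from_rmse = rmse * math.sqrt(nobs / resid_df)

print(round(rmse_from_mse, 1), round(sigma_from_rmse, 3))
```

With over a thousand observations and only two predictors, the two divisors are nearly equal, which is why rmse and sigma differ by well under one percent here.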

Correlation

get_correlation(houses, "price ~ living_area")   # ≈ 0.759
# or: get_correlation(houses, x="living_area", y="price")
shape: (1, 1)
cor
f64
0.758674
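
get_correlation reports a single number. The ≈ 0.759 above is presumably the Pearson correlation coefficient (as in R's default), that is, the co-variation of the two variables scaled by their spreads. A minimal standard-library sketch of that formula follows. It uses the five rows displayed in the points table purely as toy input, so its result is not expected to match the full-data value.

```python
import math

# Toy input: the five (living_area, price) pairs shown in the points table.
living_area = [1982, 1676, 1694, 1800, 2088]
price = [142212, 134865, 118007, 138297, 129470]


def pearson(x, y):
    """Pearson correlation: cross-products over the product of spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson(living_area, price)
print(round(r, 3))
```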

Visualizing models

Two ports of R moderndive’s ggplot helpers. Both support two plotting engines, plotly (the default) and plotnine, selected with engine=:

from moderndive import gg_parallel_slopes, gg_categorical_model

evals = md.load_evals()

# Parallel-slopes model: one common slope, a separate intercept per group
gg_parallel_slopes(evals, response="score", explanatory="age", by="gender")            # plotly
gg_parallel_slopes(evals, response="score", explanatory="age", by="gender",
                   engine="plotnine")

# Regression with a single categorical predictor
gg_categorical_model(evals, response="score", explanatory="rank")

For the plotnine engine you can also drop the parallel-slopes lines onto your own ggplot with geom_parallel_slopes().
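
To make "one common slope, a separate intercept per group" concrete, here is a small standard-library sketch of how such a fit can be computed. It is not how gg_parallel_slopes is implemented (the source does not say). It relies on the standard least-squares result that, with a shared slope and group-specific intercepts, the slope is the pooled within-group slope and each intercept is that group's mean response minus the slope times its mean predictor. The data and the function name fit_parallel_slopes are made up for illustration.

```python
from collections import defaultdict


def fit_parallel_slopes(x, y, group):
    """Least-squares fit of y = intercept[group] + slope * x.

    Returns (slope, {group: intercept}).
    """
    # Collect the observations belonging to each group.
    by_group = defaultdict(list)
    for xi, yi, gi in zip(x, y, group):
        by_group[gi].append((xi, yi))

    # Group means of x and y.
    means = {}
    for g, pts in by_group.items():
        n = len(pts)
        means[g] = (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    # Pooled within-group slope: center each group on its own means,
    # then combine the cross-products and sums of squares.
    sxy = sxx = 0.0
    for g, pts in by_group.items():
        mx, my = means[g]
        for xi, yi in pts:
            sxy += (xi - mx) * (yi - my)
            sxx += (xi - mx) ** 2
    slope = sxy / sxx

    # Each group's line passes through that group's point of means.
    intercepts = {g: my - slope * mx for g, (mx, my) in means.items()}
    return slope, intercepts


# Hypothetical data: shared slope of -0.01, intercepts 4.5 and 4.7.
age = [30, 40, 50, 60, 35, 45, 55, 65]
gender = ["female"] * 4 + ["male"] * 4
score = [
    (4.5 if g == "female" else 4.7) - 0.01 * a
    for a, g in zip(age, gender)
]

slope, intercepts = fit_parallel_slopes(age, score, gender)
print(slope, intercepts)
```

Because the toy data contain no noise, the fit recovers the slope and both intercepts up to floating-point error. With real data the lines stay parallel but no longer pass through every point.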

Inference for regression coefficients

Called after generate(), fit() refits the regression on every bootstrap or permutation replicate, giving a distribution of estimates for each coefficient. Pair it with visualize_fit and per-facet shading:

from moderndive import specify
from moderndive.infer.viz import visualize_fit
from moderndive import shade_confidence_interval, shade_p_value

f = "price ~ living_area + bedrooms"
obs_fit = houses.specify(formula=f).fit()

# Bootstrap distribution of each coefficient → per-term CIs
boot_fit = houses.specify(formula=f).generate(reps=1000, type="bootstrap", seed=1).fit()
boot_fit.get_confidence_interval(level=0.95)          # one row per term
visualize_fit(boot_fit) + shade_confidence_interval(boot_fit.get_confidence_interval())

# Null distribution → per-term p-values, each facet shaded at its own estimate
null_fit = (
    houses.specify(formula=f).hypothesize(null="independence")
    .generate(reps=1000, type="permute", seed=1).fit()
)
null_fit.get_p_value(obs_stat=obs_fit, direction="two-sided")
visualize_fit(null_fit) + shade_p_value(obs_stat=obs_fit, direction="two-sided")

Per-facet shading works in both engines: shade_p_value / shade_confidence_interval accept the observed FitResult or a term-keyed table, and each facet is shaded from its own value.
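
For intuition about what the two pipelines compute, the sketch below reproduces the idea for a single slope using only the standard library. It is a simplified stand-in, not the package's implementation. The data are simulated and the helper names are hypothetical. The bootstrap resamples rows with replacement and takes a percentile interval, and the permutation test shuffles the response to break its link with the predictor.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (simple regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


rng = random.Random(1)

# Simulated data with a true slope of 2 (purely illustrative).
x = list(range(30))
y = [2 * xi + rng.gauss(0, 1) for xi in x]
obs_slope = slope(x, y)
reps = 500

# Bootstrap: resample rows with replacement, refit on each replicate.
boot_slopes = []
for _ in range(reps):
    idx = [rng.randrange(len(x)) for _ in x]
    boot_slopes.append(slope([x[i] for i in idx], [y[i] for i in idx]))
boot_slopes.sort()
# Percentile interval: roughly the middle 95% of the bootstrap slopes.
ci_lower = boot_slopes[int(0.025 * reps)]
ci_upper = boot_slopes[int(0.975 * reps) - 1]

# Permutation: shuffle y so any x-y association is due to chance alone.
null_slopes = []
for _ in range(reps):
    shuffled = y[:]
    rng.shuffle(shuffled)
    null_slopes.append(slope(x, shuffled))
# Two-sided p-value: share of null slopes at least as extreme as observed.
p_value = sum(abs(s) >= abs(obs_slope) for s in null_slopes) / reps

print(round(obs_slope, 3), (round(ci_lower, 3), round(ci_upper, 3)), p_value)
```

With several predictors the same loops apply, except that each replicate yields one estimate per term. That is why the results above are reported and shaded per term.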