Coming from R
If you know the R moderndive and infer packages, this page maps the API to
Python. The grammar is the same; the main differences are Python method-chaining
(.hypothesize() instead of the |>/%>% pipe) and polars DataFrames.
The infer pipeline
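```r
# R
library(moderndive)  # provides the pennies data
library(infer)       # provides specify(), hypothesize(), generate(), calculate()

pennies %>%
  specify(response = year) %>%
  hypothesize(null = "point", mu = 1995) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")
```

```python
# Python: verbs are methods on the returned objects
# (md is assumed to be this package's import alias)
(
    md.load_pennies()
    .specify(response="year")
    .hypothesize(null="point", mu=1995)
    .generate(reps=1000, type="bootstrap", seed=1)
    .calculate(stat="mean")
)
```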
specify(formula="y ~ x") works just like R’s formula interface; success= marks
the success level for categorical responses.
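
To make the four verbs concrete, here is a minimal standard-library sketch of what the pipeline above computes for a point null on a mean. It follows the way R's infer handles this case: shift the sample so its mean equals mu, resample with replacement, and take the mean of each replicate. The function name `bootstrap_null_means` and the small `years` list are made up for illustration; this is not the package's implementation, and it is only assumed that the Python port behaves the same way.

```python
import random
import statistics


def bootstrap_null_means(values, mu, reps=1000, seed=None):
    """Return `reps` bootstrap means of `values` re-centred on `mu`."""
    rng = random.Random(seed)  # private generator, so the seed only affects this call
    shift = mu - statistics.fmean(values)
    centred = [v + shift for v in values]  # hypothesize: impose the null mean
    n = len(centred)
    return [
        statistics.fmean(rng.choices(centred, k=n))  # generate, then calculate
        for _ in range(reps)
    ]


# Illustrative data only, not the real pennies dataset.
years = [1988, 1992, 1995, 1999, 2001, 2004, 2008, 2011]
null_means = bootstrap_null_means(years, mu=1995, reps=1000, seed=1)

# Two-sided p-value: share of null means at least as far from mu as the observed mean.
observed = statistics.fmean(years)
p_value = sum(abs(m - 1995) >= abs(observed - 1995) for m in null_means) / len(null_means)
```

Calling the function again with the same seed returns the same list of means.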
Most names are identical
The overwhelming majority of functions keep the same name, including the British-spelling and short-form aliases. You call them as in R, except that pipeline verbs are methods on the pipeline object:
specify, hypothesize/hypothesise, generate, calculate, fit, assume, observe, get_p_value/get_pvalue, get_confidence_interval/get_ci, visualize/visualise, shade_p_value/shade_pvalue, shade_confidence_interval/shade_ci, t_test, prop_test, chisq_test, t_stat, chisq_stat, rep_sample_n/rep_slice_sample, get_regression_table, get_regression_points, get_regression_summaries, get_correlation, pop_sd, tidy_summary, geom_parallel_slopes, geom_categorical_model.
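
As a rough illustration of how spelling aliases and chainable verbs can coexist in Python, the sketch below binds two names to the same method on a toy class. The `Pipeline` class and its `steps` attribute are hypothetical stand-ins, not the package's actual objects.

```python
class Pipeline:
    """Hypothetical stand-in for a pipeline object (not the package's class)."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def hypothesize(self, null, **params):
        # Return a new object so calls can be chained without mutating the old one.
        return Pipeline(self.steps + (("hypothesize", null, params),))

    def visualize(self):
        return Pipeline(self.steps + (("visualize",),))

    # Aliases: extra names bound to the same function objects.
    hypothesise = hypothesize
    visualise = visualize


american = Pipeline().hypothesize(null="point", mu=1995).visualize()
british = Pipeline().hypothesise(null="point", mu=1995).visualise()
```

Both spellings record identical steps, because each alias is the same function under a second name.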
What’s actually different
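| R | Python | Why |
|---|---|---|
| the %>% / \|> pipe | method chaining, e.g. .hypothesize() | no pipe operator in Python |
| ggplot2 | plotly by default; engine="plotnine" for ggplot-style output | dual-engine plotting |
| | a fitted statsmodels model | regression backend |
| bare formula, e.g. y ~ x | formula="y ~ x" | formula passed as a string |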
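
Because Python has no unquoted formula objects, the formula reaches the library as plain text. The sketch below shows one simplified way such a string could be split into a response and its explanatory terms. `split_formula` is a hypothetical helper that handles only additive terms; the package itself may well delegate this to a formula library instead.

```python
def split_formula(formula):
    """Split 'y ~ x1 + x2' into (response, [explanatory, ...])."""
    if formula.count("~") != 1:
        raise ValueError(f"expected exactly one '~' in {formula!r}")
    lhs, rhs = formula.split("~")
    response = lhs.strip()
    if not response:
        raise ValueError("formula has no response on the left of '~'")
    # Only '+'-separated terms are supported in this sketch.
    terms = [term.strip() for term in rhs.split("+") if term.strip()]
    return response, terms


simple = split_formula("y ~ x")
multiple = split_formula("score ~ age + gender")
```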
Other things to know
- Plots compose with + in both engines and default to plotly (interactive); pass engine="plotnine" anywhere for ggplot-style output.
- DataFrames are polars in and out; call .to_pandas() when a downstream tool needs pandas.
- Reproducibility: pass seed= to generate() (R uses set.seed()).
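
The difference between a per-call seed and R's global set.seed() can be sketched with the standard library alone. The `resample` helper is hypothetical, and how the package seeds its generator internally is an assumption here. The point is only that a seed passed as an argument can drive a private generator and leave global random state untouched.

```python
import random


def resample(values, seed=None):
    """Draw a same-sized sample with replacement using a private generator."""
    rng = random.Random(seed)  # independent of the module-level generator
    return rng.choices(values, k=len(values))


data = [1, 2, 3, 4, 5]
first = resample(data, seed=42)
second = resample(data, seed=42)  # same seed, same draw

# The global generator is unaffected by seeded calls in between.
random.seed(0)
expected_next = random.random()
random.seed(0)
resample(data, seed=42)
actual_next = random.random()
```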
Same datasets
Most R moderndive/infer datasets are bundled here under the same name, each loaded through a load_-prefixed function: load_pennies(), load_mythbusters_yawn(), load_gss(), etc. See Datasets.