Coming from R

If you know the R moderndive and infer packages, this page maps their API to Python. The grammar is the same; the main differences are method chaining (x.hypothesize(...) instead of the %>% or |> pipe) and the use of polars DataFrames.

The infer pipeline

# R
pennies %>%
  specify(response = year) %>%
  hypothesize(null = "point", mu = 1995) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

# Python: verbs are methods on the returned objects (md is the imported package)
(
    md.load_pennies()
    .specify(response="year")
    .hypothesize(null="point", mu=1995)
    .generate(reps=1000, type="bootstrap", seed=1)
    .calculate(stat="mean")
)
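To make the four verbs concrete, here is a minimal standard-library sketch of the computation such a pipeline describes. It assumes the Python port follows infer's approach of recentering the sample on mu before bootstrapping under a point null. The function name bootstrap_null_means and the years data are hypothetical and are not part of the package.

```python
import random
import statistics


def bootstrap_null_means(values, mu, reps=1000, seed=None):
    """Bootstrap distribution of the sample mean under a point null (mean == mu)."""
    rng = random.Random(seed)  # local generator; the global random state is untouched
    # hypothesize(null="point", mu=...): shift the sample so its mean equals mu
    shift = mu - statistics.fmean(values)
    centered = [v + shift for v in values]
    n = len(centered)
    # generate(type="bootstrap"): draw n values with replacement, reps times
    # calculate(stat="mean"): reduce each resample to its mean
    return [statistics.fmean(rng.choices(centered, k=n)) for _ in range(reps)]


# Hypothetical minting years, not the real pennies data
years = [1988, 1992, 1995, 1999, 2001, 2004, 1979, 1996, 2010, 1985]
null_means = bootstrap_null_means(years, mu=1995, reps=1000, seed=1)
print(len(null_means), round(statistics.fmean(null_means), 1))
```

The resulting list plays the role of the null distribution that the later verbs (get_p_value, visualize) would consume.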

specify(formula="y ~ x") works just like R's formula interface, except that the formula is passed as a string; success= marks the success level for categorical responses.

Most names are identical

Nearly all functions keep their R names, including the British-spelling and short-form aliases. They are called just as in R, as methods on the pipeline where applicable:

specify, hypothesize/hypothesise, generate, calculate, fit, assume, observe, get_p_value/get_pvalue, get_confidence_interval/get_ci, visualize/visualise, shade_p_value/shade_pvalue, shade_confidence_interval/shade_ci, t_test, prop_test, chisq_test, t_stat, chisq_stat, rep_sample_n/rep_slice_sample, get_regression_table, get_regression_points, get_regression_summaries, get_correlation, pop_sd, tidy_summary, geom_parallel_slopes, geom_categorical_model.
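As a rough illustration of how chained verbs and spelling aliases can coexist in Python, the toy class below returns a new object from each verb and binds a second name to the same method. ToyPipeline is a hypothetical stand-in written for this page; it is not the package's actual class, and the real implementation may differ.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyPipeline:
    """Hypothetical stand-in for a pipeline object; not the library's class."""

    steps: tuple = ()

    def specify(self, response):
        # Each verb returns a new object, so the next verb can be chained onto it
        return replace(self, steps=self.steps + (("specify", response),))

    def hypothesize(self, null, mu=None):
        return replace(self, steps=self.steps + (("hypothesize", null, mu),))

    # An alias is simply a second name bound to the same function
    hypothesise = hypothesize


us = ToyPipeline().specify(response="year").hypothesize(null="point", mu=1995)
uk = ToyPipeline().specify(response="year").hypothesise(null="point", mu=1995)
print(us == uk)  # both spellings record identical steps
```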

What’s actually different

R                            Python                                                                    Why
--------------------------   -----------------------------------------------------------------------   --------------------------
x %>% f(...) / x |> f(...)   x.f(...) (method chaining)                                                no pipe operator in Python
ggplot2 geom_* layers        plotly by default; engine="plotnine" for ggplot-style                     dual-engine plotting
lm(y ~ x, data) object       a fitted statsmodels model: smf.ols("y ~ x", data=df.to_pandas()).fit()   regression backend
get_correlation(df, y ~ x)   get_correlation(df, "y ~ x") or get_correlation(df, x="x", y="y")         formula passed as a string

Other things to know

  • Plots compose with + in both engines, and default to plotly (interactive); pass engine="plotnine" anywhere for ggplot-style output.

  • DataFrames are polars in and out; call .to_pandas() when a downstream tool needs pandas.

  • Reproducibility: pass seed= to generate() (R uses set.seed()).
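The practical difference between a seed= argument and R's set.seed() is that the seed can stay local to one call instead of changing global state. The sketch below illustrates that idea with the standard library only; the resample helper is hypothetical, and how the package handles seeds internally is an assumption here.

```python
import random


def resample(values, seed=None):
    """One bootstrap resample drawn from a generator local to this call."""
    rng = random.Random(seed)  # a per-call generator, unlike a global set.seed()
    return rng.choices(values, k=len(values))


values = [3, 1, 4, 1, 5, 9, 2, 6]
random.seed(123)
before = random.getstate()

first = resample(values, seed=1)
second = resample(values, seed=1)

print(first == second)              # same seed, same resample
print(random.getstate() == before)  # the global generator was not advanced
```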

Same datasets

Most R moderndive/infer datasets are bundled here under the same name — load_pennies(), load_mythbusters_yawn(), load_gss(), etc. See Datasets.