The Python companion package for ModernDive: Statistical Inference via Data
Science — a faithful port of the R moderndive
and infer packages to a modern Python
data-science stack (polars, plotly,
plotnine, statsmodels).
📖 Documentation (with runnable examples): https://moderndive.readthedocs.io
It is intentionally pure-Python (no compiled extensions) so it installs under
Pyodide via micropip for in-browser execution.
pip install moderndive # from PyPI (once published)
# or, from source:
pip install git+https://github.com/moderndive/moderndive-python- A tidy simulation-inference grammar mirroring R
infer:specify → hypothesize → generate → calculate, plusfit()for multiple regression,observe(), andassume()(theoretical t/z/F/Chisq).specify()is also available as a DataFrame method, so you can writedf.specify(...)just like R'sdf %>% specify(...).calculate(stat=...)takes the full infer vocabulary or any custom callable test statistic. Summaries viaget_p_value/get_confidence_interval(percentile, SE, bias-corrected); British-spelling and short aliases included. - Dual-engine plots:
visualize/shade_p_value/shade_confidence_interval(and every plot helper) takeengine="plotly"(default, interactive) orengine="plotnine"— same code, your choice of output. - Theory-based wrapper tests:
t_test,prop_test,chisq_test,t_stat,chisq_stat, plus themoderndive.theorymodule. - Regression & summary helpers mirroring R
moderndive:get_regression_table,get_regression_points,get_regression_summaries,get_correlation,pop_sd,tidy_summary,count_missing(built onstatsmodelswhere relevant, returningpolarsframes), plus the model plotsgg_parallel_slopes/geom_parallel_slopesandgg_categorical_model/geom_categorical_model, andpairplot(theGGally::ggpairsanalog). - Sampling:
rep_slice_sample/rep_sample_nfor sampling-distribution activities. - 58 datasets:
load_*()loaders returningpolarsDataFrames (themoderndive/infer,nycflights23,gapminder, ISLR2, and FiveThirtyEight datasets used in the book).
import moderndive as md
from moderndive import get_p_value, visualize, shade_p_value
spotify = md.load_spotify_metal_deephouse()
# Observed difference in popularity rates (metal − deep house)
obs = (
spotify
.specify(formula="popular_or_not ~ track_genre", success="popular")
.calculate(stat="diff in props", order=("metal", "deep-house"))
)
# Permutation null distribution
null = (
spotify
.specify(formula="popular_or_not ~ track_genre", success="popular")
.hypothesize(null="independence")
.generate(reps=1000, type="permute", seed=76)
.calculate(stat="diff in props", order=("metal", "deep-house"))
)
get_p_value(null, obs_stat=obs, direction="right")
# Visualize — interactive plotly by default; engine="plotnine" for ggplot-style
visualize(null) + shade_p_value(obs_stat=obs, direction="right")This repo uses uv.
uv sync --extra dev # create the environment
make test # run the test suite (enforces 100% coverage)
make build-data # rebuild the bundled Parquet datasets (needs R; see tools/)
make build # build the wheel/sdistThe test suite is held at 100% statement coverage (enforced in CI via
--cov-fail-under=100).
MIT. The ModernDive book content is licensed CC-BY-NC-SA 4.0; this software package is MIT-licensed.
