Nonparametric Methods
Core Idea
(1) Nonparametric methods are statistical techniques that make minimal assumptions about the specific functional form of the underlying probability distributions, relying instead on ranks, order statistics, resampling, or flexible estimators that can adapt to the data's shape. (2) Where parametric methods assume that data follow a specified distribution family (normal, exponential, Poisson) characterized by a small number of parameters and make inferences conditional on that family being approximately correct, nonparametric methods either make no distributional assumption (distribution-free tests) or only very weak assumptions (e.g., symmetry, continuity, or smoothness)[1]. (3) The trade-off is clear and context-dependent: nonparametric methods are robust to distributional misspecification and outliers but typically sacrifice some statistical power when parametric assumptions do hold; they excel in small-sample settings with uncertain distributions, skewed or heavy-tailed data, ordinal outcomes, and exploratory analysis. (4) The deeper abstraction is that the choice between parametric and nonparametric methods reflects a fundamental trade-off between the efficiency of strong structural assumptions (when correct) and the robustness of weak assumptions (when wrong) — a trade-off that has no universally optimal resolution and must be navigated based on domain knowledge, data characteristics, and the cost of being wrong.
Structural Signature
- The distribution-free inference property[2]
- The rank-based statistic as data summary[3]
- The asymptotic-relative-efficiency comparison[4]
- The robustness-versus-power trade-off[5]
- The assumption-relaxation in inferential validity[1]
- The resampling-based variance estimation[6]
Nonparametric methods substitute weak or distribution-free assumptions for the strong distributional assumptions of parametric methods. Classical rank-based tests (Wilcoxon, Mann-Whitney, Kruskal-Wallis, Friedman, Spearman) convert observations to ranks; because under the null hypothesis (and with continuous data) every assignment of ranks is equally likely, the exact or asymptotic null distributions of their test statistics do not depend on the underlying distribution. Resampling methods (bootstrap, permutation tests, jackknife) use the observed data themselves to approximate sampling distributions, bypassing parametric assumptions through computational replication. Kernel and local smoothing methods (kernel density estimation, kernel regression, nearest-neighbor methods) estimate functional forms directly from data without specifying a parametric family. Quantile regression and rank-based regression estimators produce estimates robust to outliers and distributional irregularities. Nonparametric Bayesian methods (Dirichlet processes, Gaussian processes) place priors on infinite-dimensional parameter spaces. The distinguishing structural commitment is that flexibility is prioritized over efficiency: the methods can produce valid inferences across a wide range of underlying distributions, at the cost of lower power when parametric assumptions happen to hold.
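A minimal pure-Python sketch of the rank transformation (the helper names `midranks`, `pearson`, and `spearman` are illustrative, not a library API). Tied values receive the average of the positions they occupy, and because ranks depend only on ordering, a strictly increasing transformation of the data leaves a rank statistic such as Spearman's rho unchanged.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(midranks(x), midranks(y))


x = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(v) for v in x]              # monotone but strongly nonlinear
print(pearson(x, y))                      # below 1: a straight line fits poorly
print(spearman(x, y))                     # perfect rank agreement gives 1.0
print(spearman([v ** 3 for v in x], y))   # unchanged by a monotone transform of x
```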
What It Is Not
- Not assumption-free — most nonparametric methods make some assumptions (independence, continuity, symmetry, smoothness), just weaker ones than parametric alternatives.
- Not automatically better than parametric methods — when parametric assumptions hold, parametric methods are typically more powerful, more efficient, and easier to interpret.
- Not the same as robust parametric methods — robust parametric methods (M-estimators, trimmed means) modify parametric estimators to reduce sensitivity to outliers; nonparametric methods abandon the parametric framework entirely or substantially.
- Not limited to classical rank-based tests — the field encompasses density estimation, regression, classification, time series, survival analysis, and Bayesian inference, each with its own nonparametric toolbox.
- Not inherently less sophisticated — modern nonparametric methods (kernel methods, Gaussian processes, Dirichlet process mixtures) rival parametric methods in mathematical depth.
- Not a cure-all for small samples — some nonparametric methods require larger samples than parametric alternatives to achieve comparable precision.
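- Not synonymous with "distribution-free" — many nonparametric estimators (kernel density estimation, nonparametric regression) rely on smoothness assumptions and have sampling behavior that depends on the underlying distribution, and even distribution-free tests are exact only under conditions such as exchangeability under the null (permutation tests).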
- Not always computationally cheaper — resampling-based methods can be computationally expensive, though often embarrassingly parallel.
- Not limited to hypothesis testing — estimation, regression, classification, and prediction all have rich nonparametric traditions.
- Not a safe default — choosing between parametric and nonparametric methods requires judgment; the wrong choice in either direction can mislead.
Broad Use
Nonparametric methods are widely used across fields where distributional assumptions are uncertain or known to be violated. In psychology and social sciences, nonparametric tests (Mann-Whitney[4], Wilcoxon signed-rank, Kruskal-Wallis) are standard for ordinal outcomes from Likert scales, small sample sizes, and skewed distributions; Spearman's rank correlation replaces Pearson's when monotonic but nonlinear relationships are of interest. In medicine, survival analysis relies on the Kaplan-Meier estimator and the log-rank test, which are nonparametric, and on Cox proportional-hazards regression, which is semiparametric; none of them requires a parametric form for the baseline hazard. In econometrics, quantile regression (Koenker-Bassett 1978) estimates conditional quantiles, and through them the shape of the conditional distribution, rather than only the conditional mean, with applications in wage-distribution analysis, risk modeling, and robust prediction[7]. In machine learning, nearest-neighbor methods, random forests, kernel machines (SVM, kernel ridge regression), and deep learning viewed as flexible function approximation all embody nonparametric principles. In Bayesian nonparametrics, Dirichlet process mixture models handle clustering with an unknown number of clusters, and Gaussian processes provide flexible nonparametric priors over functions, with applications in spatial statistics, time series, and Bayesian optimization. In ecology and environmental science, bootstrap-based confidence intervals are standard for estimates whose sampling distributions are unknown or complex. In reliability engineering, Kaplan-Meier curves and nonparametric lifetime-distribution estimation handle censored failure-time data. In A/B testing, bootstrap and permutation tests complement parametric t-tests when distributions are heavy-tailed or when ratios of metrics are of interest.
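Quantile regression replaces squared error with the asymmetric check (pinball) loss. A minimal pure-Python sketch of the intercept-only case, with hypothetical names and data: minimizing the check loss over a constant recovers a sample quantile, while the mean is pulled toward the long right tail. With covariates, the same objective is minimized over regression coefficients (it can be posed as a linear program); statsmodels offers this as `QuantReg`.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(values, tau):
    """Intercept-only quantile regression by brute force.

    The total check loss is convex and piecewise linear with kinks at the data,
    so some observed value attains the minimum; that value is a tau-quantile.
    """
    return min(values, key=lambda c: sum(check_loss(v - c, tau) for v in values))


hourly_wages = [12, 14, 15, 15, 16, 18, 19, 21, 25, 40, 95]  # hypothetical, right-skewed
for tau in (0.25, 0.5, 0.9):
    print(tau, best_constant(hourly_wages, tau))   # 15, 18, 40
print(sum(hourly_wages) / len(hourly_wages))        # mean ≈ 26.4, pulled up by the tail
```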
Clarity
Nonparametric methods clarify the sensitivity of statistical conclusions to distributional assumptions[5]. Running both a parametric and a nonparametric analysis on the same data, then comparing the conclusions, directly diagnoses whether parametric assumptions are driving the findings: a form of sensitivity analysis built into the method choice. When parametric and nonparametric analyses yield similar conclusions, the analyst has reassuring evidence that distributional assumptions are not the limiting factor; when they diverge, the analyst must think carefully about which set of assumptions is more plausible for the data at hand, keeping in mind that paired tests often target different hypotheses (for example, a difference in means versus a tendency for one group to take larger values). This clarity is especially valuable in exploratory data analysis, regulatory submissions where robustness is a primary concern, and cross-study meta-analyses where distributional heterogeneity is expected. The method choice itself becomes a piece of transparency about what the analyst assumed.
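A minimal sketch of such a side-by-side check, assuming SciPy and NumPy are installed. `scipy.stats.ttest_ind` (with `equal_var=False` for Welch's test) and `scipy.stats.mannwhitneyu` are real functions; the helper `sensitivity_check`, the 0.05 threshold, and the simulated lognormal data are hypothetical choices for illustration.

```python
import numpy as np
from scipy import stats


def sensitivity_check(a, b, alpha=0.05):
    """Run a parametric and a rank-based two-sample test on the same data.

    Welch's t-test compares means; the Mann-Whitney U test asks whether values
    in one group tend to be larger than in the other.
    """
    _, t_p = stats.ttest_ind(a, b, equal_var=False)
    _, u_p = stats.mannwhitneyu(a, b, alternative="two-sided")
    return {
        "welch_t_p": float(t_p),
        "mann_whitney_p": float(u_p),
        # Do both tests reach the same reject/not-reject decision at alpha?
        "agree": bool((t_p < alpha) == (u_p < alpha)),
    }


rng = np.random.default_rng(0)
# Heavy-tailed (lognormal) outcomes with a multiplicative shift in the treated group.
control = rng.lognormal(mean=0.0, sigma=1.5, size=40)
treated = rng.lognormal(mean=0.5, sigma=1.5, size=40)
print(sensitivity_check(control, treated))
```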
Manages Complexity
Nonparametric methods manage the complexity of uncertain or heterogeneous distributions. Rather than committing to a specific functional form that may or may not match the data (exponential? Weibull? log-normal?) and defending that choice, the analyst can use methods that don't require the commitment. In survival analysis, the Kaplan-Meier estimator produces a step-function estimate of the survival distribution that accommodates any underlying functional form; Cox regression adds covariate effects without specifying the baseline hazard's shape. In regression, quantile regression estimates the full conditional distribution of the outcome without assuming homoscedasticity or normality. In density estimation, kernel methods produce smooth distribution estimates without assuming a specific parametric family. The complexity management comes at some efficiency cost — nonparametric methods typically have lower power when parametric assumptions hold — but the robustness is often worth it in applied settings where distributional assumptions are legitimately uncertain.
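A minimal pure-Python sketch of the Kaplan-Meier product-limit estimator, assuming right-censored data with an event indicator; the function name and toy data are illustrative. No functional form is imposed: the curve drops only at observed event times, by the fraction of the at-risk group that fails there.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate for right-censored data.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    Returns (time, estimated survival just after that time) at each event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # events plus censorings at each time
    at_risk = len(times)
    surv = 1.0
    steps = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the estimated conditional probability of surviving t.
            surv *= 1.0 - d / at_risk
            steps.append((t, surv))
        # Usual convention: subjects censored at t still count as at risk at t.
        at_risk -= leaving[t]
    return steps


times = [1, 2, 2, 3, 5, 6]
events = [1, 0, 1, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))  # 1 0.833, 2 0.667, 3 0.444, 6 0.0
```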
Abstract Reasoning
The parametric-nonparametric distinction illuminates a deep principle in statistical reasoning: the bias-variance trade-off extends to model specification itself[1]. Parametric models impose strong structural assumptions that reduce variance (efficient use of data) but introduce bias if the assumptions are wrong. Nonparametric models impose weak structural assumptions that reduce bias (flexibility to fit the true data-generating process) but increase variance (more data needed for the same precision). The optimal choice depends on how much prior information one has about the true data structure and how costly bias and variance are in the downstream application. This bias-variance-structural-assumption trade-off reappears across machine learning (regularization strength), Bayesian statistics (prior informativeness), and statistical modeling generally (polynomial degree, spline knots, model capacity). The nonparametric tradition is an explicit acknowledgment that strong assumptions, when wrong, can be more damaging than weak assumptions, even at some efficiency cost.
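To make the bias-variance point concrete, here is a minimal Gaussian kernel density estimator in pure Python (function names are illustrative). The bandwidth `h` plays the role of structural assumption strength: a tiny `h` tracks individual points (low bias, high variance), while a large `h` smooths two modes into one (high bias, low variance). The normal-reference bandwidth rule used here is itself a parametric assumption.

```python
import math
import random
import statistics


def gaussian_kde(data, h):
    """Return a function x -> density estimate using a Gaussian kernel of bandwidth h."""
    n = len(data)
    scale = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


def normal_reference_bandwidth(data):
    """Silverman's rule of thumb, h = 1.06 * sd * n**(-1/5).

    The rule is derived assuming roughly normal data, so it tends to
    oversmooth multimodal samples like the one below.
    """
    return 1.06 * statistics.stdev(data) * len(data) ** (-1 / 5)


rng = random.Random(1)
# Bimodal sample: a single fitted normal curve would put its peak at the trough.
data = [rng.gauss(-2, 0.5) for _ in range(100)] + [rng.gauss(2, 0.5) for _ in range(100)]

h_ref = normal_reference_bandwidth(data)
for h in (0.1, h_ref, 3.0):
    f = gaussian_kde(data, h)
    print(f"h={h:.2f}  f(-2)={f(-2):.3f}  f(0)={f(0):.3f}  f(2)={f(2):.3f}")
# Small h: jagged, high-variance estimate; large h: the two modes merge (high bias).
```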
Knowledge Transfer
| Domain | Parametric Method | Nonparametric Counterpart | When Nonparametric Preferred |
|---|---|---|---|
| Two-group comparison | t-test | Mann-Whitney U / Wilcoxon rank-sum | Skewed data, outliers, small N, ordinal |
| Multi-group comparison | One-way ANOVA | Kruskal-Wallis | Same as above |
| Paired comparison | Paired t-test | Wilcoxon signed-rank | Same as above |
| Correlation | Pearson's r | Spearman's ρ, Kendall's τ | Monotonic but nonlinear relationships |
| Regression (conditional mean) | OLS | Quantile regression, LOESS, random forest | Heteroscedasticity, heavy tails |
| Distribution estimation | Normal/other parametric density | Kernel density estimation | Unknown or multi-modal distribution |
| Survival analysis | Exponential/Weibull model | Kaplan-Meier; Cox regression (semiparametric) | Unknown hazard shape |
| Repeated measures (matched groups) | Repeated-measures ANOVA | Friedman test (Nemenyi post hoc) | Ordinal data, matched groups |
| Confidence interval | Normal-theory CI | Bootstrap CI (percentile, BCa) | Unknown sampling distribution |
| Classification | Logistic regression, LDA | k-NN, SVM, random forest | Complex decision boundary |
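Several rows of this table map onto paired calls in SciPy. A hedged sketch, assuming SciPy and NumPy are installed: the `scipy.stats` functions shown exist, but the simulated data are arbitrary illustrations, and defaults such as tie handling and exact versus asymptotic p-values vary across SciPy versions, so check the documentation for the installed release.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated data for each row (skewed, heavy-tailed, or nonlinear on purpose).
a = rng.exponential(scale=1.0, size=25)
b = rng.exponential(scale=1.6, size=25)
c = rng.exponential(scale=2.0, size=25)
before = rng.normal(10, 2, size=20)
after = before + 0.8 + rng.standard_t(df=2, size=20)   # heavy-tailed paired changes
x = rng.uniform(0, 3, size=30)
y = np.exp(x) + rng.normal(0, 1, size=30)               # monotone but nonlinear

# (parametric p-value, nonparametric p-value(s)) for each row.
rows = {
    "two-group: t-test vs Mann-Whitney U": (
        stats.ttest_ind(a, b)[1],
        stats.mannwhitneyu(a, b, alternative="two-sided")[1],
    ),
    "multi-group: one-way ANOVA vs Kruskal-Wallis": (
        stats.f_oneway(a, b, c)[1],
        stats.kruskal(a, b, c)[1],
    ),
    "paired: paired t-test vs Wilcoxon signed-rank": (
        stats.ttest_rel(before, after)[1],
        stats.wilcoxon(before, after)[1],
    ),
    "correlation: Pearson vs Spearman, Kendall": (
        stats.pearsonr(x, y)[1],
        stats.spearmanr(x, y)[1],
        stats.kendalltau(x, y)[1],
    ),
}
for name, pvalues in rows.items():
    print(name, [round(float(p), 4) for p in pvalues])
```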
Examples
Formal/abstract¶
Frank Wilcoxon's 1945 paper "Individual Comparisons by Ranking Methods" (Biometrics Bulletin) introduced the signed-rank test for paired samples and the rank-sum test for two independent samples[3]. Henry Mann and Donald Whitney's 1947 paper generalized and deepened the rank-sum test, producing what is now known as the Mann-Whitney U test (mathematically equivalent to Wilcoxon's rank-sum test). The context was the era when parametric methods, particularly Student's t-test and ANOVA, dominated applied statistics, but many biological, agricultural, and psychological datasets were small, skewed, or ordinal — violating the normality assumption central to the parametric framework.
The Wilcoxon-Mann-Whitney insight was simple but powerful: replace the raw observations with their ranks across the combined sample, then test whether the two groups have different rank distributions. Under the null hypothesis of equal distributions, the expected rank sum for each group is determined by its size, and the variance of the rank sum under the null can be computed exactly for small samples or approximated by a normal distribution for large samples. The test is valid under very weak assumptions: essentially that observations are independent within and between the two samples and come from continuous distributions (so ties have probability zero; in practice, tied values receive average ranks and the variance is adjusted). Remarkably, the Mann-Whitney test has an asymptotic relative efficiency of 3/π ≈ 0.955 relative to the t-test even when the normality assumption holds exactly, so the efficiency cost of the nonparametric approach is modest even in the scenario most favorable to parametric methods; when normality fails, the Mann-Whitney test often outperforms the t-test. By the 2000s, Mann-Whitney had become a standard citation in psychology and biomedical papers involving small samples or ordinal outcomes.
Mapped back: The Mann-Whitney test exemplifies nonparametric methodology: rank-based transformation eliminates distributional assumptions (test is distribution-free under the null), achieves high relative efficiency even under parametric ideals, and generalizes to ordinal and skewed data where parametric assumptions fail — demonstrating the robustness-versus-power trade-off resolved in the method's favor when distributional shape is uncertain.
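The rank-sum mechanics fit in a few lines. A minimal pure-Python sketch, assuming no ties (the tie correction to the variance is omitted); `rank_sum_test` is an illustrative name, and the Mann-Whitney statistic is recovered as U = R1 - n1(n1+1)/2.

```python
import math


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test using the large-sample normal approximation.

    Sketch assumption: no tied values, so the pooled ranks are exactly 1..N.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool the samples, tagging each value with its group (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    r1 = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    mean_r1 = n1 * (n + 1) / 2         # E[R1] under H0: fixed by sample sizes
    var_r1 = n1 * n2 * (n + 1) / 12    # Var[R1] under H0 (no ties)
    z = (r1 - mean_r1) / math.sqrt(var_r1)
    return {
        "R1": r1,
        "U": r1 - n1 * (n1 + 1) / 2,   # Mann-Whitney U for x
        "mean_R1": mean_r1,
        "var_R1": var_r1,
        "z": z,
        "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided: 2 * (1 - Phi(|z|))
    }


result = rank_sum_test([1.1, 2.3, 2.9], [4.0, 5.2, 6.8])
print(result)  # R1 = 6, E[R1] = 10.5, Var[R1] = 5.25, z ≈ -1.96, p ≈ 0.0495
```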
Applied/industry¶
A major ride-sharing platform faced a regulatory inquiry about whether its surge-pricing algorithm produced systematically different prices for riders in different neighborhoods with similar underlying trip characteristics[7]. The initial analytics team response used ordinary least-squares regression: ln(surge multiplier) ~ neighborhood + time-of-day + day-of-week + distance + weather + demand signals, looking for neighborhood-coefficient patterns correlating with demographic variables. The regression produced effect estimates suggesting 3-5% price differences across some neighborhood categories. The legal team pushed back: the residual distribution was heavily right-skewed (a long tail of extreme surge multipliers during rare demand events), the sample was highly unbalanced across neighborhoods, and the standard errors — computed under normal-theory assumptions — might not be valid.
A data-science team redid the analysis with nonparametric methods. First, they replaced the OLS regression with quantile regression at the 50th, 75th, 90th, and 95th percentiles of the surge-multiplier distribution. This produced a much richer picture: at the median, neighborhood differences were near zero and not statistically distinguishable from zero; at the 90th percentile, differences grew to 2-3%; at the 95th percentile (extreme surge events), differences were 6-9% and statistically significant. The parametric OLS analysis had been averaging these different regimes together, masking the heterogeneity. Second, they used block bootstrap (resampling neighborhood-day blocks to preserve within-neighborhood temporal correlation) for confidence intervals on the neighborhood effects, which produced wider but more trustworthy intervals than the normal-theory OLS standard errors. Third, they used a permutation test to assess overall statistical significance of the neighborhood effect, which was robust to the skewed distribution.
Mapped back: The ride-sharing fairness audit demonstrates nonparametric methods applied to algorithmic accountability: quantile regression (estimating effects at chosen quantiles without assuming a parametric error distribution) reveals distributional heterogeneity masked by conditional-mean regression; block-bootstrap confidence intervals provide uncertainty quantification that respects within-neighborhood correlation without relying on normal-theory standard errors; and permutation tests yield valid p-values under exchangeability without a correctly specified error distribution. Together these enable defensible regulatory reporting on a skewed, heterogeneous outcome.
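A minimal sketch of the block-bootstrap step, assuming each neighborhood-day is a block of correlated observations and the target is a difference in an upper quantile between two neighborhood categories. The data generator, function names, and parameter defaults are hypothetical; they illustrate the resampling scheme, not the team's actual analysis. Resampling whole blocks rather than individual trips keeps the within-block correlation, which typically widens the interval relative to treating trips as independent.

```python
import random


def quantile(values, q):
    """Sample quantile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def block_bootstrap_ci(blocks_a, blocks_b, q=0.9, reps=1000, level=0.95, seed=0):
    """Percentile CI for quantile(A) - quantile(B), resampling whole blocks."""
    rng = random.Random(seed)

    def stat(group_a, group_b):
        flat_a = [v for block in group_a for v in block]
        flat_b = [v for block in group_b for v in block]
        return quantile(flat_a, q) - quantile(flat_b, q)

    draws = []
    for _ in range(reps):
        # Draw blocks with replacement within each category.
        resampled_a = [rng.choice(blocks_a) for _ in blocks_a]
        resampled_b = [rng.choice(blocks_b) for _ in blocks_b]
        draws.append(stat(resampled_a, resampled_b))
    tail = (1 - level) / 2
    return stat(blocks_a, blocks_b), quantile(draws, tail), quantile(draws, 1 - tail)


gen = random.Random(42)


def make_blocks(shift, n_blocks=30, trips_per_block=20):
    """Hypothetical surge multipliers; each block shares one demand shock."""
    blocks = []
    for _ in range(n_blocks):
        shock = gen.expovariate(3.0)  # block-level shock induces within-block correlation
        blocks.append([1.0 + shift + shock + gen.expovariate(8.0)
                       for _ in range(trips_per_block)])
    return blocks


estimate, low, high = block_bootstrap_ci(make_blocks(0.05), make_blocks(0.0))
print(f"90th-percentile difference: {estimate:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```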
Structural Tensions
T1 — Robustness vs efficiency. Nonparametric methods sacrifice some statistical power when parametric assumptions hold — Mann-Whitney's asymptotic relative efficiency is ~0.955 relative to the t-test under normality. When parametric assumptions are violated, this efficiency ranking can reverse dramatically. The tension is permanent: no method can be simultaneously maximally efficient under one model and maximally robust under misspecification. Best practice often runs both parametric and nonparametric analyses as a sensitivity check.
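The ~0.955 figure has a closed form. For a location-shift model with error density $f$ and finite variance $\sigma^2$, the Pitman asymptotic relative efficiency of the Wilcoxon-Mann-Whitney test relative to the t-test is

$$
\mathrm{ARE}(W, t) = 12\,\sigma^2 \left( \int_{-\infty}^{\infty} f(x)^2 \, dx \right)^2 .
$$

For a normal density, $\int f^2\,dx = 1/(2\sigma\sqrt{\pi})$, so

$$
\mathrm{ARE}(W, t) = 12\,\sigma^2 \cdot \frac{1}{4\pi\sigma^2} = \frac{3}{\pi} \approx 0.955 .
$$

For the Laplace (double-exponential) density the same expression gives 1.5; a classical result of Hodges and Lehmann shows it never falls below $108/125 = 0.864$ over location families with finite variance, while it has no upper bound, which is why the efficiency ranking can reverse so sharply.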
T2 — Distributional assumption vs interpretability of effect sizes. Parametric methods naturally produce interpretable effect-size estimates (mean differences, regression coefficients, hazard ratios) tied to the assumed distribution. Nonparametric methods often produce p-values or rank-based summaries without equally clean effect-size interpretations. Quantile regression, bootstrap confidence intervals, and stochastic-superiority measures (Pr(X > Y)) help bridge this gap but never fully replicate the interpretive clarity of parametric models when parametric assumptions hold.
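Two rank-based effect sizes that help bridge this gap can be computed directly (function names and data are illustrative): the probability-of-superiority estimate, which equals U/(n1 n2), and the Hodges-Lehmann shift estimate[13], the median of all pairwise differences. In the toy data below, a single outlier flips the sign of the mean difference while the rank-based summaries stay stable.

```python
import statistics


def prob_superiority(x, y):
    """Estimate P(X > Y) + 0.5 * P(X = Y) over all (x_i, y_j) pairs.

    Equals U / (n1 * n2), where U is the Mann-Whitney statistic for x.
    """
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return wins / (len(x) * len(y))


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences x_i - y_j: a rank-based shift estimate."""
    return statistics.median(a - b for a in x for b in y)


treated = [3.1, 4.7, 5.0, 6.2, 9.8]
control = [2.0, 2.9, 3.5, 4.1, 30.0]   # one extreme outlier
print(prob_superiority(treated, control))                     # 0.72
print(hodges_lehmann_shift(treated, control))                 # 1.5
print(statistics.mean(treated) - statistics.mean(control))    # negative, driven by the outlier
```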
T3 — Classical rank-based vs modern resampling-based approaches. Classical nonparametric tests (Wilcoxon, Mann-Whitney, Kruskal-Wallis) are well-understood but sometimes awkwardly matched to specific hypotheses. Resampling methods (bootstrap[8], permutation) offer more flexibility and can be tailored to arbitrary test statistics and confidence intervals for arbitrary functionals, but are computationally more expensive and have subtler theoretical properties (e.g., bootstrap failure for non-smooth functionals, tie handling in permutation tests). The tension is between the maturity and theoretical understanding of classical methods and the flexibility and generality of resampling approaches.
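The flexibility point is easy to see in code: a permutation test accepts any statistic. A minimal Monte Carlo sketch in pure Python, assuming group labels are exchangeable under the null; the difference in medians is chosen only for illustration, and the names and data are hypothetical.

```python
import random
import statistics


def permutation_test(x, y, stat, reps=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for any statistic stat(x, y).

    Assumes group labels are exchangeable under the null hypothesis and that
    the statistic is centered at zero under the null (so |stat| measures extremity).
    """
    rng = random.Random(seed)
    observed = stat(x, y)
    pooled = list(x) + list(y)
    n1 = len(x)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # random relabeling of the pooled observations
        if abs(stat(pooled[:n1], pooled[n1:])) >= abs(observed):
            extreme += 1
    # Counting the observed labeling as one permutation keeps p strictly positive.
    return observed, (extreme + 1) / (reps + 1)


def median_difference(a, b):
    return statistics.median(a) - statistics.median(b)


x = [12.1, 14.3, 15.0, 13.8, 16.2, 55.0]
y = [10.2, 11.5, 9.8, 12.0, 10.9, 11.1]
observed, p_value = permutation_test(x, y, median_difference)
print(observed, p_value)
```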
T4 — Simple single-assumption methods vs sophisticated nonparametric models. Classical nonparametric methods often make one or two structural assumptions (independence, exchangeability) but are otherwise quite simple. Modern nonparametric methods (Gaussian processes, Dirichlet process mixtures, neural networks) are sophisticated, require substantial computational and theoretical expertise, and have implicit smoothness or regularity assumptions embedded in their kernels, priors, or architectures[9]. The tension is between the simplicity and interpretability of classical methods and the flexibility and predictive power of modern nonparametric approaches, with the choice depending on problem scale and the analyst's capacity for tuning and diagnostics.
T5 — Asymptotic guarantees vs finite-sample performance. Classical nonparametric tests have well-characterized asymptotic properties (e.g., Mann-Whitney achieves 95.5% relative efficiency under normality asymptotically) but finite-sample performance can be crude, especially with small n or discrete/tied data. Modern nonparametric methods (kernel smoothers, bootstrap) often lack simple analytical forms but can be computationally adapted to specific sample sizes and data structures. The tension is between tests with known asymptotic behavior (simple, well-understood, but potentially crude for real samples) and methods with flexible finite-sample performance (powerful but requiring more careful diagnostics and calibration).
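The finite-sample point can be checked directly for the rank-sum statistic: under the null every set of ranks is equally likely, so small-sample p-values can be computed exactly by enumeration and compared with the normal approximation. A minimal sketch (function names are illustrative; ties and the continuity correction are ignored).

```python
import itertools
import math


def exact_rank_sum_p(n1, n2, r1):
    """Exact two-sided p-value for an observed rank sum r1 (no ties).

    Under H0 every subset of n1 ranks from 1..N is equally likely, so the null
    distribution can be enumerated directly.
    """
    n = n1 + n2
    center = n1 * (n + 1) / 2
    sums = [sum(c) for c in itertools.combinations(range(1, n + 1), n1)]
    extreme = sum(1 for s in sums if abs(s - center) >= abs(r1 - center))
    return extreme / len(sums)


def normal_rank_sum_p(n1, n2, r1):
    """Large-sample normal approximation (no continuity correction)."""
    n = n1 + n2
    z = (r1 - n1 * (n + 1) / 2) / math.sqrt(n1 * n2 * (n + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


# 3 vs 3 with the smallest possible rank sum (ranks 1, 2, 3).
print(exact_rank_sum_p(3, 3, 6), normal_rank_sum_p(3, 3, 6))   # 0.1 vs about 0.0495
# 8 vs 8: C(16, 8) = 12870 subsets, and the two p-values are much closer.
print(exact_rank_sum_p(8, 8, 50), normal_rank_sum_p(8, 8, 50))
```

With 3 vs 3 the approximation roughly halves the exact p-value of 0.1; by 8 vs 8 the gap is much smaller.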
T6 — Distribution-free validity vs adaptive power through structure. Distribution-free methods are valid across all distributions satisfying minimal assumptions (exchangeability, independence), providing robustness; but this universality comes at the cost of power — the methods are not optimized for any particular distribution. If the analyst is willing to assume a specific distributional family (even tentatively), parametric methods can be more powerful. Semi-parametric approaches (Cox regression, double machine learning) attempt to balance both: they relax some distributional assumptions while retaining structure that improves efficiency. The choice involves a judgment about the certainty of distributional knowledge and the relative costs of Type I vs Type II error in the application.
Structural–Framed Character
Nonparametric Methods is a hybrid on the structural–framed spectrum, leaning structural with a light frame. At its center is a field-neutral statistical idea: making inferences while assuming minimal structure about the underlying distribution, leaning on ranks, order statistics, resampling, or flexible estimators rather than a specified distribution family. A modest amount of vocabulary comes along from its home in experimental statistics.
The core property — distribution-free inference and the robustness-versus-power trade-off — is a mathematical fact that applies wherever data are analyzed under uncertainty, from clinical trials to econometrics to machine-learning model fitting, without change. It carries little intrinsic normative weight; the trade-off between assumption-relaxation and statistical power is a formal relationship, not a verdict. It can largely be stated formally. The light frame it inherits is the methodological framing of inferential practice: the assumption of an analyst weighing how much distributional structure to commit to, and a vocabulary of validity and efficiency that presumes the goals of statistical inference. The structural content dominates while the frame stays thin, placing it on the structural side of the middle.
Substrate Independence
Nonparametric Methods is among the most substrate-tethered entries — composite 1 / 5 on the substrate-independence scale. The general impulse behind it — to make as few distributional assumptions as possible — has broad intuitive appeal, but the prime itself is a family of distribution-free statistical tools: rank-based and resampling techniques like rank-sum and permutation tests. Practitioners think of these as statistical instruments, not as a cross-domain pattern, and both the methods and the vocabulary are inseparable from inference. This is a catalog technique that does not lift cleanly off its statistical medium.
- Composite substrate independence — 1 / 5
- Domain breadth — 1 / 5
- Structural abstraction — 2 / 5
- Transfer evidence — 1 / 5
Relationships to Other Primes¶
Parents (3) — more general patterns this builds on
- Nonparametric Methods is a kind of Approximation
Nonparametric methods stand in for the true unknown distribution using ranks, order statistics, resampling, or flexible estimators rather than committing to a specified functional family. That is the canonical move of approximation: substituting a tractable surrogate for an intractable target while carrying explicit guarantees about the error. Nonparametric methods specialize approximation to the case where the surrogate avoids strong distributional assumptions, trading parametric efficiency for robustness to misspecification.
- Nonparametric Methods is a kind of Statistical Inference
Nonparametric methods are the distribution-free or distribution-light specialization of statistical inference: they draw conclusions about populations from samples without committing to a specific parametric distribution family, relying instead on ranks, order statistics, resampling, or flexible estimators. Where statistical inference names the broad reasoning from sample to population under explicit uncertainty accounting generally, the nonparametric specialization fixes the assumption profile as minimal, trading some efficiency under correctly-specified models for robustness across distributional shapes.
- Nonparametric Methods presupposes Distributional Assumption
Nonparametric methods are defined by contrast with parametric approaches: they minimize or avoid the distributional assumption rather than abolish the framing of distributional choice. Without the distributional-assumption machinery — the recognition that modeling uncertain quantities requires a commitment about probability-distribution shape — there would be no design dimension along which nonparametric methods could be located as the minimal-commitment pole. The parent prime supplies the structural choice (what to assume about shape) that nonparametric methods occupy a particular position on.
Path to root: Nonparametric Methods → Statistical Inference → Probability
Neighborhood in Abstraction Space
Nonparametric Methods sits in a moderately populated region (44th percentile for distinctiveness): it has near-neighbors but no dense thicket of synonyms.
Family — Statistical Inference & Modeling (11 primes)
Nearest neighbors
- Statistical Inference — 0.82
- Variability — 0.81
- Sampling (Representativeness) — 0.79
- Distributional Assumption — 0.79
- Bayesian Updating — 0.79
Computed from structural-signature embeddings · 2026-05-29
Not to Be Confused With
Comparative Method is a qualitative research design for identifying what varies and why across cases, whereas Nonparametric Methods are quantitative statistical techniques that make minimal distributional assumptions. Comparative Method examines case-by-case variation to construct theories or identify patterns (e.g., comparing political systems to understand how institutions produce different outcomes, or comparing firms to understand why some adopt technology faster); it relies on matched pairs, controlled comparisons, and qualitative reasoning, not on probability models. Nonparametric Methods, by contrast, work with quantitative data and use statistical inference (ranks, resampling, distribution-free tests) to draw conclusions about populations or test hypotheses. A political scientist using Comparative Method might examine five democracies and five autocracies to understand institutional effects; the same scientist using Nonparametric Methods would gather quantitative data (GDP growth, literacy, institutional scores) on many cases and use Mann-Whitney tests or quantile regression to estimate population effects. The superficial similarity (both avoid forcing structure onto complex situations) masks the fundamental difference: Comparative Method is design-based and produces case-level causal narratives; Nonparametric Methods are inference-based and produce population-level statistical conclusions. Neither is "better"; they address different epistemic questions. Comparative Method asks "why did these cases differ?"; Nonparametric Methods ask "what is the population-level effect (a shift, a quantile difference, a probability of superiority) and how certain are we?"
Statistical Inference is the broader conceptual framework for drawing conclusions about populations from samples using probability models; it encompasses both parametric and nonparametric approaches. Parametric inference assumes a specific family of distributions (normal, exponential, etc.) and estimates the parameters characterizing that family; nonparametric inference relaxes the distributional assumption and instead works with ranks, order statistics, or resampling from the observed data. The relationship is hierarchical: Statistical Inference is the genus; Parametric Methods and Nonparametric Methods are two species within it. All nonparametric methods are forms of statistical inference, but not all statistical inference is nonparametric: a t-test conducted under normality assumptions is statistical inference that is parametric. The practical distinction turns on what you are willing to assume: a parametric approach says "I am confident the data follow a normal distribution; let me estimate the mean and variance"; a nonparametric approach says "I am not confident about the functional form; let me use ranks to test whether values in one group tend to be larger than in the other." (A rank-sum test is a test of medians only under the additional assumption that the two distributions differ by a location shift.) Both use statistical inference logic (sample → population, probability models, hypothesis tests, confidence intervals), but nonparametric approaches use weaker probability-model structure to achieve robustness to misspecification. A researcher choosing between parametric and nonparametric analysis is making a choice within the Statistical Inference framework about how much structure to impose.
Uniformitarianism is a methodological assumption in historical and paleontological reasoning that processes visible today (erosion, sedimentation, gravity) operated identically in the past, allowing past conditions to be inferred from present observations. Nonparametric Methods are statistical inference techniques and share no conceptual ground with Uniformitarianism. The surface-level confusion might arise if a researcher uses nonparametric methods to analyze paleoclimate data (e.g., bootstrap confidence intervals on inferred temperature from proxy data) and conflates the statistical method with the Uniformitarian assumption—but they are distinct. A paleoclimatologist invoking Uniformitarianism is making a claim about process constancy across geological time; a paleoclimatologist choosing nonparametric methods is making a claim about statistical robustness to distributional assumptions in the modern analysis of proxy data. The two positions are compatible but address different issues: Uniformitarianism concerns the inference from present to past; Nonparametric Methods concern inference from sample to population in a single time slice. A non-Uniformitarian paleontologist (believing biological processes have changed fundamentally) can still use nonparametric statistical methods; conversely, a Uniformitarian paleontologist using parametric methods is not contradicting the Uniformitarian assumption.
Solution Archetypes
Solution archetypes in the catalog that build on this prime — directly (this prime is a source ingredient) or as a related prime.
Built directly on this prime (1)
Also a related prime in 1 archetype
Notes
The nonparametric tradition developed through two largely independent streams: classical rank-based methods (Wilcoxon 1945; Mann-Whitney 1947; Kendall 1938; Spearman 1904) focused on distribution-free hypothesis testing, and modern resampling and functional methods (Efron's bootstrap 1979; Fisher's permutation tests 1935; kernel density estimation Rosenblatt 1956, Parzen 1962) focused on estimation and computation. Core references: Hollander, Wolfe & Chicken Nonparametric Statistical Methods (3rd ed. 2014); Efron & Tibshirani An Introduction to the Bootstrap (1993); Koenker Quantile Regression (2005); Wasserman All of Nonparametric Statistics (2006); Hjort et al. Bayesian Nonparametrics (2010). Modern machine learning substantially reopened the nonparametric conversation: random forests, kernel machines, and neural networks all embody nonparametric principles under the surface. The kernel-trick-to-Gaussian-processes-to-neural-networks progression can be read as a sequence of increasingly flexible nonparametric function approximators, each with its own bias-variance trade-off. Contemporary practice often blurs the parametric-nonparametric distinction — semiparametric methods (Cox regression, propensity-score methods, double machine learning) combine parametric modeling of components of interest with nonparametric handling of nuisance components.
References
[1] Wasserman, L. (2006). All of Nonparametric Statistics. Springer. Definitive overview of nonparametric inference: develops density estimation, kernel methods, bootstrap, and rank-based tests that minimize parametric distributional commitments while imposing alternative structural assumptions (smoothness, exchangeability).
[2] Conover, W. J. (1999). Practical Nonparametric Statistics (3rd ed.). Wiley. Standard reference on classical nonparametric tests.
[3] Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80–83. Introduces the rank-sum and signed-rank tests for distribution-free hypothesis testing.
[4] Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18(1), 50–60. Introduces the Mann-Whitney U test, equivalent to the Wilcoxon rank-sum test, and studies its large-sample behavior.
[5] Hollander, M., Wolfe, D. A., & Chicken, E. (2014). Nonparametric Statistical Methods (3rd ed.). Wiley. Comprehensive modern reference on nonparametric methods.
[6] Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7(1), 1–26. Introduces the bootstrap, a computational method for approximating sampling distributions by resampling the observed data, and relates it to the jackknife.
[7] Koenker, R. (2005). Quantile Regression. Cambridge University Press. Monograph on quantile regression for robust estimation of conditional quantiles.
[8] Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall. Textbook treatment of bootstrap resampling, including percentile and BCa confidence intervals.
[9] Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer. Statistical learning theory underlying support vector machines and kernel methods.
[10] Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260), 583–621. Introduces the Kruskal-Wallis test for nonparametric comparison of several groups.
[11] Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15(1), 72–101. Foundational paper of classical reliability theory: introduces the decomposition of an observed score into a true component and an independent error component, distinguishing apparatus/process noise from genuine variability across subjects.
[12] Kolmogorov, A. N. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Ergebnisse der Mathematik und ihrer Grenzgebiete 2, no. 3. Berlin: Springer-Verlag. English translation: Foundations of the Theory of Probability, trans. Nathan Morrison (New York: Chelsea, 1950). Founding measure-theoretic axiomatization of probability — sample space, σ-algebra of events, countably-additive probability measure, ratio definition of conditional probability — that becomes the modern mathematical substrate for the field.
[13] Hodges, J. L., & Lehmann, E. L. (1963). Estimates of location based on rank tests. Annals of Mathematical Statistics, 34(2), 598–611. Introduces the Hodges-Lehmann rank-based estimator of location.
[14] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68. Double machine learning for semiparametric estimation with high-dimensional nuisance parameters.
[15] Fisher, R. A. (1935). The Design of Experiments. Oliver and Boyd, Edinburgh. Foundational treatise on experimental design; establishes randomization as the "reasoned basis for inference" and develops the principles of randomization, replication, and blocking that underpin modern randomization-based causal inference.