API reference
Below is a listing of all public methods of this package. Any other method you might find in the source code is not intended for direct usage.
Common interface
QUnfold.fit — Function

fit(m, X, y) -> FittedMethod

Return a copy of the QUnfold method m that is fitted to the data set (X, y).
QUnfold.predict — Function

predict(m, X) -> Vector{Float64}

Predict the class prevalences in the data set X with the fitted method m.
QUnfold.predict_with_background — Function

predict_with_background(m, X, X_b, α=1) -> Vector{Float64}

Predict the class prevalences in the observed data set X with the fitted method m, taking into account a background measurement X_b that is scaled by α.
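A minimal sketch of this interface, assuming a ScikitLearn.jl classifier; the data sets X_trn, y_trn, X_tst, and X_b are placeholders that are not part of this package:

```julia
using QUnfold, ScikitLearn
@sk_import ensemble: RandomForestClassifier

m = ACC(RandomForestClassifier(; oob_score=true))  # any method from this package
fitted = fit(m, X_trn, y_trn)    # a fitted copy of m; m itself is unchanged
p = predict(fitted, X_tst)       # estimated class prevalences

# with a background measurement X_b that is scaled by α = 0.5:
p_b = predict_with_background(fitted, X_tst, X_b, 0.5)
```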
Quantification / unfolding methods
CC
QUnfold.CC — Function

CC(classifier; kwargs...)

The Classify & Count method, which uses crisp classifier predictions without any adjustment. This weak baseline method is proposed by Forman, 2008: Quantifying counts and costs via classification.

Keyword arguments

- fit_classifier = true: whether or not to fit the given classifier.
- oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score: whether to use classifier.oob_decision_function_ or classifier.predict_proba(X) for fitting M.
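For illustration, a sketch with a ScikitLearn.jl classifier; X_trn, y_trn, X_tst, and clf_fitted are placeholders:

```julia
using QUnfold, ScikitLearn
@sk_import linear_model: LogisticRegression

cc = CC(LogisticRegression())    # crisp predictions, no adjustment
p = predict(fit(cc, X_trn, y_trn), X_tst)

# reuse an already fitted classifier without re-fitting it:
cc_pre = CC(clf_fitted; fit_classifier=false)
```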
ACC
QUnfold.ACC — Function

ACC(classifier; kwargs...)

The Adjusted Classify & Count method, which solves a least squares objective with crisp classifier predictions.

A regularization strength τ > 0 yields the o-ACC method for ordinal quantification, which is proposed by Bunse et al., 2022: Ordinal Quantification through Regularization.

Keyword arguments

- strategy = :softmax: the solution strategy (see below).
- τ = 0.0: the regularization strength for o-ACC.
- a = Float64[]: the acceptance factors for unfolding analyses.
- fit_classifier = true: whether or not to fit the given classifier.
- oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score: whether to use classifier.oob_decision_function_ or classifier.predict_proba(X) for fitting M.
Strategies
For binary classification, ACC is proposed by Forman, 2008: Quantifying counts and costs via classification. In the multi-class setting, multiple extensions are available.
- :softmax (default; our method) improves :softmax_full_reg by setting one latent parameter to zero instead of introducing a technical regularization term.
- :constrained constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- :pinv computes a pseudo-inverse akin to a minimum-norm constraint, as discussed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :inv computes the true inverse (if it exists) of the transfer matrix M, as proposed by Vucetic & Obradovic, 2001: Classification on data with biased class distribution.
- :ovr solves multiple binary one-versus-rest adjustments, as proposed by Forman (2008).
- :none yields the CC method without any adjustment.
- :softmax_full_reg (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :softmax_reg (our method) is a variant of :softmax, which sets one latent parameter to zero in addition to introducing a technical regularization term.
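The strategy is selected through the strategy keyword argument. A sketch with placeholder data; any ScikitLearn.jl classifier should work:

```julia
using QUnfold, ScikitLearn
@sk_import linear_model: LogisticRegression

acc = ACC(LogisticRegression())                            # default: strategy = :softmax
acc_hk = ACC(LogisticRegression(); strategy=:constrained)  # Hopkins & King-style adjustment
p = predict(fit(acc_hk, X_trn, y_trn), X_tst)
```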
PCC
QUnfold.PCC — Function

PCC(classifier; kwargs...)

The Probabilistic Classify & Count method, which uses predictions of posterior probabilities without any adjustment. This method is proposed by Bella et al., 2010: Quantification via Probability Estimators.

Keyword arguments

- fit_classifier = true: whether or not to fit the given classifier.
- oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score: whether to use classifier.oob_decision_function_ or classifier.predict_proba(X) for fitting M.
PACC
QUnfold.PACC — Function

PACC(classifier; kwargs...)

The Probabilistic Adjusted Classify & Count method, which solves a least squares objective with predictions of posterior probabilities.

A regularization strength τ > 0 yields the o-PACC method for ordinal quantification, which is proposed by Bunse et al., 2022: Ordinal Quantification through Regularization.

Keyword arguments

- strategy = :softmax: the solution strategy (see below).
- τ = 0.0: the regularization strength for o-PACC.
- a = Float64[]: the acceptance factors for unfolding analyses.
- fit_classifier = true: whether or not to fit the given classifier.
- oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score: whether to use classifier.oob_decision_function_ or classifier.predict_proba(X) for fitting M.
Strategies
For binary classification, PACC is proposed by Bella et al., 2010: Quantification via Probability Estimators. In the multi-class setting, multiple extensions are available.
- :softmax (default; our method) improves :softmax_full_reg by setting one latent parameter to zero instead of introducing a technical regularization term.
- :constrained constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- :pinv computes a pseudo-inverse akin to a minimum-norm constraint, as discussed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :inv computes the true inverse (if it exists) of the transfer matrix M, as proposed by Vucetic & Obradovic, 2001: Classification on data with biased class distribution.
- :ovr solves multiple binary one-versus-rest adjustments, as proposed by Forman (2008).
- :none yields the CC method without any adjustment.
- :softmax_full_reg (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :softmax_reg (our method) is a variant of :softmax, which sets one latent parameter to zero in addition to introducing a technical regularization term.
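A sketch of the ordinal variant; the value of τ is an arbitrary choice for illustration, and the data are placeholders:

```julia
using QUnfold, ScikitLearn
@sk_import linear_model: LogisticRegression

pacc = PACC(LogisticRegression())            # standard PACC
o_pacc = PACC(LogisticRegression(); τ=1e-2)  # τ > 0 yields o-PACC
p = predict(fit(o_pacc, X_trn, y_trn), X_tst)
```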
RUN
QUnfold.RUN — Function

RUN(transformer; kwargs...)

The Regularized Unfolding method by Blobel, 1985: Unfolding methods in high-energy physics experiments.

Keyword arguments

- strategy = :softmax: the solution strategy (see below).
- τ = 1e-6: the regularization strength for ordinal quantification.
- n_df = -1: the effective number of degrees of freedom (only used if strategy == :original), required to be 0 < n_df <= C, where C is the number of classes.
- a = Float64[]: the acceptance factors for unfolding analyses.
Strategies
Blobel's loss function, feature transformation, and regularization can be optimized with multiple strategies.
- :softmax (default; our method) improves :softmax_full_reg by setting one latent parameter to zero instead of introducing a technical regularization term.
- :original is the original, unconstrained Newton optimization proposed by Blobel (1985).
- :constrained constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- :softmax_full_reg (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :softmax_reg (our method) is a variant of :softmax, which sets one latent parameter to zero in addition to introducing a technical regularization term.
- :unconstrained (our method) is similar to :original, but uses a more generic solver.
- :positive (our method) is :constrained without the sum constraint.
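A sketch with a tree-induced feature transformation (see "Feature transformations" below); the classifier, its max_leaf_nodes value, and the data are placeholder assumptions:

```julia
using QUnfold, ScikitLearn
@sk_import tree: DecisionTreeClassifier

t = TreeTransformer(DecisionTreeClassifier(max_leaf_nodes=10))
run = RUN(t; τ=1e-6)                           # default strategy = :softmax
run_orig = RUN(t; strategy=:original, n_df=8)  # Blobel's Newton optimization
p = predict(fit(run, X_trn, y_trn), X_tst)
```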
SVD
QUnfold.SVD — Function

SVD(transformer; kwargs...)

The Singular Value Decomposition-based unfolding method by Hoecker & Kartvelishvili, 1996: SVD approach to data unfolding.

Keyword arguments

- strategy = :softmax: the solution strategy (see below).
- τ = 1e-6: the regularization strength for ordinal quantification.
- n_df = -1: the effective rank (only used if strategy == :original), required to be 0 < n_df < C, where C is the number of classes.
- a = Float64[]: the acceptance factors for unfolding analyses.
Strategies
Hoecker & Kartvelishvili's loss function, feature transformation, and regularization can be optimized with multiple strategies.
- :softmax (default; our method) improves :softmax_full_reg by setting one latent parameter to zero instead of introducing a technical regularization term.
- :original is the original, analytic solution proposed by Hoecker & Kartvelishvili (1996).
- :constrained constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- :softmax_full_reg (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :softmax_reg (our method) is a variant of :softmax, which sets one latent parameter to zero in addition to introducing a technical regularization term.
- :unconstrained (our method) is similar to :original, but uses a more generic solver.
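A sketch with a histogram-based transformation; the choice of transformer, the bin count, and the effective rank are arbitrary placeholder assumptions:

```julia
using QUnfold

t = HistogramTransformer(10)              # any transformer; here, 10 bins per feature
svd = SVD(t; strategy=:original, n_df=3)  # analytic solution with effective rank 3
p = predict(fit(svd, X_trn, y_trn), X_tst)
```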
HDx
QUnfold.HDx — Type

HDx(n_bins; kwargs...)

The Hellinger Distance-based method on feature histograms by González-Castro et al., 2013: Class distribution estimation based on the Hellinger distance.

The parameter n_bins specifies the number of bins per feature. A regularization strength τ > 0 yields the o-HDx method for ordinal quantification, which is proposed by Bunse et al., 2022: Machine learning for acquiring knowledge in astro-particle physics.

Keyword arguments

- strategy = :softmax: the solution strategy (see below).
- τ = 0.0: the regularization strength for o-HDx.
- a = Float64[]: the acceptance factors for unfolding analyses.
Strategies
González-Castro et al.'s loss function and feature transformation can be optimized with multiple strategies.
- :softmax (default; our method) improves :softmax_full_reg by setting one latent parameter to zero instead of introducing a technical regularization term.
- :constrained constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- :softmax_full_reg (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :softmax_reg (our method) is a variant of :softmax, which sets one latent parameter to zero in addition to introducing a technical regularization term.
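Since HDx operates on feature histograms, no classifier is required. A sketch with placeholder data and arbitrary parameter values:

```julia
using QUnfold

hdx = HDx(3)            # 3 bins per feature
o_hdx = HDx(3; τ=1e-2)  # τ > 0 yields o-HDx
p = predict(fit(hdx, X_trn, y_trn), X_tst)
```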
HDy
QUnfold.HDy — Type

HDy(classifier, n_bins; kwargs...)

The Hellinger Distance-based method on prediction histograms by González-Castro et al., 2013: Class distribution estimation based on the Hellinger distance.

The parameter n_bins specifies the number of bins per class. A regularization strength τ > 0 yields the o-HDy method for ordinal quantification, which is proposed by Bunse et al., 2022: Machine learning for acquiring knowledge in astro-particle physics.

Keyword arguments

- strategy = :softmax: the solution strategy (see below).
- τ = 0.0: the regularization strength for o-HDy.
- a = Float64[]: the acceptance factors for unfolding analyses.
- fit_classifier = true: whether or not to fit the given classifier.
- oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score: whether to use classifier.oob_decision_function_ or classifier.predict_proba(X) for fitting M.
Strategies
González-Castro et al.'s loss function and feature transformation can be optimized with multiple strategies.
- :softmax (default; our method) improves :softmax_full_reg by setting one latent parameter to zero instead of introducing a technical regularization term.
- :constrained constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- :softmax_full_reg (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :softmax_reg (our method) is a variant of :softmax, which sets one latent parameter to zero in addition to introducing a technical regularization term.
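In contrast to HDx, HDy bins the classifier's predictions. A sketch, assuming a ScikitLearn.jl classifier and placeholder data:

```julia
using QUnfold, ScikitLearn
@sk_import ensemble: RandomForestClassifier

hdy = HDy(RandomForestClassifier(; oob_score=true), 3)  # 3 bins per class
p = predict(fit(hdy, X_trn, y_trn), X_tst)
```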
IBU
QUnfold.IBU — Type

IBU(transformer, n_bins; kwargs...)

The Iterative Bayesian Unfolding method by D'Agostini, 1995: A multidimensional unfolding method based on Bayes' theorem.

Keyword arguments

- o = 0: the order of the polynomial for ordinal quantification.
- λ = 0.0: the impact of the polynomial for ordinal quantification.
- a = Float64[]: the acceptance factors for unfolding analyses.
SLD
QUnfold.SLD — Type

SLD(classifier; kwargs...)

The Saerens-Latinne-Decaestecker method, a.k.a. EMQ or Expectation Maximization-based Quantification, by Saerens et al., 2002: Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure.

A polynomial order o > 0 and regularization impact λ > 0 yield the o-SLD method for ordinal quantification, which is proposed by Bunse et al., 2022: Machine learning for acquiring knowledge in astro-particle physics.

Keyword arguments

- o = 0: the order of the polynomial for o-SLD.
- λ = 0.0: the impact of the polynomial for o-SLD.
- a = Float64[]: the acceptance factors for unfolding analyses.
- fit_classifier = true: whether or not to fit the given classifier.
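A sketch of SLD and its ordinal variant; the values of o and λ are arbitrary choices for illustration, and the data are placeholders:

```julia
using QUnfold, ScikitLearn
@sk_import linear_model: LogisticRegression

sld = SLD(LogisticRegression())                # standard SLD / EMQ
o_sld = SLD(LogisticRegression(); o=1, λ=0.1)  # o > 0 and λ > 0 yield o-SLD
p = predict(fit(o_sld, X_trn, y_trn), X_tst)
```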
Feature transformations
The unfolding methods RUN, SVD, and IBU have the flexibility of choosing between different feature transformations.
QUnfold.ClassTransformer — Type

ClassTransformer(classifier; kwargs...)

This transformer yields the classification-based feature transformation used in ACC, PACC, CC, PCC, and SLD.

Keyword arguments

- is_probabilistic = false: whether or not to use posterior predictions.
- fit_classifier = true: whether or not to fit the given classifier.
- oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score: whether to use classifier.oob_decision_function_ or classifier.predict_proba(X) for fitting M.
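A sketch of combining this transformer with an unfolding method; the classifier and data are placeholder assumptions:

```julia
using QUnfold, ScikitLearn
@sk_import linear_model: LogisticRegression

t = ClassTransformer(LogisticRegression(); is_probabilistic=true)
run = RUN(t)  # an unfolding method on top of a classification-based transformation
p = predict(fit(run, X_trn, y_trn), X_tst)
```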
QUnfold.TreeTransformer — Type

TreeTransformer(tree; kwargs...)

This transformer yields a tree-induced partitioning, as proposed by Börner et al., 2017: Measurement/simulation mismatches and multivariate data discretization in the machine learning era.

Keyword arguments

- fit_tree = 1.: whether or not to fit the given tree. If fit_tree is false or 0., do not fit the tree and use all data for fitting M. If fit_tree is true or 1., fit both the tree and M with all data. If fit_tree is between 0 and 1, use a fraction of fit_tree for fitting the tree and the remaining fraction 1-fit_tree for fitting M.
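A sketch of the fractional fit_tree setting; the tree classifier, its max_leaf_nodes value, and the data are placeholder assumptions:

```julia
using QUnfold, ScikitLearn
@sk_import tree: DecisionTreeClassifier

# fit the tree on 50% of the data; the remaining 50% fits M
t = TreeTransformer(DecisionTreeClassifier(max_leaf_nodes=10); fit_tree=0.5)
p = predict(fit(RUN(t), X_trn, y_trn), X_tst)
```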
QUnfold.HistogramTransformer — Type

HistogramTransformer(n_bins; kwargs...)

This transformer yields the histogram-based feature transformation used in HDx and HDy. The parameter n_bins specifies the number of bins per input feature.

Keyword arguments

- preprocessor = nothing: can be another AbstractTransformer that is called before this transformer.
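A sketch of chaining a preprocessor; combining a ClassTransformer with histograms (akin to the transformation in HDy) is an illustrative assumption, as are the classifier and data:

```julia
using QUnfold, ScikitLearn
@sk_import linear_model: LogisticRegression

# bin the posterior predictions of a classifier into 3 bins each
pre = ClassTransformer(LogisticRegression(); is_probabilistic=true)
t = HistogramTransformer(3; preprocessor=pre)
p = predict(fit(RUN(t), X_trn, y_trn), X_tst)
```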