API reference
Below is a listing of all public methods of this package. Any other method you might find in the source code is not intended for direct usage.
Common interface
QUnfold.fit — Function

fit(m, X, y) -> FittedMethod

Return a copy of the QUnfold method m that is fitted to the data set (X, y).
QUnfold.predict — Function

predict(m, X) -> Vector{Float64}

Predict the class prevalences in the data set X with the fitted method m.
QUnfold.predict_with_background — Function

predict_with_background(m, X, X_b, α=1) -> Vector{Float64}

Predict the class prevalences in the observed data set X with the fitted method m, taking into account a background measurement X_b that is scaled by α.
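A minimal sketch of this interface, assuming a ScikitLearn.jl classifier; the data sets X_trn, y_trn, X_tst, and X_b are placeholders that are not part of this package:

```julia
using QUnfold, ScikitLearn
@sk_import ensemble: RandomForestClassifier

m = ACC(RandomForestClassifier(; oob_score=true))  # any method from this package
fitted = fit(m, X_trn, y_trn)    # a fitted copy of m; m itself is unchanged
p = predict(fitted, X_tst)       # estimated class prevalences

# with a background measurement X_b that is scaled by α = 0.5:
p_b = predict_with_background(fitted, X_tst, X_b, 0.5)
```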
Quantification / unfolding methods
CC
QUnfold.CC — Function

CC(classifier; kwargs...)

The Classify & Count method, which uses crisp classifier predictions without any adjustment. This weak baseline method is proposed by Forman, 2008: Quantifying counts and costs via classification.

Keyword arguments

- fit_classifier = true: whether or not to fit the given classifier.
- oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score: whether to use classifier.oob_decision_function_ or classifier.predict_proba(X) for fitting M.
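For illustration, a sketch with a ScikitLearn.jl classifier; X_trn, y_trn, X_tst, and clf_fitted are placeholders:

```julia
using QUnfold, ScikitLearn
@sk_import linear_model: LogisticRegression

cc = CC(LogisticRegression())    # crisp predictions, no adjustment
p = predict(fit(cc, X_trn, y_trn), X_tst)

# reuse an already fitted classifier without re-fitting it:
cc_pre = CC(clf_fitted; fit_classifier=false)
```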
ACC
QUnfold.ACC — Function

ACC(classifier; kwargs...)

The Adjusted Classify & Count method, which solves a least squares objective with crisp classifier predictions.

A regularization strength τ > 0 yields the o-ACC method for ordinal quantification, which is proposed by Bunse et al., 2022: Ordinal Quantification through Regularization.

Keyword arguments

- strategy = :softmax: the solution strategy (see below).
- τ = 0.0: the regularization strength for o-ACC.
- a = Float64[]: the acceptance factors for unfolding analyses.
- fit_classifier = true: whether or not to fit the given classifier.
- oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score: whether to use classifier.oob_decision_function_ or classifier.predict_proba(X) for fitting M.
Strategies
For binary classification, ACC is proposed by Forman, 2008: Quantifying counts and costs via classification. In the multi-class setting, multiple extensions are available.
- :softmax (default; our method) improves :softmax_full_reg by setting one latent parameter to zero instead of introducing a technical regularization term.
- :constrained constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- :pinv computes a pseudo-inverse akin to a minimum-norm constraint, as discussed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :inv computes the true inverse (if it exists) of the transfer matrix M, as proposed by Vucetic & Obradovic, 2001: Classification on data with biased class distribution.
- :ovr solves multiple binary one-versus-rest adjustments, as proposed by Forman (2008).
- :none yields the CC method without any adjustment.
- :softmax_full_reg (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :softmax_reg (our method) is a variant of :softmax, which sets one latent parameter to zero in addition to introducing a technical regularization term.
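The strategy is selected through the strategy keyword argument. A sketch with placeholder data; any ScikitLearn.jl classifier should work:

```julia
using QUnfold, ScikitLearn
@sk_import linear_model: LogisticRegression

acc = ACC(LogisticRegression())                            # default: strategy = :softmax
acc_hk = ACC(LogisticRegression(); strategy=:constrained)  # Hopkins & King-style adjustment
p = predict(fit(acc_hk, X_trn, y_trn), X_tst)
```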
PCC
QUnfold.PCC — Function

PCC(classifier; kwargs...)

The Probabilistic Classify & Count method, which uses predictions of posterior probabilities without any adjustment. This method is proposed by Bella et al., 2010: Quantification via Probability Estimators.

Keyword arguments

- fit_classifier = true: whether or not to fit the given classifier.
- oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score: whether to use classifier.oob_decision_function_ or classifier.predict_proba(X) for fitting M.
PACC
QUnfold.PACC — Function

PACC(classifier; kwargs...)

The Probabilistic Adjusted Classify & Count method, which solves a least squares objective with predictions of posterior probabilities.

A regularization strength τ > 0 yields the o-PACC method for ordinal quantification, which is proposed by Bunse et al., 2022: Ordinal Quantification through Regularization.

Keyword arguments

- strategy = :softmax: the solution strategy (see below).
- τ = 0.0: the regularization strength for o-PACC.
- a = Float64[]: the acceptance factors for unfolding analyses.
- fit_classifier = true: whether or not to fit the given classifier.
- oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score: whether to use classifier.oob_decision_function_ or classifier.predict_proba(X) for fitting M.
Strategies
For binary classification, PACC is proposed by Bella et al., 2010: Quantification via Probability Estimators. In the multi-class setting, multiple extensions are available.
- :softmax (default; our method) improves :softmax_full_reg by setting one latent parameter to zero instead of introducing a technical regularization term.
- :constrained constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- :pinv computes a pseudo-inverse akin to a minimum-norm constraint, as discussed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :inv computes the true inverse (if it exists) of the transfer matrix M, as proposed by Vucetic & Obradovic, 2001: Classification on data with biased class distribution.
- :ovr solves multiple binary one-versus-rest adjustments, as proposed by Forman (2008).
- :none yields the CC method without any adjustment.
- :softmax_full_reg (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :softmax_reg (our method) is a variant of :softmax, which sets one latent parameter to zero in addition to introducing a technical regularization term.
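A sketch of the ordinal variant; the value of τ is an arbitrary choice for illustration, and the data are placeholders:

```julia
using QUnfold, ScikitLearn
@sk_import linear_model: LogisticRegression

pacc = PACC(LogisticRegression())            # standard PACC
o_pacc = PACC(LogisticRegression(); τ=1e-2)  # τ > 0 yields o-PACC
p = predict(fit(o_pacc, X_trn, y_trn), X_tst)
```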
RUN
QUnfold.RUN — Function

RUN(transformer; kwargs...)

The Regularized Unfolding method by Blobel, 1985: Unfolding methods in high-energy physics experiments.

Keyword arguments

- strategy = :softmax: the solution strategy (see below).
- τ = 1e-6: the regularization strength for ordinal quantification.
- n_df = -1: the effective number of degrees of freedom (only used if strategy == :original), required to be 0 < n_df <= C, where C is the number of classes.
- a = Float64[]: the acceptance factors for unfolding analyses.
Strategies
Blobel's loss function, feature transformation, and regularization can be optimized with multiple strategies.
- :softmax (default; our method) improves :softmax_full_reg by setting one latent parameter to zero instead of introducing a technical regularization term.
- :original is the original, unconstrained Newton optimization proposed by Blobel (1985).
- :constrained constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- :softmax_full_reg (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :softmax_reg (our method) is a variant of :softmax, which sets one latent parameter to zero in addition to introducing a technical regularization term.
- :unconstrained (our method) is similar to :original, but uses a more generic solver.
- :positive (our method) is :constrained without the sum constraint.
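A sketch with a tree-induced feature transformation (see "Feature transformations" below); the classifier, its max_leaf_nodes value, and the data are placeholder assumptions:

```julia
using QUnfold, ScikitLearn
@sk_import tree: DecisionTreeClassifier

t = TreeTransformer(DecisionTreeClassifier(max_leaf_nodes=10))
run = RUN(t; τ=1e-6)                           # default strategy = :softmax
run_orig = RUN(t; strategy=:original, n_df=8)  # Blobel's Newton optimization
p = predict(fit(run, X_trn, y_trn), X_tst)
```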
SVD
QUnfold.SVD — Function

SVD(transformer; kwargs...)

The Singular Value Decomposition-based unfolding method by Hoecker & Kartvelishvili, 1996: SVD approach to data unfolding.

Keyword arguments

- strategy = :softmax: the solution strategy (see below).
- τ = 1e-6: the regularization strength for ordinal quantification.
- n_df = -1: the effective rank (only used if strategy == :original), required to be 0 < n_df < C, where C is the number of classes.
- a = Float64[]: the acceptance factors for unfolding analyses.
Strategies
Hoecker & Kartvelishvili's loss function, feature transformation, and regularization can be optimized with multiple strategies.
- :softmax (default; our method) improves :softmax_full_reg by setting one latent parameter to zero instead of introducing a technical regularization term.
- :original is the original, analytic solution proposed by Hoecker & Kartvelishvili (1996).
- :constrained constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- :softmax_full_reg (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :softmax_reg (our method) is a variant of :softmax, which sets one latent parameter to zero in addition to introducing a technical regularization term.
- :unconstrained (our method) is similar to :original, but uses a more generic solver.
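A sketch with a histogram-based transformation; the choice of transformer, the bin count, and the effective rank are arbitrary placeholder assumptions:

```julia
using QUnfold

t = HistogramTransformer(10)              # any transformer; here, 10 bins per feature
svd = SVD(t; strategy=:original, n_df=3)  # analytic solution with effective rank 3
p = predict(fit(svd, X_trn, y_trn), X_tst)
```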
HDx
QUnfold.HDx — Type

HDx(n_bins; kwargs...)

The Hellinger Distance-based method on feature histograms by González-Castro et al., 2013: Class distribution estimation based on the Hellinger distance.

The parameter n_bins specifies the number of bins per feature. A regularization strength τ > 0 yields the o-HDx method for ordinal quantification, which is proposed by Bunse et al., 2022: Machine learning for acquiring knowledge in astro-particle physics.

Keyword arguments

- strategy = :softmax: the solution strategy (see below).
- τ = 0.0: the regularization strength for o-HDx.
- a = Float64[]: the acceptance factors for unfolding analyses.
Strategies
González-Castro et al.'s loss function and feature transformation can be optimized with multiple strategies.
- :softmax (default; our method) improves :softmax_full_reg by setting one latent parameter to zero instead of introducing a technical regularization term.
- :constrained constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- :softmax_full_reg (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :softmax_reg (our method) is a variant of :softmax, which sets one latent parameter to zero in addition to introducing a technical regularization term.
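Since HDx operates on feature histograms, no classifier is required. A sketch with placeholder data and arbitrary parameter values:

```julia
using QUnfold

hdx = HDx(3)            # 3 bins per feature
o_hdx = HDx(3; τ=1e-2)  # τ > 0 yields o-HDx
p = predict(fit(hdx, X_trn, y_trn), X_tst)
```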
HDy
QUnfold.HDy — Type

HDy(classifier, n_bins; kwargs...)

The Hellinger Distance-based method on prediction histograms by González-Castro et al., 2013: Class distribution estimation based on the Hellinger distance.

The parameter n_bins specifies the number of bins per class. A regularization strength τ > 0 yields the o-HDy method for ordinal quantification, which is proposed by Bunse et al., 2022: Machine learning for acquiring knowledge in astro-particle physics.

Keyword arguments

- strategy = :softmax: the solution strategy (see below).
- τ = 0.0: the regularization strength for o-HDy.
- a = Float64[]: the acceptance factors for unfolding analyses.
- fit_classifier = true: whether or not to fit the given classifier.
- oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score: whether to use classifier.oob_decision_function_ or classifier.predict_proba(X) for fitting M.
Strategies
González-Castro et al.'s loss function and feature transformation can be optimized with multiple strategies.
- :softmax (default; our method) improves :softmax_full_reg by setting one latent parameter to zero instead of introducing a technical regularization term.
- :constrained constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- :softmax_full_reg (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- :softmax_reg (our method) is a variant of :softmax, which sets one latent parameter to zero in addition to introducing a technical regularization term.
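In contrast to HDx, HDy bins the classifier's predictions. A sketch, assuming a ScikitLearn.jl classifier and placeholder data:

```julia
using QUnfold, ScikitLearn
@sk_import ensemble: RandomForestClassifier

hdy = HDy(RandomForestClassifier(; oob_score=true), 3)  # 3 bins per class
p = predict(fit(hdy, X_trn, y_trn), X_tst)
```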
IBU
QUnfold.IBU — Type

IBU(transformer, n_bins; kwargs...)

The Iterative Bayesian Unfolding method by D'Agostini, 1995: A multidimensional unfolding method based on Bayes' theorem.

Keyword arguments

- o = 0: the order of the polynomial for ordinal quantification.
- λ = 0.0: the impact of the polynomial for ordinal quantification.
- a = Float64[]: the acceptance factors for unfolding analyses.
SLD
QUnfold.SLD — Type

SLD(classifier; kwargs...)

The Saerens-Latinne-Decaestecker method, a.k.a. EMQ or Expectation Maximization-based Quantification, by Saerens et al., 2002: Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure.

A polynomial order o > 0 and regularization impact λ > 0 yield the o-SLD method for ordinal quantification, which is proposed by Bunse et al., 2022: Machine learning for acquiring knowledge in astro-particle physics.

Keyword arguments

- o = 0: the order of the polynomial for o-SLD.
- λ = 0.0: the impact of the polynomial for o-SLD.
- a = Float64[]: the acceptance factors for unfolding analyses.
- fit_classifier = true: whether or not to fit the given classifier.
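A sketch of SLD and its ordinal variant; the values of o and λ are arbitrary choices for illustration, and the data are placeholders:

```julia
using QUnfold, ScikitLearn
@sk_import linear_model: LogisticRegression

sld = SLD(LogisticRegression())                # standard SLD / EMQ
o_sld = SLD(LogisticRegression(); o=1, λ=0.1)  # o > 0 and λ > 0 yield o-SLD
p = predict(fit(o_sld, X_trn, y_trn), X_tst)
```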
Feature transformations
The unfolding methods RUN, SVD, and IBU have the flexibility of choosing between different feature transformations.
QUnfold.ClassTransformer — Type

ClassTransformer(classifier; kwargs...)

This transformer yields the classification-based feature transformation used in ACC, PACC, CC, PCC, and SLD.

Keyword arguments

- is_probabilistic = false: whether or not to use posterior predictions.
- fit_classifier = true: whether or not to fit the given classifier.
- oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score: whether to use classifier.oob_decision_function_ or classifier.predict_proba(X) for fitting M.
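A sketch of combining this transformer with an unfolding method; the classifier and data are placeholder assumptions:

```julia
using QUnfold, ScikitLearn
@sk_import linear_model: LogisticRegression

t = ClassTransformer(LogisticRegression(); is_probabilistic=true)
run = RUN(t)  # an unfolding method on top of a classification-based transformation
p = predict(fit(run, X_trn, y_trn), X_tst)
```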
QUnfold.TreeTransformer — Type

TreeTransformer(tree; kwargs...)

This transformer yields a tree-induced partitioning, as proposed by Börner et al., 2017: Measurement/simulation mismatches and multivariate data discretization in the machine learning era.

Keyword arguments

- fit_tree = 1.: whether or not to fit the given tree. If fit_tree is false or 0., do not fit the tree and use all data for fitting M. If fit_tree is true or 1., fit both the tree and M with all data. If fit_tree is between 0 and 1, use a fraction of fit_tree for fitting the tree and the remaining fraction 1-fit_tree for fitting M.
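A sketch of the fractional fit_tree setting; the tree classifier, its max_leaf_nodes value, and the data are placeholder assumptions:

```julia
using QUnfold, ScikitLearn
@sk_import tree: DecisionTreeClassifier

# fit the tree on 50% of the data; the remaining 50% fits M
t = TreeTransformer(DecisionTreeClassifier(max_leaf_nodes=10); fit_tree=0.5)
p = predict(fit(RUN(t), X_trn, y_trn), X_tst)
```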
QUnfold.HistogramTransformer — Type

HistogramTransformer(n_bins; kwargs...)

This transformer yields the histogram-based feature transformation used in HDx and HDy. The parameter n_bins specifies the number of bins per input feature.

Keyword arguments

- preprocessor = nothing: can be another AbstractTransformer that is called before this transformer.
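A sketch of chaining a preprocessor; combining a ClassTransformer with histograms (akin to the transformation in HDy) is an illustrative assumption, as are the classifier and data:

```julia
using QUnfold, ScikitLearn
@sk_import linear_model: LogisticRegression

# bin the posterior predictions of a classifier into 3 bins each
pre = ClassTransformer(LogisticRegression(); is_probabilistic=true)
t = HistogramTransformer(3; preprocessor=pre)
p = predict(fit(RUN(t), X_trn, y_trn), X_tst)
```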