API reference
Below is a listing of all public methods of this package. Any other method you might find in the source code is not intended for direct use.
Common interface
QUnfold.fit — Function
`fit(m, X, y) -> FittedMethod`
Return a copy of the QUnfold method `m` that is fitted to the data set `(X, y)`.
QUnfold.predict — Function
`predict(m, X) -> Vector{Float64}`
Predict the class prevalences in the data set `X` with the fitted method `m`.
QUnfold.predict_with_background — Function
`predict_with_background(m, X, X_b, α=1) -> Vector{Float64}`
Predict the class prevalences in the observed data set `X` with the fitted method `m`, taking into account a background measurement `X_b` that is scaled by `α`.
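A minimal usage sketch of this interface, assuming a scikit-learn classifier wrapped via ScikitLearn.jl and synthetic placeholder data; adapt the classifier and data to your setting:

```julia
using QUnfold, ScikitLearn, Random
@sk_import linear_model: LogisticRegression

Random.seed!(42)                                # synthetic placeholder data
X_trn, y_trn = randn(1000, 3), rand(1:3, 1000)
X_tst = randn(500, 3)

acc = ACC(LogisticRegression())                 # any method from this page works alike
trained_acc = QUnfold.fit(acc, X_trn, y_trn)    # qualified, in case ScikitLearn's fit is in scope
p_hat = QUnfold.predict(trained_acc, X_tst)     # estimated class prevalences

# with a background measurement X_b (not defined here), scaled by α = 0.5:
# p_bg = QUnfold.predict_with_background(trained_acc, X_tst, X_b, 0.5)
```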
Quantification / unfolding methods
CC
QUnfold.CC — Function
`CC(classifier; kwargs...)`
The Classify & Count method, which uses crisp classifier predictions without any adjustment. This weak baseline method is proposed by Forman, 2008: Quantifying counts and costs via classification.
Keyword arguments
- `fit_classifier = true` whether or not to fit the given `classifier`.
- `oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score` whether to use `classifier.oob_decision_function_` or `classifier.predict_proba(X)` for fitting `M`.
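For illustration, a sketch of how the `oob_score` default plays out with a bagging classifier; `X_trn`, `y_trn`, and `X_tst` are placeholders as in the sketch above:

```julia
@sk_import ensemble: RandomForestClassifier

# with oob_score=true, the out-of-bag predictions (oob_decision_function_)
# are used for fitting M, instead of predict_proba(X)
cc = CC(RandomForestClassifier(; oob_score=true))
p_hat = QUnfold.predict(QUnfold.fit(cc, X_trn, y_trn), X_tst)
```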
ACC
QUnfold.ACC — Function
`ACC(classifier; kwargs...)`
The Adjusted Classify & Count method, which solves a least squares objective with crisp classifier predictions.
A regularization strength `τ > 0` yields the o-ACC method for ordinal quantification, which is proposed by Bunse et al., 2022: Ordinal Quantification through Regularization.
Keyword arguments
- `strategy = :softmax` is the solution strategy (see below).
- `τ = 0.0` is the regularization strength for o-ACC.
- `a = Float64[]` are the acceptance factors for unfolding analyses.
- `fit_classifier = true` whether or not to fit the given `classifier`.
- `oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score` whether to use `classifier.oob_decision_function_` or `classifier.predict_proba(X)` for fitting `M`.
Strategies
For binary classification, ACC is proposed by Forman, 2008: Quantifying counts and costs via classification. In the multi-class setting, multiple extensions are available.
- `:softmax` (default; our method) improves `:softmax_full_reg` by setting one latent parameter to zero instead of introducing a technical regularization term.
- `:constrained` constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- `:pinv` computes a pseudo-inverse akin to a minimum-norm constraint, as discussed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- `:inv` computes the true inverse (if it exists) of the transfer matrix `M`, as proposed by Vucetic & Obradovic, 2001: Classification on data with biased class distribution.
- `:ovr` solves multiple binary one-versus-rest adjustments, as proposed by Forman (2008).
- `:none` yields the `CC` method without any adjustment.
- `:softmax_full_reg` (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- `:softmax_reg` (our method) is a variant of `:softmax`, which sets one latent parameter to zero in addition to introducing a technical regularization term.
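To illustrate, a brief sketch of selecting strategies; `clf` stands in for any scikit-learn classifier and the value of `τ` is arbitrary:

```julia
acc = ACC(clf)                                     # default :softmax strategy
acc_constrained = ACC(clf; strategy=:constrained)  # Hopkins & King (2010)
acc_pinv = ACC(clf; strategy=:pinv)                # minimum-norm pseudo-inverse
o_acc = ACC(clf; τ=1e-2)                           # o-ACC for ordinal quantification
```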
PCC
QUnfold.PCC — Function
`PCC(classifier; kwargs...)`
The Probabilistic Classify & Count method, which uses predictions of posterior probabilities without any adjustment. This method is proposed by Bella et al., 2010: Quantification via Probability Estimators.
Keyword arguments
- `fit_classifier = true` whether or not to fit the given `classifier`.
- `oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score` whether to use `classifier.oob_decision_function_` or `classifier.predict_proba(X)` for fitting `M`.
PACC
QUnfold.PACC — Function
`PACC(classifier; kwargs...)`
The Probabilistic Adjusted Classify & Count method, which solves a least squares objective with predictions of posterior probabilities.
A regularization strength `τ > 0` yields the o-PACC method for ordinal quantification, which is proposed by Bunse et al., 2022: Ordinal Quantification through Regularization.
Keyword arguments
- `strategy = :softmax` is the solution strategy (see below).
- `τ = 0.0` is the regularization strength for o-PACC.
- `a = Float64[]` are the acceptance factors for unfolding analyses.
- `fit_classifier = true` whether or not to fit the given `classifier`.
- `oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score` whether to use `classifier.oob_decision_function_` or `classifier.predict_proba(X)` for fitting `M`.
Strategies
For binary classification, PACC is proposed by Bella et al., 2010: Quantification via Probability Estimators. In the multi-class setting, multiple extensions are available.
- `:softmax` (default; our method) improves `:softmax_full_reg` by setting one latent parameter to zero instead of introducing a technical regularization term.
- `:constrained` constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- `:pinv` computes a pseudo-inverse akin to a minimum-norm constraint, as discussed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- `:inv` computes the true inverse (if it exists) of the transfer matrix `M`, as proposed by Vucetic & Obradovic, 2001: Classification on data with biased class distribution.
- `:ovr` solves multiple binary one-versus-rest adjustments, as proposed by Forman (2008).
- `:none` yields the `CC` method without any adjustment.
- `:softmax_full_reg` (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- `:softmax_reg` (our method) is a variant of `:softmax`, which sets one latent parameter to zero in addition to introducing a technical regularization term.
RUN
QUnfold.RUN — Function
`RUN(transformer; kwargs...)`
The Regularized Unfolding method by Blobel, 1985: Unfolding methods in high-energy physics experiments.
Keyword arguments
- `strategy = :softmax` is the solution strategy (see below).
- `τ = 1e-6` is the regularization strength for ordinal quantification.
- `n_df = -1` (only used if `strategy == :original`) is the effective number of degrees of freedom, required to be `0 < n_df <= C` where `C` is the number of classes.
- `a = Float64[]` are the acceptance factors for unfolding analyses.
Strategies
Blobel's loss function, feature transformation, and regularization can be optimized with multiple strategies.
- `:softmax` (default; our method) improves `:softmax_full_reg` by setting one latent parameter to zero instead of introducing a technical regularization term.
- `:original` is the original, unconstrained Newton optimization proposed by Blobel (1985).
- `:constrained` constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- `:softmax_full_reg` (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- `:softmax_reg` (our method) is a variant of `:softmax`, which sets one latent parameter to zero in addition to introducing a technical regularization term.
- `:unconstrained` (our method) is similar to `:original`, but uses a more generic solver.
- `:positive` (our method) is `:constrained` without the sum constraint.
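A sketch of RUN with the classification-based feature transformation described below; `clf` is a placeholder classifier and the parameter values are illustrative:

```julia
run = RUN(ClassTransformer(clf); τ=1e-6)  # default :softmax strategy
run_original = RUN(ClassTransformer(clf); strategy=:original, n_df=2)
p_hat = QUnfold.predict(QUnfold.fit(run, X_trn, y_trn), X_tst)
```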
SVD
QUnfold.SVD — Function
`SVD(transformer; kwargs...)`
The Singular Value Decomposition-based unfolding method by Hoecker & Kartvelishvili, 1996: SVD approach to data unfolding.
Keyword arguments
- `strategy = :softmax` is the solution strategy (see below).
- `τ = 1e-6` is the regularization strength for ordinal quantification.
- `n_df = -1` (only used if `strategy == :original`) is the effective rank, required to be `0 < n_df < C` where `C` is the number of classes.
- `a = Float64[]` are the acceptance factors for unfolding analyses.
Strategies
Hoecker & Kartvelishvili's loss function, feature transformation, and regularization can be optimized with multiple strategies.
- `:softmax` (default; our method) improves `:softmax_full_reg` by setting one latent parameter to zero instead of introducing a technical regularization term.
- `:original` is the original, analytic solution proposed by Hoecker & Kartvelishvili (1996).
- `:constrained` constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- `:softmax_full_reg` (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- `:softmax_reg` (our method) is a variant of `:softmax`, which sets one latent parameter to zero in addition to introducing a technical regularization term.
- `:unconstrained` (our method) is similar to `:original`, but uses a more generic solver.
HDx
QUnfold.HDx — Type
`HDx(n_bins; kwargs...)`
The Hellinger Distance-based method on feature histograms by González-Castro et al., 2013: Class distribution estimation based on the Hellinger distance.
The parameter `n_bins` specifies the number of bins per feature. A regularization strength `τ > 0` yields the o-HDx method for ordinal quantification, which is proposed by Bunse et al., 2022: Machine learning for acquiring knowledge in astro-particle physics.
Keyword arguments
- `strategy = :softmax` is the solution strategy (see below).
- `τ = 0.0` is the regularization strength for o-HDx.
- `a = Float64[]` are the acceptance factors for unfolding analyses.
Strategies
González-Castro et al.'s loss function and feature transformation can be optimized with multiple strategies.
- `:softmax` (default; our method) improves `:softmax_full_reg` by setting one latent parameter to zero instead of introducing a technical regularization term.
- `:constrained` constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- `:softmax_full_reg` (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- `:softmax_reg` (our method) is a variant of `:softmax`, which sets one latent parameter to zero in addition to introducing a technical regularization term.
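HDx operates directly on the features, without a classifier; a minimal sketch with placeholder data and an arbitrary regularization strength:

```julia
hdx = HDx(10)            # 10 bins per feature
o_hdx = HDx(10; τ=1e-2)  # o-HDx for ordinal quantification
p_hat = QUnfold.predict(QUnfold.fit(hdx, X_trn, y_trn), X_tst)
```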
HDy
QUnfold.HDy — Type
`HDy(classifier, n_bins; kwargs...)`
The Hellinger Distance-based method on prediction histograms by González-Castro et al., 2013: Class distribution estimation based on the Hellinger distance.
The parameter `n_bins` specifies the number of bins per class. A regularization strength `τ > 0` yields the o-HDy method for ordinal quantification, which is proposed by Bunse et al., 2022: Machine learning for acquiring knowledge in astro-particle physics.
Keyword arguments
- `strategy = :softmax` is the solution strategy (see below).
- `τ = 0.0` is the regularization strength for o-HDy.
- `a = Float64[]` are the acceptance factors for unfolding analyses.
- `fit_classifier = true` whether or not to fit the given `classifier`.
- `oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score` whether to use `classifier.oob_decision_function_` or `classifier.predict_proba(X)` for fitting `M`.
Strategies
González-Castro et al.'s loss function and feature transformation can be optimized with multiple strategies.
- `:softmax` (default; our method) improves `:softmax_full_reg` by setting one latent parameter to zero instead of introducing a technical regularization term.
- `:constrained` constrains the optimization to proper probability densities, as proposed by Hopkins & King, 2010: A method of automated nonparametric content analysis for social science.
- `:softmax_full_reg` (our method) introduces a soft-max layer, which makes constraints obsolete. This strategy employs a technical regularization term, as proposed by Bunse, 2022: On Multi-Class Extensions of Adjusted Classify and Count.
- `:softmax_reg` (our method) is a variant of `:softmax`, which sets one latent parameter to zero in addition to introducing a technical regularization term.
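HDy mirrors HDx, but bins classifier predictions instead of raw features; a one-line sketch with the placeholder classifier `clf`:

```julia
hdy = HDy(clf, 10; τ=1e-2)  # 10 bins per class; τ > 0 yields o-HDy
```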
IBU
QUnfold.IBU — Type
`IBU(transformer, n_bins; kwargs...)`
The Iterative Bayesian Unfolding method by D'Agostini, 1995: A multidimensional unfolding method based on Bayes' theorem.
Keyword arguments
- `o = 0` is the order of the polynomial for ordinal quantification.
- `λ = 0.0` is the impact of the polynomial for ordinal quantification.
- `a = Float64[]` are the acceptance factors for unfolding analyses.
SLD
QUnfold.SLD — Type
`SLD(classifier; kwargs...)`
The Saerens-Latinne-Decaestecker method, a.k.a. EMQ or Expectation Maximization-based Quantification, by Saerens et al., 2002: Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure.
A polynomial order `o > 0` and a regularization impact `λ > 0` yield the o-SLD method for ordinal quantification, which is proposed by Bunse et al., 2022: Machine learning for acquiring knowledge in astro-particle physics.
Keyword arguments
- `o = 0` is the order of the polynomial for o-SLD.
- `λ = 0.0` is the impact of the polynomial for o-SLD.
- `a = Float64[]` are the acceptance factors for unfolding analyses.
- `fit_classifier = true` whether or not to fit the given `classifier`.
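A sketch of SLD and its ordinal variant o-SLD; `clf` is a placeholder classifier and the values of `o` and `λ` are illustrative only:

```julia
sld = SLD(clf)
o_sld = SLD(clf; o=2, λ=0.2)  # o-SLD with a second-order polynomial
```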
Feature transformations
The unfolding methods RUN, SVD, and IBU can be configured with different feature transformations.
QUnfold.ClassTransformer — Type
`ClassTransformer(classifier; kwargs...)`
This transformer yields the classification-based feature transformation used in ACC, PACC, CC, PCC, and SLD.
Keyword arguments
- `is_probabilistic = false` whether or not to use posterior predictions.
- `fit_classifier = true` whether or not to fit the given `classifier`.
- `oob_score = hasproperty(classifier, :oob_score) && classifier.oob_score` whether to use `classifier.oob_decision_function_` or `classifier.predict_proba(X)` for fitting `M`.
QUnfold.TreeTransformer — Type
`TreeTransformer(tree; kwargs...)`
This transformer yields a tree-induced partitioning, as proposed by Börner et al., 2017: Measurement/simulation mismatches and multivariate data discretization in the machine learning era.
Keyword arguments
- `fit_tree = 1.0` whether or not to fit the given `tree`. If `fit_tree` is `false` or `0.0`, do not fit the tree and use all data for fitting `M`. If `fit_tree` is `true` or `1.0`, fit both the tree and `M` with all data. If `fit_tree` is between 0 and 1, use a fraction `fit_tree` of the data for fitting the tree and the remaining fraction `1 - fit_tree` for fitting `M`.
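A sketch of the `fit_tree` options, assuming a scikit-learn decision tree via ScikitLearn.jl; the parameter values are illustrative:

```julia
@sk_import tree: DecisionTreeClassifier

t_all = TreeTransformer(DecisionTreeClassifier(; max_leaf_nodes=8))  # tree and M on all data
t_split = TreeTransformer(DecisionTreeClassifier(); fit_tree=0.5)    # half for the tree, half for M
p_hat = QUnfold.predict(QUnfold.fit(RUN(t_split), X_trn, y_trn), X_tst)
```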
QUnfold.HistogramTransformer — Type
`HistogramTransformer(n_bins; kwargs...)`
This transformer yields the histogram-based feature transformation used in HDx and HDy. The parameter `n_bins` specifies the number of bins per input feature.
Keyword arguments
- `preprocessor = nothing` can be another `AbstractTransformer` that is called before this transformer.
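A closing sketch of chaining transformers; whether a classifier-based preprocessor suits your analysis depends on your setting, so treat this as illustrative only:

```julia
h = HistogramTransformer(10)  # 10 bins per input feature
# preprocess with another transformer before binning, e.g. posterior predictions
h_pre = HistogramTransformer(10; preprocessor=ClassTransformer(clf; is_probabilistic=true))
run = RUN(h_pre)  # histogram features within an unfolding method
```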