API

The AbstractMethod defines the interface of all quantification and unfolding algorithms. Most importantly, this interface consists of their fit and predict methods.

class qunfold.AbstractMethod

Abstract base class for quantification methods.

abstractmethod fit(X, y, n_classes=None)

Fit this quantifier to data.

Parameters:
  • X – The feature matrix to which this quantifier will be fitted.

  • y – The labels to which this quantifier will be fitted.

  • n_classes (optional) – The number of expected classes. Defaults to None.

Returns:

This fitted quantifier itself.

abstractmethod predict(X)

Predict the class prevalences in a data set.

Parameters:

X – The feature matrix for which this quantifier will make a prediction.

Returns:

A numpy array of class prevalences.
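For illustration, a typical fit-and-predict workflow could look as follows. Here, ACC is one of the constructors mentioned below, and X_trn, y_trn, and X_tst are placeholder names for your own training and test data.

>>> from qunfold import ACC
>>> from sklearn.ensemble import RandomForestClassifier
>>>
>>> acc = ACC(RandomForestClassifier(oob_score=True))
>>> acc.fit(X_trn, y_trn)        # X_trn, y_trn: your training data (placeholders)
>>> p_hat = acc.predict(X_tst)   # X_tst: your test data; p_hat: class prevalences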

Instances of many popular algorithms for quantification and unfolding are created through the corresponding constructors. However, you can also define your own quantification method as a LinearMethod that combines an arbitrary choice of Losses, Regularizers, and Data representations.

class qunfold.LinearMethod(loss: ~qunfold.methods.linear.losses.AbstractLoss, representation: ~qunfold.methods.linear.representations.AbstractRepresentation, solver: str = 'trust-ncg', solver_options: ~typing.Dict[str, ~typing.Any] = <factory>, seed: int | None = None)

A generic quantification / unfolding method that predicts class prevalences by solving a system of linear equations.

This class represents any method that consists of a loss function, a data representation, and a regularization term. In this implementation, any regularized loss is minimized through unconstrained second-order minimization. Valid probability estimates are ensured through a soft-max trick by Bunse (2022).

Parameters:
  • loss – An instance from qunfold.methods.linear.losses.

  • representation – An instance from qunfold.methods.linear.representations.

  • solver (optional) – The method argument in scipy.optimize.minimize. Defaults to “trust-ncg”.

  • solver_options (optional) – The options argument in scipy.optimize.minimize. Defaults to {“gtol”: 1e-8, “maxiter”: 1000}.

  • seed (optional) – A random number generator seed from which a numpy RandomState is created. Defaults to None.

Examples

Here, we create the ordinal variant of ACC (Bunse et al., 2023). This variant combines the original data representation and loss of ACC, where the loss is regularized towards smooth solutions.

>>> from qunfold import LinearMethod, TikhonovRegularized, LeastSquaresLoss, ClassRepresentation
>>> from sklearn.ensemble import RandomForestClassifier
>>>
>>> LinearMethod(
>>>     TikhonovRegularized(LeastSquaresLoss(), 0.01),
>>>     ClassRepresentation(RandomForestClassifier(oob_score=True))
>>> )

fit(X, y, n_classes=None)

Fit this quantifier to data.

Parameters:
  • X – The feature matrix to which this quantifier will be fitted.

  • y – The labels to which this quantifier will be fitted.

  • n_classes (optional) – The number of expected classes. Defaults to None.

Returns:

This fitted quantifier itself.

predict(X)

Predict the class prevalences in a data set.

Parameters:

X – The feature matrix for which this quantifier will make a prediction.

Returns:

A numpy array of class prevalences.

solve(q, M, N=None)

Solve the linear system of equations q=M*p for p.

Parameters:
  • q – A numpy array.

  • M – A numpy matrix.

  • N – The number of data items that q represents. For some losses, this argument is optional.

Returns:

The solution vector p.
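As a rough sketch of what solve does, consider a made-up transfer matrix M and a made-up prevalence vector; this sketch assumes that solve can be called on a freshly constructed method, without a prior call to fit.

>>> import numpy as np
>>> from qunfold import LinearMethod, LeastSquaresLoss, OriginalRepresentation
>>>
>>> M = np.array([[0.8, 0.3],
>>>               [0.2, 0.7]])         # made-up transfer matrix
>>> q = M.dot(np.array([0.25, 0.75]))  # q = M*p for a made-up p
>>> LinearMethod(LeastSquaresLoss(), OriginalRepresentation()).solve(q, M)  # expected to be close to [0.25, 0.75]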

Losses

Instances of AbstractLoss provide the loss functions for linear quantification methods. The FunctionLoss is a utility for creating such loss functions from JAX function objects.

class qunfold.AbstractLoss

Abstract base class for loss functions and for regularization terms.

abstractmethod instantiate(q, M, N)

This abstract method has to create a lambda expression p -> loss with JAX.

In particular, your implementation of this abstract method should return a lambda expression

>>> return lambda p: loss_value(q, M, p, N)

where loss_value has to return the result of a JAX expression. The JAX requirement ensures that the loss function can be auto-differentiated. Hence, no derivatives of the loss function have to be provided manually. JAX expressions are easy to implement. Just import the numpy wrapper

>>> import jax.numpy as jnp

and use jnp just as if you would use numpy.

Note

p is a vector of class-wise probabilities. This vector will already be the result of our soft-max trick, so that you don’t have to worry about constraints or latent parameters.

Parameters:
  • q – A numpy array.

  • M – A numpy matrix.

  • N – The number of data items that q represents.

Returns:

A lambda expression p -> loss, implemented in JAX.

Examples

The least squares loss, (q - M*p)’ * (q - M*p), is simply

>>> jnp.dot(q - jnp.dot(M, p), q - jnp.dot(M, p))
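For illustration, a minimal subclass might look as follows; MyLeastSquaresLoss is a hypothetical name and the sketch assumes that instantiate is the only method you need to implement.

>>> import jax.numpy as jnp
>>> from qunfold import AbstractLoss
>>>
>>> class MyLeastSquaresLoss(AbstractLoss):  # hypothetical re-implementation of LeastSquaresLoss
>>>     def instantiate(self, q, M, N=None):
>>>         return lambda p: jnp.dot(q - jnp.dot(M, p), q - jnp.dot(M, p))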

class qunfold.FunctionLoss(loss_function: Callable)

Create a loss object from a JAX function (p, q, M, N) -> loss_value.

Using this class is likely more convenient than subtyping AbstractLoss. In both cases, the loss_value has to be the result of a JAX expression. The JAX requirement ensures that the loss function can be auto-differentiated. Hence, no derivatives of the loss function have to be provided manually. JAX expressions are easy to implement. Just import the numpy wrapper

>>> import jax.numpy as jnp

and use jnp just as if you would use numpy.

Note

p is a vector of class-wise probabilities. This vector will already be the result of our soft-max trick, so that you don’t have to worry about constraints or latent parameters.

Parameters:

loss_function – A JAX function (p, q, M, N) -> loss_value.

Examples

The least squares loss, (q - M*p)’ * (q - M*p), is simply

>>> def least_squares(p, q, M, N):
>>>     return jnp.dot(q - jnp.dot(M, p), q - jnp.dot(M, p))

and thereby ready to be used in a FunctionLoss object:

>>> least_squares_loss = FunctionLoss(least_squares)
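The resulting loss object can then be plugged into a LinearMethod just like any of the predefined losses; the following sketch assumes a classifier that provides out-of-bag predictions, as in the example further above.

>>> LinearMethod(
>>>     least_squares_loss,
>>>     ClassRepresentation(RandomForestClassifier(oob_score=True))
>>> )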

instantiate(q, M, N=None)

This abstract method has to create a lambda expression p -> loss with JAX.

In particular, your implementation of this abstract method should return a lambda expression

>>> return lambda p: loss_value(q, M, p, N)

where loss_value has to return the result of a JAX expression. The JAX requirement ensures that the loss function can be auto-differentiated. Hence, no derivatives of the loss function have to be provided manually. JAX expressions are easy to implement. Just import the numpy wrapper

>>> import jax.numpy as jnp

and use jnp just as if you would use numpy.

Note

p is a vector of class-wise probabilities. This vector will already be the result of our soft-max trick, so that you don’t have to worry about constraints or latent parameters.

Parameters:
  • q – A numpy array.

  • M – A numpy matrix.

  • N – The number of data items that q represents.

Returns:

A lambda expression p -> loss, implemented in JAX.

Examples

The least squares loss, (q - M*p)’ * (q - M*p), is simply

>>> jnp.dot(q - jnp.dot(M, p), q - jnp.dot(M, p))

The following concrete sub-classes define the loss functions of existing methods.

class qunfold.LeastSquaresLoss

The loss function of ACC (Forman, 2008), PACC (Bella et al., 2019), and ReadMe (Hopkins & King, 2010).

This loss function computes the sum of squares of element-wise errors between q and M*p.

class qunfold.EnergyLoss

The loss function of EDx (Kawakubo et al., 2016) and EDy (Castaño et al., 2022).

This loss function represents the Energy Distance between two samples.
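Following this description, a method in the spirit of EDx could be sketched by combining this loss with the DistanceRepresentation documented below; whether this sketch matches the EDx constructor exactly is an assumption.

>>> LinearMethod(EnergyLoss(), DistanceRepresentation())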

class qunfold.HellingerSurrogateLoss

The loss function of HDx and HDy (González-Castro et al., 2013).

This loss function computes the average of the squared Hellinger distances between feature-wise (or class-wise) histograms. Note that the original HDx and HDy by González-Castro et al. (2013) do not use the squared but the regular Hellinger distance. Their choice is problematic because the regular distance is not always twice differentiable and, hence, complicates numerical optimization.

class qunfold.BlobelLoss

The loss function of RUN (Blobel, 1985).

This loss function models a likelihood function under the assumption of independent Poisson-distributed elements of q with Poisson rates M*p.

class qunfold.CombinedLoss(*losses, weights=None)

The weighted sum of multiple losses.

Parameters:
  • *losses – An arbitrary number of losses to be added together.

  • weights (optional) – An array of weights by which the losses are scaled.

Hint

You can use the CombinedLoss to create arbitrary, weighted sums of losses and regularizers.
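For instance, the following sketch adds a Tikhonov regularizer at strength 0.01 to a least-squares loss; it mirrors what the TikhonovRegularized function below does.

>>> CombinedLoss(LeastSquaresLoss(), TikhonovRegularization(), weights=[1, 0.01])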

Regularizers

qunfold.TikhonovRegularized(loss, tau=0.0)

Add TikhonovRegularization (Blobel, 1985) to any loss.

Calling this function is equivalent to calling

>>> CombinedLoss(loss, TikhonovRegularization(), weights=[1, tau])

Parameters:
  • loss – An instance from qunfold.losses.

  • tau (optional) – The regularization strength. Defaults to 0.

Returns:

An instance of CombinedLoss.

Examples

The regularized loss of RUN (Blobel, 1985) is:

>>> TikhonovRegularized(BlobelLoss(), tau)

class qunfold.TikhonovRegularization

Tikhonov regularization, as proposed by Blobel (1985).

This regularization promotes smooth solutions. This behavior is often required in ordinal quantification and in unfolding problems.

Data representations

Instances of AbstractRepresentation provide the data representations for linear quantification methods.

class qunfold.AbstractRepresentation

Abstract base class for representations.

abstractmethod fit_transform(X, y, average=True, n_classes=None)

This abstract method has to fit the representation and to return the transformed input data.

Note

Implementations of this abstract method should check the sanity of labels by calling check_y(y, n_classes) and they must set the property self.p_trn = class_prevalences(y, n_classes).

Parameters:
  • X – The feature matrix to which this representation will be fitted.

  • y – The labels to which this representation will be fitted.

  • average (optional) – Whether to return a transfer matrix M or a transformation (f(X), y). Defaults to True.

  • n_classes (optional) – The number of expected classes. Defaults to None.

Returns:

A transfer matrix M if average==True or a transformation (f(X), y) if average==False.

abstractmethod transform(X, average=True)

This abstract method has to transform the data X.

Parameters:
  • X – The feature matrix that will be transformed.

  • average (optional) – Whether to return a vector q or a transformation f(X). Defaults to True.

Returns:

A vector q = f(X).mean(axis=0) if average==True or a transformation f(X) if average==False.
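For illustration, a minimal custom representation might look as follows. MeanRepresentation is a hypothetical name, the sketch assumes that check_y and class_prevalences (see the note above) are importable from qunfold, and it arranges the class-wise means of f(X) as the columns of M, consistent with q = M*p.

>>> import numpy as np
>>> from qunfold import AbstractRepresentation
>>> from qunfold import check_y, class_prevalences  # import path assumed
>>>
>>> class MeanRepresentation(AbstractRepresentation):  # hypothetical: f(X) = X
>>>     def fit_transform(self, X, y, average=True, n_classes=None):
>>>         check_y(y, n_classes)                         # sanity check, see note above
>>>         self.p_trn = class_prevalences(y, n_classes)  # required property
>>>         fX = np.asarray(X, dtype=float)
>>>         if not average:
>>>             return fX, y
>>>         return np.stack(                              # one column per class
>>>             [fX[y == c].mean(axis=0) for c in range(len(self.p_trn))],
>>>             axis=1
>>>         )
>>>     def transform(self, X, average=True):
>>>         fX = np.asarray(X, dtype=float)
>>>         return fX.mean(axis=0) if average else fX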

The following concrete sub-classes define the representations of existing methods.

class qunfold.ClassRepresentation(classifier: Any, is_probabilistic: bool = False, fit_classifier: bool = True)

A classification-based data representation.

This representation can either be probabilistic (using the posterior predictions of a classifier) or crisp (using the class predictions of a classifier). It is used in ACC, PACC, CC, PCC, and SLD.

Parameters:
  • classifier – A classifier that implements the API of scikit-learn.

  • is_probabilistic (optional) – Whether probabilistic or crisp predictions of the classifier are used to represent the data. Defaults to False.

  • fit_classifier (optional) – Whether to fit the classifier when this quantifier is fitted. Defaults to True.
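For example, combining a probabilistic class representation with the least-squares loss roughly corresponds to PACC; this correspondence is an assumption of the following sketch.

>>> LinearMethod(
>>>     LeastSquaresLoss(),
>>>     ClassRepresentation(RandomForestClassifier(oob_score=True), is_probabilistic=True)
>>> )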

class qunfold.DistanceRepresentation(metric: str = 'euclidean', preprocessor: AbstractRepresentation | None = None)

A distance-based data representation, as it is used in EDx and EDy.

Parameters:
  • metric (optional) – The metric with which the distance between data items is measured. Can take any value that is accepted by scipy.spatial.distance.cdist. Defaults to “euclidean”.

  • preprocessor (optional) – Another AbstractRepresentation that is called before this representation. Defaults to None.

class qunfold.HistogramRepresentation(n_bins: int, preprocessor: AbstractRepresentation | None = None, unit_scale: bool = True)

A histogram-based data representation, as it is used in HDx and HDy.

Parameters:
  • n_bins – The number of bins in each feature.

  • preprocessor (optional) – Another AbstractRepresentation that is called before this representation. Defaults to None.

  • unit_scale (optional) – Whether or not to scale each output to a sum of one. A value of False indicates that the sum of each output is the number of features. Defaults to True.
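Combining this representation with the HellingerSurrogateLoss yields a method in the spirit of HDx; whether this sketch matches the HDx constructor exactly is an assumption.

>>> LinearMethod(HellingerSurrogateLoss(), HistogramRepresentation(10))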

class qunfold.EnergyKernelRepresentation(preprocessor: AbstractRepresentation | None = None)

A kernel-based data representation, as it is used in KMM, that uses the energy kernel:

k(x_1, x_2) = ||x_1|| + ||x_2|| - ||x_1 - x_2||

Note

The methods of this representation do not support setting average=False.

Parameters:

preprocessor (optional) – Another AbstractRepresentation that is called before this representation. Defaults to None.

class qunfold.GaussianKernelRepresentation(sigma: float = 1.0, preprocessor: AbstractRepresentation | None = None)

A kernel-based data representation, as it is used in KMM, that uses the Gaussian kernel:

k(x, y) = exp(-||x - y||^2 / (2σ^2))

Parameters:
  • sigma (optional) – A smoothing parameter of the kernel function. Defaults to 1.

  • preprocessor (optional) – Another AbstractRepresentation that is called before this representation. Defaults to None.

class qunfold.LaplacianKernelRepresentation(sigma=1.0)

A kernel-based data representation, as it is used in KMM, that uses the Laplacian kernel.

Parameters:

sigma (optional) – A smoothing parameter of the kernel function. Defaults to 1.

class qunfold.GaussianRFFKernelRepresentation(sigma: float = 1.0, n_rff: int = 1000, preprocessor: AbstractRepresentation | None = None, seed: int | None = None)

An efficient approximation of the GaussianKernelRepresentation, as it is used in KMM, based on random Fourier features.

Parameters:
  • sigma (optional) – A smoothing parameter of the kernel function. Defaults to 1.

  • n_rff (optional) – The number of random Fourier features. Defaults to 1000.

  • preprocessor (optional) – Another AbstractRepresentation that is called before this representation. Defaults to None.

  • seed (optional) – Controls the randomness of the random Fourier features. Defaults to None.

class qunfold.OriginalRepresentation

A dummy representation that simply returns the data as it is.

Utilities

The following classes provide functionalities that go beyond the composition of quantification methods.

Cross-validated training

The qunfold.sklearn module allows you to train classification-based quantification methods through cross-validation. Importing this module requires scikit-learn to be installed.

class qunfold.sklearn.CVClassifier(estimator, n_estimators=5, random_state=None)

Bases: BaseEstimator, ClassifierMixin

An ensemble of classifiers that are trained from cross-validation folds.

All objects of this type have a fixed attribute oob_score = True and, when trained, a fitted attribute self.oob_decision_function_, just like scikit-learn bagging classifiers.

Parameters:
  • estimator – A classifier that implements the API of scikit-learn.

  • n_estimators (optional) – The number of stratified cross-validation folds. Defaults to 5.

  • random_state (optional) – The random state for stratification. Defaults to None.

Examples

Here, we create an instance of ACC that trains a logistic regression classifier with 10 cross-validation folds.

>>> ACC(CVClassifier(LogisticRegression(), 10))

fit(X, y)
predict(X)
predict_proba(X)

Hint

If you use a bagging classifier (like random forests) with oob_score=True, you do not need to use cross-validation. Instead, the quantification method is then trained on the out-of-bag predictions of the bagging classifier.
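In other words, the following sketch relies on the out-of-bag predictions of a random forest and therefore needs no CVClassifier.

>>> ACC(RandomForestClassifier(oob_score=True))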

QuaPy

Deprecation notice: The former qunfold.quapy module has been moved to QuaPy. Please consult the documentation of quapy.method.composable for integrating qunfold with QuaPy.