API reference

The following is a list of all public methods in CherenkovDeconvolution.jl.

Deconvolution methods

All deconvolution methods implement the deconvolve function.

CherenkovDeconvolution.Methods.deconvolve - Function
deconvolve(m, X_obs, X_trn, y_trn)
deconvolve(prefit(m, X_trn, y_trn), X_obs)

Deconvolve the observed features in X_obs with the deconvolution method m trained on the features X_trn and the corresponding labels y_trn.

See also: prefit.

source
CherenkovDeconvolution.Methods.prefit - Function
prefit(m, X_trn, y_trn)

Return a copy of the deconvolution method m which is already trained on the features X_trn and the corresponding labels y_trn.

See also: deconvolve.

source
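The relation between the two call forms, deconvolve(m, X_obs, X_trn, y_trn) and deconvolve(prefit(m, X_trn, y_trn), X_obs), can be illustrated with a toy type. The following is only a sketch of the call pattern: ToyMethod, toy_prefit, and toy_deconvolve are hypothetical names that are not part of the package, and the toy "estimate" is simply the label distribution of the training set.

```julia
# Hypothetical toy method (not part of CherenkovDeconvolution.jl). It ignores
# X_obs and only illustrates the prefit/deconvolve call pattern.
struct ToyMethod
    f_trn::Union{Nothing, Vector{Float64}}   # `nothing` until trained
end
ToyMethod() = ToyMethod(nothing)

# Return a trained copy; the original object m stays untrained.
function toy_prefit(m::ToyMethod, X_trn, y_trn::Vector{Int})
    bins = sort(unique(y_trn))
    return ToyMethod([count(==(b), y_trn) for b in bins] ./ length(y_trn))
end

# Two-argument form: requires a method that is already trained.
function toy_deconvolve(m::ToyMethod, X_obs)
    m.f_trn === nothing && throw(ArgumentError("method is not trained"))
    return m.f_trn
end

# Four-argument form: train first, then deconvolve.
toy_deconvolve(m::ToyMethod, X_obs, X_trn, y_trn) =
    toy_deconvolve(toy_prefit(m, X_trn, y_trn), X_obs)

X_trn = [0.1 0.2; 0.3 0.4; 0.5 0.6; 0.7 0.8]
y_trn = [1, 1, 1, 2]
X_obs = [0.2 0.1; 0.4 0.3]

m = ToyMethod()
m_fit = toy_prefit(m, X_trn, y_trn)              # train once ...
f_a = toy_deconvolve(m_fit, X_obs)               # ... and reuse for any X_obs
f_b = toy_deconvolve(m, X_obs, X_trn, y_trn)     # one-shot form
```

In this pattern, a trained copy can be reused for several observed samples without repeating the training step.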
CherenkovDeconvolution.Methods.DSEA - Type
DSEA(classifier; kwargs...)

The DSEA/DSEA+ deconvolution method, embedding the given classifier.

Keyword arguments

  • f_0 = ones(m) ./ m defines the prior, which is uniform by default.
  • fixweighting = true sets whether or not the weight update fix is applied. This fix is proposed in my Master's thesis and in the corresponding paper.
  • stepsize = DEFAULT_STEPSIZE is the step size taken in every iteration.
  • smoothing = NoSmoothing() is an object that optionally applies smoothing in between iterations.
  • K = 1 is the maximum number of iterations.
  • epsilon = 0.0 is the minimum symmetric Chi Square distance between iterations. If the actual distance is below this threshold, convergence is assumed and the algorithm stops.
  • inspect = nothing is a function (f_k::Vector, k::Int, chi2s::Float64, alpha_k::Float64) -> Any optionally called in every iteration.
  • return_contributions = false sets whether or not the contributions of individual examples in X_obs are returned as a tuple together with the deconvolution result.
source
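The inspect keyword above accepts any function with the listed four-argument signature. The following minimal sketch records the progress of the iterations; the names trace and record_progress are hypothetical, and the callback is invoked by hand here so that the snippet runs without a classifier or the package itself.

```julia
# Hypothetical container for (k, chi2s, alpha_k) records; not part of the package.
trace = Tuple{Int, Float64, Float64}[]

# Matches the documented signature (f_k, k, chi2s, alpha_k) -> Any.
function record_progress(f_k::Vector, k::Int, chi2s::Float64, alpha_k::Float64)
    push!(trace, (k, chi2s, alpha_k))
    return nothing
end

# Manual calls that mimic two iterations. A deconvolution method would make
# such calls itself if the function were passed as the `inspect` keyword.
record_progress([0.4, 0.6], 1, 0.04, 1.0)
record_progress([0.35, 0.65], 2, 0.01, 1.0)
```

Passing inspect = record_progress to a method constructor would then fill trace while deconvolve runs.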
CherenkovDeconvolution.Methods.IBU - Type
IBU(binning; kwargs...)

The Iterative Bayesian Unfolding deconvolution method, using a binning to discretize the observable features.

Keyword arguments

  • f_0 = ones(m) ./ m defines the prior, which is uniform by default.
  • smoothing = NoSmoothing() is an object that optionally applies smoothing in between iterations. The operation is neither applied to the initial prior, nor to the final result. The function inspect is called before the smoothing is performed.
  • K = 3 is the maximum number of iterations.
  • epsilon = 0.0 is the minimum symmetric Chi Square distance between iterations. If the actual distance is below this threshold, convergence is assumed and the algorithm stops.
  • stepsize = DEFAULT_STEPSIZE is the step size taken in every iteration.
  • inspect = nothing is a function (f_k::Vector, k::Int, chi2s::Float64, alpha_k::Float64) -> Any optionally called in every iteration.
  • warn = true determines whether warnings about negative values are emitted during normalization.
  • fit_ratios = false (discouraged) determines if ratios are fitted (i.e. R has to contain counts so that the ratio f_est / f_train is estimated) or if the probability density f_est is fitted directly.
source
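The roles of f_0, K, and epsilon can be seen in a stripped-down version of the iteration. The sketch below is not the package's implementation: it omits smoothing, step sizes, inspection, and ratio fitting, and it assumes one common definition of the symmetric Chi Square distance. The names ibu_step, ibu_sketch, and chi2s_sketch are hypothetical.

```julia
# Assumed definition of the symmetric Chi Square distance (the package's exact
# formula may differ). Requires a .+ b to be nonzero everywhere.
chi2s_sketch(a, b) = 2 * sum((a .- b) .^ 2 ./ (a .+ b))

# One Bayesian update. R[i, j] approximates P(observed bin i | target bin j),
# g is the observed pdf, and f is the current prior over the target bins.
function ibu_step(R::AbstractMatrix, g::AbstractVector, f::AbstractVector)
    joint = R .* f'                             # joint[i, j] = R[i, j] * f[j]
    posterior = joint ./ sum(joint, dims = 2)   # P(target j | observed i), Bayes' rule
    return posterior' * g                       # new estimate: sum_i P(j | i) * g[i]
end

function ibu_sketch(R, g; f_0 = ones(size(R, 2)) ./ size(R, 2), K = 3, epsilon = 0.0)
    f = f_0
    for k in 1:K
        f_next = ibu_step(R, g, f)
        converged = chi2s_sketch(f_next, f) < epsilon   # never true for epsilon = 0.0
        f = f_next
        converged && break
    end
    return f
end

R = [0.8 0.2; 0.2 0.8]        # toy detector response
f_true = [0.3, 0.7]
g = R * f_true                # noise-free observed pdf
f_est = ibu_sketch(R, g; K = 100)
```

In this noise-free toy setting, f_est approaches f_true as K grows; a small K keeps the estimate closer to the prior f_0, which acts as a form of regularization.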
CherenkovDeconvolution.Methods.PRUN - Type
PRUN(binning; kwargs...)

A version of the Regularized Unfolding method that is constrained to positive results. Like the original version, it uses a binning to discretize the observable features.

Keyword arguments

  • tau = 0.0 determines the regularisation strength.
  • K = 100 is the maximum number of iterations.
  • epsilon = 1e-6 is the minimum difference in the loss function between iterations. PRUN stops when the absolute loss difference drops below epsilon.
  • f_0 = ones(size(R, 2)) is the starting point for the interior-point Newton optimization.
  • acceptance_correction = nothing is a tuple of functions (ac(d), invac(d)) representing the acceptance correction ac and its inverse operation invac for a data set d.
  • ac_regularisation = true decides whether acceptance correction is taken into account for regularisation. Requires acceptance_correction != nothing.
  • log_constant = 1/18394 is a selectable constant used in log regularisation to prevent the undefined case log(0).
  • inspect = nothing is a function (f_k::Vector, k::Int, ldiff::Float64) -> Any called in each iteration.
  • warn = true determines whether warnings about negative values are emitted during normalization.
  • fit_ratios = false (discouraged) determines if ratios are fitted (i.e. R has to contain counts so that the ratio f_est / f_train is estimated) or if the probability density f_est is fitted directly.
source
CherenkovDeconvolution.Methods.RUN - Type
RUN(binning; kwargs...)

The Regularized Unfolding method, using a binning to discretize the observable features.

Keyword arguments

  • n_df = size(R, 2) is the effective number of degrees of freedom. The default n_df results in no regularization (there is one degree of freedom for each dimension in the result).
  • K = 100 is the maximum number of iterations.
  • epsilon = 1e-6 is the minimum difference in the loss function between iterations. RUN stops when the absolute loss difference drops below epsilon.
  • acceptance_correction = nothing is a tuple of functions (ac(d), invac(d)) representing the acceptance correction ac and its inverse operation invac for a data set d.
  • ac_regularisation = true decides whether acceptance correction is taken into account for regularisation. Requires acceptance_correction != nothing.
  • log_constant = 1/18394 is a selectable constant used in log regularisation to prevent the undefined case log(0).
  • inspect = nothing is a function (f_k::Vector, k::Int, ldiff::Float64, tau::Float64) -> Any optionally called in every iteration.
  • warn = true determines whether warnings about negative values are emitted during normalization.
  • fit_ratios = false (discouraged) determines if ratios are fitted (i.e. R has to contain counts so that the ratio f_est / f_train is estimated) or if the probability density f_est is fitted directly.
source
CherenkovDeconvolution.Methods.SVD - Type
SVD(binning; kwargs...)

The SVD-based deconvolution method, using a binning to discretize the observable features.

Keyword arguments

  • effective_rank = -1 is a regularization parameter which defines the effective rank of the solution. This rank must be <= dim(f). Any value smaller than one turns off regularization.
  • N = sum(g) is the number of observations.
  • B = DeconvUtil.cov_Poisson(g, N) is the variance-covariance matrix of the observed bins. The default value represents the assumption that each observed bin is Poisson-distributed with rate g[i]*N.
  • epsilon_C = 1e-3 is a small constant that is added to each diagonal entry of the regularization matrix C. Without this constant, C could not be inverted.
  • fit_ratios = true determines if ratios are fitted (i.e. R has to contain counts so that the ratio f_est / f_train is estimated) or if the probability density f_est is fitted directly.
  • warn = true determines whether warnings about negative values are emitted during normalization.
source
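The default for B described above follows from two facts: a Poisson variable has a variance equal to its rate, and independent bins have zero covariance. A minimal sketch of such a matrix is given below; cov_poisson_sketch is a hypothetical stand-in and is not claimed to match the return type or implementation of DeconvUtil.cov_Poisson.

```julia
using LinearAlgebra

# Hypothetical stand-in: independent Poisson bins with rates g[i] * N yield a
# diagonal variance-covariance matrix with these rates on the diagonal.
cov_poisson_sketch(g::AbstractVector, N::Real) = Matrix(Diagonal(g .* N))

g = [0.2, 0.3, 0.5]   # observed pdf
N = 100               # number of observations
B = cov_poisson_sketch(g, N)
```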

Binnings

Binnings are needed by the classical (discrete) deconvolution algorithms, e.g. IBU, PRUN, RUN, and SVD.

CherenkovDeconvolution.Binnings.TreeBinning - Type
TreeBinning(J, [preprocessor]; kwargs...)

A supervised tree binning strategy with an optional preprocessor and up to J clusters.

Keyword arguments

  • criterion = "gini" is the splitting criterion of the tree.
  • seed = rand(UInt32) is the random seed for tie breaking.
source

Smoothings

Smoothings can regularize intermediate estimates, e.g. in IBU.

CherenkovDeconvolution.Smoothings.PolynomialSmoothing - Type
PolynomialSmoothing(order)

Intermediate priors are smoothed with a polynomial of the given order.

  • impact = 1.0 linearly interpolates between the smoothed and the actual prior if 0 < impact < 1 (default: use the smoothed version).
  • avg_negative = true replaces negative values with the average of neighboring bins, as proposed in [dagostini2010improved].
  • warn = true specifies whether warnings about negative values are emitted.
source
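The interplay of order and impact can be sketched with an ordinary least-squares polynomial fit over the bin indices. This is an illustration under stated assumptions, not the package's code: it assumes that the result is re-normalized to sum to one, it omits the avg_negative handling, and polynomial_smoothing_sketch is a hypothetical name.

```julia
using LinearAlgebra

function polynomial_smoothing_sketch(f::Vector{Float64}, order::Int; impact::Float64 = 1.0)
    x = collect(1.0:length(f))               # bin indices as the x-axis
    V = [xi^p for xi in x, p in 0:order]     # Vandermonde matrix, one column per power
    coefficients = V \ f                     # least-squares fit (needs length(f) > order)
    smoothed = V * coefficients
    mixed = impact .* smoothed .+ (1 - impact) .* f   # interpolate with the actual prior
    return mixed ./ sum(mixed)               # assumption: re-normalize to a pdf
end

f = [0.1, 0.4, 0.2, 0.3]
f_smooth = polynomial_smoothing_sketch(f, 1)                # fully smoothed
f_half = polynomial_smoothing_sketch(f, 1; impact = 0.5)    # halfway between both
```

With impact = 0.0, the sketch returns the input unchanged (up to normalization); with impact = 1.0, it returns the fitted polynomial.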

Stepsizes

Stepsizes can be used in DSEA and IBU. Combining the RunStepsize with DSEA yields the DSEA+ version of the algorithm. More information on stepsizes is given in the Manual.

CherenkovDeconvolution.OptimizedStepsizes.RunStepsize - Type
RunStepsize(binning; kwargs...)

Adapt the step size by maximizing the likelihood of the next estimate in the search direction of the current iteration, much like in the RUN deconvolution method.

Keyword arguments:

  • decay = false specifies whether a_(k+1) <= a_k is enforced so that step sizes never increase.
  • tau = 0.0 determines the regularisation strength.
  • warn = false specifies whether warnings should be emitted for debugging purposes.
source
CherenkovDeconvolution.OptimizedStepsizes.LsqStepsize - Type
LsqStepsize(binning; kwargs...)

Adapt the step size by solving a least squares objective in the search direction of the current iteration.

Keyword arguments:

  • decay = false specifies whether a_(k+1) <= a_k is enforced so that step sizes never increase.
  • tau = 0.0 determines the regularisation strength.
  • warn = false specifies whether warnings should be emitted for debugging purposes.
source

DeconvUtil

The module DeconvUtil provides a rich set of user-level utility functions. We do not export the members of this module directly, so you need to name the module when using its functions.

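using CherenkovDeconvolution
fit_pdf([1, 2, 2, 3]) # WILL BREAK: fit_pdf is not exported

# solution a) qualify the call with the module name
DeconvUtil.fit_pdf([1, 2, 2, 3])

# solution b) import the function explicitly
import CherenkovDeconvolution.DeconvUtil: fit_pdf
fit_pdf([1, 2, 2, 3])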
CherenkovDeconvolution.DeconvUtil.fit_pdf - Function
fit_pdf(x[, bins]; normalize=true, laplace=false)

Obtain the discrete pdf of the integer array x, optionally specifying the array of bins.

The result is normalized by default. If you disable normalization, you can still normalize the result later by calling DeconvUtil.normalizepdf.

Laplace correction means that at least one example is assumed in every bin, so that no bin has probability zero. This feature is disabled by default.

source
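The effect of the normalize and laplace keywords can be sketched in a few lines. The function below is a hypothetical stand-in, not the package's implementation: it assumes sorted unique values as the default bins and the standard add-one form of Laplace correction, which the package may implement differently.

```julia
# Hypothetical stand-in for illustration; not DeconvUtil.fit_pdf itself.
function fit_pdf_sketch(x::Vector{Int}, bins = sort(unique(x));
                        normalize::Bool = true, laplace::Bool = false)
    counts = Float64[count(==(b), x) for b in bins]   # histogram over the given bins
    if laplace
        counts .+= 1.0        # assumed add-one correction: no bin stays empty
    end
    return normalize ? counts ./ sum(counts) : counts
end

x = [1, 1, 2]
p_plain = fit_pdf_sketch(x, [1, 2, 3])                     # bin 3 has probability zero
p_laplace = fit_pdf_sketch(x, [1, 2, 3]; laplace = true)   # bin 3 gets a small probability
h = fit_pdf_sketch(x, [1, 2, 3]; normalize = false)        # raw counts
```

Specifying bins explicitly matters whenever some bins are not observed in x, as with bin 3 in this example.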
CherenkovDeconvolution.DeconvUtil.fit_R - Function
fit_R(y, x; bins_y, bins_x, normalize=true)

Estimate the detector response matrix R, which empirically captures the transfer from the integer array y to the integer array x.

R is normalized by default so that fit_pdf(x) == R * fit_pdf(y). If you disable normalization, you can still normalize R later by calling DeconvUtil.normalizetransfer(R).

source
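The normalization property fit_pdf(x) == R * fit_pdf(y) holds when every column of R is a conditional distribution P(x | y). The sketch below checks this on a toy data set. The names fit_R_sketch and pdf_sketch are hypothetical stand-ins, not the package's functions, and the sketch assumes that no bin of y is empty (an empty bin would lead to a division by zero).

```julia
# Hypothetical stand-in: R[i, j] estimates P(x in bin i | y in bin j).
function fit_R_sketch(y::Vector{Int}, x::Vector{Int};
                      bins_y = sort(unique(y)), bins_x = sort(unique(x)),
                      normalize::Bool = true)
    R = zeros(Float64, length(bins_x), length(bins_y))
    for (i, bx) in enumerate(bins_x), (j, by) in enumerate(bins_y)
        R[i, j] = count((x .== bx) .& (y .== by))   # joint counts
    end
    return normalize ? R ./ sum(R, dims = 1) : R    # each column sums to one
end

# Plain relative frequencies over sorted unique values.
pdf_sketch(v::Vector{Int}) = [count(==(b), v) for b in sort(unique(v))] ./ length(v)

y = [1, 1, 2, 2, 2]   # true (target) bins
x = [1, 2, 2, 2, 1]   # observed bins of the same five examples
R = fit_R_sketch(y, x)
g = R * pdf_sketch(y) # reproduces the observed pdf of x
```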
CherenkovDeconvolution.DeconvUtil.normalizepdf - Function
normalizepdf(array...; warn=true)
normalizepdf!(array...; warn=true)

Normalize each array to a discrete probability density function.

By default, a warning is emitted if NaNs, Infs, or negative values have to be handled.

source

Developer interface

The following list of methods is primarily intended for developers who wish to implement their own deconvolution methods, binnings, stepsizes, etc. If you do so, please file a pull request so that others can benefit from your work! More information on how to develop for this package is given in the Developer manual.

Discretizers.encode - Function
encode(d::TreeDiscretizer, X_obs)

Discretize X_obs using the leaf indices in the decision tree of d as discrete values.

source
encode(d::KMeansDiscretizer, X_obs)

Discretize X_obs using the cluster indices of d as discrete values.

source
CherenkovDeconvolution.Stepsizes.value - Function
value(s, k, p, f, a)

Use the Stepsize object s to compute a step size for iteration number k with the search direction p, the previous estimate f, and the previous step size a.

See also: ConstantStepsize, RunStepsize, LsqStepsize, ExpDecayStepsize, MulDecayStepsize.

source
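To illustrate the roles of the arguments k, p, f, and a, the following sketch defines a hypothetical exponentially decaying rule and applies the resulting step along a search direction. It assumes the generic update f + a * p and does not reproduce the package's Stepsize types; in particular, exp_decay_value and enforce_decay are illustrative names that are not claimed to match ExpDecayStepsize or the decay keyword of the optimized step sizes.

```julia
# Hypothetical rule: start at `initial` and shrink by `eta` in every iteration.
# The arguments p, f, and a are accepted for symmetry with value(s, k, p, f, a),
# but this simple rule does not need them.
exp_decay_value(k::Int, p::Vector{Float64}, f::Vector{Float64}, a::Float64;
                initial::Float64 = 1.0, eta::Float64 = 0.5) = initial * eta^(k - 1)

# Enforce a_(k+1) <= a_k so that step sizes never increase.
enforce_decay(a_next::Float64, a_prev::Float64) = min(a_next, a_prev)

f = [0.5, 0.5]        # previous estimate
p = [0.2, -0.2]       # search direction, e.g. a proposed estimate minus f
a_prev = 1.0          # previous step size

a = enforce_decay(exp_decay_value(2, p, f, a_prev), a_prev)
f_next = f .+ a .* p  # step of length a along p
```

Because the components of p sum to zero here, f_next remains a valid pdf for any step size between zero and one.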
CherenkovDeconvolution.Methods.LabelSanitizer - Type
LabelSanitizer(y_trn, n_bins=expected_n_bins_y(y_trn))

A sanitizer that

  • encodes labels and priors so that none of the resulting bins is empty.
  • decodes deconvolution results to recover the original (possibly empty) bins.

See also: encode_labels, encode_prior, decode_estimate.

source