API reference
The following is a list of all public methods in CherenkovDeconvolution.jl.
Deconvolution methods
All deconvolution methods implement the deconvolve function.
CherenkovDeconvolution.Methods.deconvolve (Function)
deconvolve(m, X_obs, X_trn, y_trn)
deconvolve(prefit(m, X_trn, y_trn), X_obs)
Deconvolve the observed features in X_obs with the deconvolution method m, trained on the features X_trn and the corresponding labels y_trn.
See also: prefit.
CherenkovDeconvolution.Methods.prefit (Function)
prefit(m, X_trn, y_trn)
Return a copy of the deconvolution method m which is already trained on the features X_trn and the corresponding labels y_trn.
See also: deconvolve.
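The two forms of deconvolve can be illustrated with a toy method. The sketch is hypothetical and not part of CherenkovDeconvolution.jl. ToySketch, ToyMethod and FittedToyMethod are invented for illustration, and the features are assumed to be already discretized into integer bins. The "deconvolution" is a naive least-squares inversion of g ≈ R * f rather than any of the methods documented here. What it shows is the documented contract: training once with prefit and then deconvolving gives the same result as the four-argument call.

```julia
# Hypothetical toy sketch (not part of CherenkovDeconvolution.jl) of the
# prefit/deconvolve contract, assuming features are already discretized
# into integer bins 1..n_x and labels into integer bins 1..n_y.
module ToySketch

struct ToyMethod end                 # untrained method

struct FittedToyMethod               # "copy" of the method that already knows R
    R::Matrix{Float64}
end

# Estimate a column-normalized response matrix from the training set.
function prefit(::ToyMethod, x_trn::Vector{Int}, y_trn::Vector{Int})
    R = zeros(maximum(x_trn), maximum(y_trn))
    for (x, y) in zip(x_trn, y_trn)
        R[x, y] += 1
    end
    return FittedToyMethod(R ./ sum(R, dims = 1))
end

# Naive least-squares inversion of g ≈ R * f; the result may contain
# negative entries, which a real method would prevent.
function deconvolve(m::FittedToyMethod, x_obs::Vector{Int})
    g = [count(==(i), x_obs) for i in 1:size(m.R, 1)] ./ length(x_obs)
    f = m.R \ g
    return f ./ sum(f)
end

# The four-argument form simply trains first and then deconvolves.
deconvolve(m::ToyMethod, x_obs, x_trn, y_trn) = deconvolve(prefit(m, x_trn, y_trn), x_obs)

end # module
```

With prefit, the response matrix is estimated once and can be reused for several observed data sets. The four-argument call redoes the training every time.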
CherenkovDeconvolution.Methods.DSEA (Type)
DSEA(classifier; kwargs...)
The DSEA/DSEA+ deconvolution method, embedding the given classifier.
Keyword arguments
- f_0 = ones(m) ./ m defines the prior, which is uniform by default.
- fixweighting = true sets whether or not the weight update fix is applied. This fix is proposed in my Master's thesis and in the corresponding paper.
- stepsize = DEFAULT_STEPSIZE is the step size taken in every iteration.
- smoothing = NoSmoothing() is an object that optionally applies smoothing in between iterations.
- K = 1 is the maximum number of iterations.
- epsilon = 0.0 is the minimum symmetric Chi Square distance between iterations. If the actual distance is below this threshold, convergence is assumed and the algorithm stops.
- inspect = nothing is a function (f_k::Vector, k::Int, chi2s::Float64, alpha_k::Float64) -> Any that is optionally called in every iteration.
- return_contributions = false sets whether or not the contributions of the individual examples in X_obs are returned as a tuple together with the deconvolution result.
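The epsilon criterion compares consecutive estimates with a symmetric Chi Square distance. The sketch below assumes one common definition, 2 * sum((a - b)^2 / (a + b)) on normalized pdfs. The package's own implementation may differ in detail, and chi2s_sketch and converged are hypothetical names.

```julia
# Sketch of the convergence test behind `epsilon`, assuming the common
# definition chi2s(a, b) = 2 * sum((a .- b).^2 ./ (a .+ b)) on normalized pdfs.
function chi2s_sketch(a::AbstractVector, b::AbstractVector)
    a, b = a ./ sum(a), b ./ sum(b)          # compare pdfs, not raw counts
    keep = (a .> 0) .| (b .> 0)              # skip bins where both are zero
    return 2 * sum((a[keep] .- b[keep]) .^ 2 ./ (a[keep] .+ b[keep]))
end

# Stop early once two consecutive estimates are strictly closer than epsilon.
converged(f_prev, f_next, epsilon) = chi2s_sketch(f_prev, f_next) < epsilon
```

Under this sketch, the default epsilon = 0.0 never triggers early stopping, because a distance cannot be strictly below zero. All K iterations are then performed.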
CherenkovDeconvolution.Methods.IBU (Type)
IBU(binning; kwargs...)
The Iterative Bayesian Unfolding deconvolution method, using a binning to discretize the observable features.
Keyword arguments
- f_0 = ones(m) ./ m defines the prior, which is uniform by default.
- smoothing = NoSmoothing() is an object that optionally applies smoothing in between iterations. The operation is applied neither to the initial prior nor to the final result. The function inspect is called before the smoothing is performed.
- K = 3 is the maximum number of iterations.
- epsilon = 0.0 is the minimum symmetric Chi Square distance between iterations. If the actual distance is below this threshold, convergence is assumed and the algorithm stops.
- stepsize = DEFAULT_STEPSIZE is the step size taken in every iteration.
- inspect = nothing is a function (f_k::Vector, k::Int, chi2s::Float64, alpha_k::Float64) -> Any that is optionally called in every iteration.
- warn = true determines whether warnings about negative values are emitted during normalization.
- fit_ratios = false (discouraged) determines whether ratios are fitted (i.e., R has to contain counts so that the ratio f_est / f_train is estimated) or whether the probability density f_est is fitted directly.
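The sketch below shows one textbook IBU step, i.e. the standard update based on Bayes' theorem. It assumes the following inputs:

- a column-normalized response matrix R with R[i, j] ≈ P(observed bin i | true bin j),
- an observed pdf g,
- the current estimate f.

ibu_step is a hypothetical name. Blending the Bayes update with the previous estimate via f + alpha * (f_bayes - f) is an assumption about how a constant step size enters. It is not the package's code.

```julia
using LinearAlgebra

# Sketch of one textbook IBU step (assumptions: R column-normalized,
# g the observed pdf, f the current prior; not the package's implementation).
function ibu_step(R::AbstractMatrix, g::AbstractVector, f::AbstractVector; alpha = 1.0)
    # Posterior P(true bin j | observed bin i) ∝ R[i, j] * f[j]
    post = R .* f'                       # scale column j by f[j]
    post = post ./ sum(post, dims = 2)   # normalize each row i over j
    f_bayes = post' * g                  # redistribute the observed pdf
    f_next = f .+ alpha .* (f_bayes .- f)   # assumed step-size blending
    return f_next ./ sum(f_next)
end
```

With a perfect detector (R equal to the identity), a single full step (alpha = 1.0) reproduces g exactly. A step of alpha = 0.5 moves halfway from f toward g.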
CherenkovDeconvolution.Methods.PRUN (Type)
PRUN(binning; kwargs...)
A version of the Regularized Unfolding method that is constrained to positive results. Like the original version, it uses a binning to discretize the observable features.
Keyword arguments
- tau = 0.0 determines the regularisation strength.
- K = 100 is the maximum number of iterations.
- epsilon = 1e-6 is the minimum difference in the loss function between iterations. PRUN stops when the absolute loss difference drops below epsilon.
- f_0 = ones(size(R, 2)) is the starting point for the interior-point Newton optimization.
- acceptance_correction = nothing is a tuple of functions (ac(d), invac(d)) representing the acceptance correction ac and its inverse operation invac for a data set d.
- ac_regularisation = true decides whether acceptance correction is taken into account for regularisation. Requires acceptance_correction != nothing.
- log_constant = 1/18394 is a selectable constant used in log regularisation to prevent the undefined case log(0).
- inspect = nothing is a function (f_k::Vector, k::Int, ldiff::Float64) -> Any that is called in each iteration.
- warn = true determines whether warnings about negative values are emitted during normalization.
- fit_ratios = false (discouraged) determines whether ratios are fitted (i.e., R has to contain counts so that the ratio f_est / f_train is estimated) or whether the probability density f_est is fitted directly.
CherenkovDeconvolution.Methods.RUN (Type)
RUN(binning; kwargs...)
The Regularized Unfolding method, using a binning to discretize the observable features.
Keyword arguments
- n_df = size(R, 2) is the effective number of degrees of freedom. The default n_df results in no regularization (there is one degree of freedom for each dimension in the result).
- K = 100 is the maximum number of iterations.
- epsilon = 1e-6 is the minimum difference in the loss function between iterations. RUN stops when the absolute loss difference drops below epsilon.
- acceptance_correction = nothing is a tuple of functions (ac(d), invac(d)) representing the acceptance correction ac and its inverse operation invac for a data set d.
- ac_regularisation = true decides whether acceptance correction is taken into account for regularisation. Requires acceptance_correction != nothing.
- log_constant = 1/18394 is a selectable constant used in log regularisation to prevent the undefined case log(0).
- inspect = nothing is a function (f_k::Vector, k::Int, ldiff::Float64, tau::Float64) -> Any that is optionally called in every iteration.
- warn = true determines whether warnings about negative values are emitted during normalization.
- fit_ratios = false (discouraged) determines whether ratios are fitted (i.e., R has to contain counts so that the ratio f_est / f_train is estimated) or whether the probability density f_est is fitted directly.
CherenkovDeconvolution.Methods.SVD (Type)
SVD(binning; kwargs...)
The SVD-based deconvolution method, using a binning to discretize the observable features.
Keyword arguments
- effective_rank = -1 is a regularization parameter that defines the effective rank of the solution. This rank must be <= dim(f). Any value smaller than one turns off regularization.
- N = sum(g) is the number of observations.
- B = DeconvUtil.cov_Poisson(g, N) is the variance-covariance matrix of the observed bins. The default value represents the assumption that each observed bin is Poisson-distributed with rate g[i]*N.
- epsilon_C = 1e-3 is a small constant added to each diagonal entry of the regularization matrix C. Without this constant, C could not be inverted.
- fit_ratios = true determines whether ratios are fitted (i.e., R has to contain counts so that the ratio f_est / f_train is estimated) or whether the probability density f_est is fitted directly.
- warn = true determines whether warnings about negative values are emitted during normalization.
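The need for epsilon_C can be seen on a small example. The sketch assumes that C is the second-order finite-difference (curvature) matrix commonly used for Tikhonov regularization in SVD unfolding. Whether the package builds C exactly this way is not stated here, and curvature_matrix is a hypothetical name.

Constant and linear vectors have zero curvature, so they lie in the null space of C. Without the shift, C therefore has rank n - 2 and cannot be inverted. Adding epsilon_C to the diagonal makes it positive definite.

```julia
using LinearAlgebra

# Sketch, assuming the usual second-order finite-difference curvature matrix
# of SVD unfolding; the package may construct C differently.
function curvature_matrix(n::Int; epsilon_C = 1e-3)
    D = zeros(n - 2, n)
    for i in 1:(n - 2)
        D[i, i:(i + 2)] = [1.0, -2.0, 1.0]   # discrete second derivative
    end
    C = D' * D             # singular: constants and straight lines have zero curvature
    return C + epsilon_C * I                 # diagonal shift makes C invertible
end
```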
Binnings
Binnings are needed by the classical (discrete) deconvolution algorithms, e.g. IBU, PRUN, RUN, and SVD.
CherenkovDeconvolution.Binnings.TreeBinning (Type)
TreeBinning(J, [preprocessor]; kwargs...)
A supervised tree binning strategy with an optional preprocessor and up to J clusters.
Keyword arguments
- criterion = "gini" is the splitting criterion of the tree.
- seed = rand(UInt32) is the random seed for tie breaking.
CherenkovDeconvolution.Binnings.KMeansBinning (Type)
KMeansBinning(J, [preprocessor]; seed=rand(UInt32))
An unsupervised binning strategy with an optional preprocessor and up to J clusters.
CherenkovDeconvolution.Binnings.ClassificationPreprocessor (Type)
ClassificationPreprocessor(classifier)
The output of a classifier is used as the input of the actual Binning.
CherenkovDeconvolution.Binnings.DefaultPreprocessor (Type)
type DefaultPreprocessor <: BinningPreprocessor
A default preprocessor that does not transform the data.
Smoothings
Smoothings can regularize intermediate estimates, e.g. in IBU.
CherenkovDeconvolution.Smoothings.NoSmoothing (Type)
NoSmoothing()
No smoothing; return the intermediate prior as it is.
CherenkovDeconvolution.Smoothings.PolynomialSmoothing (Type)
PolynomialSmoothing(order)
Intermediate priors are smoothed with a polynomial of the given order.
Keyword arguments
- impact = 1.0 linearly interpolates between the smoothed and the actual prior if 0 < impact < 1 (default: use the smoothed version).
- avg_negative = true replaces negative values with the average of neighboring bins, as proposed in [dagostini2010improved].
- warn = true specifies whether warnings about negative values are emitted.
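The keywords of PolynomialSmoothing can be tied together in a short sketch. polynomial_smoothing is a hypothetical name. Several details are assumptions rather than the package's implementation:

- the least-squares fit over bin indices,
- the order in which negative values are replaced,
- the final renormalization.

```julia
# Sketch of polynomial smoothing of an intermediate prior f (assumed details:
# least-squares fit over bin indices 1..n, then negative-value repair, then
# interpolation with `impact`, then renormalization).
function polynomial_smoothing(f::AbstractVector, order::Int; impact = 1.0, avg_negative = true)
    x = collect(1.0:length(f))
    V = [xi^p for xi in x, p in 0:order]     # Vandermonde matrix
    smoothed = V * (V \ f)                   # least-squares polynomial fit
    if avg_negative                          # replace negatives by the neighbor average
        for i in findall(<(0), smoothed)
            nb = [smoothed[j] for j in (i - 1, i + 1) if 1 <= j <= length(f)]
            smoothed[i] = sum(nb) / length(nb)
        end
    end
    result = impact .* smoothed .+ (1 - impact) .* f   # interpolate with the raw prior
    return result ./ sum(result)
end
```

A prior that is already linear is left unchanged by a first-order fit. With impact = 0.0, the raw prior is returned after renormalization.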
Stepsizes
Stepsizes can be used in DSEA and IBU. Combining the RunStepsize with DSEA yields the DSEA+ version of the algorithm. More information on stepsizes is given in the Manual.
CherenkovDeconvolution.OptimizedStepsizes.RunStepsize (Type)
RunStepsize(binning; kwargs...)
Adapt the step size by maximizing the likelihood of the next estimate in the search direction of the current iteration, much like in the RUN deconvolution method.
Keyword arguments:
- decay = false specifies whether a_{k+1} <= a_k is enforced so that step sizes never increase.
- tau = 0.0 determines the regularisation strength.
- warn = false specifies whether warnings should be emitted for debugging purposes.
CherenkovDeconvolution.OptimizedStepsizes.LsqStepsize (Type)
LsqStepsize(binning; kwargs...)
Adapt the step size by solving a least squares objective in the search direction of the current iteration.
Keyword arguments:
- decay = false specifies whether a_{k+1} <= a_k is enforced so that step sizes never increase.
- tau = 0.0 determines the regularisation strength.
- warn = false specifies whether warnings should be emitted for debugging purposes.
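For an unregularized objective (tau = 0.0), the least-squares step size has a closed form. The sketch assumes the objective a* = argmin_a ||R * (f + a * p) - g||^2, with a response matrix R and an observed pdf g. That objective is an assumption about what LsqStepsize minimizes, and lsq_stepsize is a hypothetical name.

Setting the derivative 2 (R p)' (R f - g + a R p) to zero gives a* = -(R p)' (R f - g) / ||R p||^2.

```julia
using LinearAlgebra

# Sketch of a least-squares step size along the search direction p,
# assuming the objective || R * (f + a * p) - g ||^2 with tau = 0.
function lsq_stepsize(R, g, f, p; a_prev = Inf, decay = false)
    Rp = R * p
    a = -dot(Rp, R * f - g) / dot(Rp, Rp)    # zero of the derivative in a
    return decay ? min(a, a_prev) : a        # decay: never exceed the previous step
end
```

If R is the identity and p points from f straight to g, the optimal step is exactly 1.0. With decay = true, a smaller previous step size caps the result.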
CherenkovDeconvolution.Stepsizes.ConstantStepsize (Type)
ConstantStepsize(alpha)
Choose the constant step size alpha in every iteration.
CherenkovDeconvolution.Stepsizes.MulDecayStepsize (Type)
MulDecayStepsize(eta, a=1.0)
Reduce the first step size a by eta in each iteration:
value(MulDecayStepsize(eta, a), k, ...) == a * k^(eta-1)
CherenkovDeconvolution.Stepsizes.ExpDecayStepsize (Type)
ExpDecayStepsize(eta, a=1.0)
Reduce the first step size a by eta in each iteration:
value(ExpDecayStepsize(eta, a), k, ...) == a * eta^(k-1)
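Both decay schedules can be tabulated directly from the formulas above. The two functions below are hypothetical plain-Julia transcriptions of those formulas, not the package's Stepsize types.

```julia
# Hypothetical transcriptions of the documented decay formulas.
mul_decay(k; eta, a = 1.0) = a * k^(eta - 1)    # MulDecayStepsize
exp_decay(k; eta, a = 1.0) = a * eta^(k - 1)    # ExpDecayStepsize
```

With eta = 0.5, the two schedules behave as follows for k = 1, ..., 4:

- exp_decay yields 1.0, 0.5, 0.25, 0.125.
- mul_decay decays more slowly, yielding 1.0, about 0.71, about 0.58, 0.5.

Both schedules start at a for k = 1. They only decrease for eta < 1 and stay constant for eta = 1.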
CherenkovDeconvolution.Stepsizes.DEFAULT_STEPSIZE (Constant)
const DEFAULT_STEPSIZE = ConstantStepsize(1.0)
The default stepsize in all deconvolution methods.
DeconvUtil
The module DeconvUtil provides a rich set of user-level utility functions. Its members are not exported directly, so you need to name the module when using its functions, or import them explicitly.
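```julia
using CherenkovDeconvolution

fit_pdf([1, 2, 2, 3]) # fails with an UndefVarError: fit_pdf is not exported

# solution a) qualify the function with its module
DeconvUtil.fit_pdf([1, 2, 2, 3])

# solution b) import the function explicitly
import CherenkovDeconvolution.DeconvUtil: fit_pdf
fit_pdf([1, 2, 2, 3])
```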
CherenkovDeconvolution.DeconvUtil.fit_pdf (Function)
fit_pdf(x[, bins]; normalize=true, laplace=false)
Obtain the discrete pdf of the integer array x, optionally specifying the array of bins.
The result is normalized by default. If you do not normalize it now, you can do so later by calling DeconvUtil.normalizepdf.
Laplace correction means that at least one example is assumed in every bin, so that no bin has probability zero. This feature is disabled by default.
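The counting behind fit_pdf, including the Laplace correction, fits in a few lines. fit_pdf_sketch is a hypothetical name. The default bins = 1:maximum(x) is an assumption, since the package's default for bins is not documented above. The sketch implements the Laplace correction by adding one count to every bin.

```julia
# Hypothetical sketch of an empirical discrete pdf with optional Laplace
# correction; mirrors the documented keywords of DeconvUtil.fit_pdf.
function fit_pdf_sketch(x::AbstractVector{<:Integer}, bins = 1:maximum(x);
                        normalize = true, laplace = false)
    counts = [count(==(b), x) for b in bins] .+ (laplace ? 1 : 0)  # one extra count per bin
    return normalize ? counts ./ sum(counts) : counts
end
```

For x = [1, 1, 2, 3], the sketch returns [0.5, 0.25, 0.25]. With laplace = true, the counts become [3, 2, 2], i.e. [3/7, 2/7, 2/7].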
CherenkovDeconvolution.DeconvUtil.fit_R (Function)
fit_R(y, x; bins_y, bins_x, normalize=true)
Estimate the detector response matrix R, which empirically captures the transfer from the integer array y to the integer array x.
R is normalized by default so that fit_pdf(x) == R * fit_pdf(y). If you do not normalize R now, you can do so later by calling DeconvUtil.normalizetransfer(R).
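The identity fit_pdf(x) == R * fit_pdf(y) follows from how R is counted. Column j of R holds n_ij / n_j and fit_pdf(y)[j] is n_j / N, so summing their product over j gives n_i / N.

fit_R_sketch and empirical_pdf are hypothetical names. Both assume that x and y have the same length and that every bin of y occurs at least once; otherwise a column of R would be 0/0.

```julia
# Hypothetical sketch: R[i, j] estimates P(x = i | y = j) by counting
# co-occurrences and normalizing each column.
function fit_R_sketch(y::AbstractVector{<:Integer}, x::AbstractVector{<:Integer};
                      bins_y = 1:maximum(y), bins_x = 1:maximum(x))
    R = [count(i -> x[i] == bx && y[i] == by, eachindex(x)) for bx in bins_x, by in bins_y]
    return R ./ sum(R, dims = 1)   # column j becomes a pdf over the observed bins
end

# Empirical pdf of an integer array over the given bins.
empirical_pdf(v, bins) = [count(==(b), v) for b in bins] ./ length(v)
```

For y = [1, 1, 2, 2, 2] and x = [1, 2, 2, 3, 3], both R * empirical_pdf(y, 1:2) and empirical_pdf(x, 1:3) equal [0.2, 0.4, 0.4].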
CherenkovDeconvolution.DeconvUtil.normalizetransfer (Function)
normalizetransfer(R[; warn=true])
Normalize each column in R to make it a probability density function.
CherenkovDeconvolution.DeconvUtil.normalizepdf (Function)
normalizepdf(array...; warn=true)
normalizepdf!(array...; warn=true)
Normalize each array to a discrete probability density function.
By default (warn=true), a warning is emitted when NaNs, Infs, or negative values are encountered.
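The difference between normalizepdf and normalizepdf! follows the usual Julia convention: the version ending in ! modifies its argument in place. The sketch illustrates this for a single array, and all names in it are hypothetical. How the package treats NaNs, Infs, or negative values beyond warning about them is not specified above, so the sketch only warns.

```julia
# Hypothetical helper: warn about values that do not belong in a pdf.
function check_pdf_values(a; warn = true)
    if warn && any(v -> !isfinite(v) || v < 0, a)
        @warn "normalizing an array that contains NaN, Inf, or negative values"
    end
end

# Out-of-place: returns a new, normalized array and leaves `a` untouched.
function normalizepdf_sketch(a::AbstractVector; warn = true)
    check_pdf_values(a; warn = warn)
    return a ./ sum(a)
end

# In-place: overwrites `a` with its normalized values and returns it.
function normalizepdf_sketch!(a::AbstractVector{<:AbstractFloat}; warn = true)
    check_pdf_values(a; warn = warn)
    a ./= sum(a)
    return a
end
```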
CherenkovDeconvolution.DeconvUtil.normalizepdf! (Function)
normalizepdf(array...; warn=true)
normalizepdf!(array...; warn=true)
Normalize each array to a discrete probability density function.
By default (warn=true), a warning is emitted when NaNs, Infs, or negative values are encountered.
Developer interface
The following list of methods is primarily intended for developers who wish to implement their own deconvolution methods, binnings, stepsizes, etc. If you do so, please file a pull request so that others can benefit from your work! More information on how to develop for this package is given in the Developer manual.
CherenkovDeconvolution.Methods.DeconvolutionMethod (Type)
abstract type DeconvolutionMethod
The supertype of all deconvolution methods.
CherenkovDeconvolution.Methods.DiscreteMethod (Type)
abstract type DiscreteMethod <: DeconvolutionMethod
The supertype of all classical deconvolution methods which estimate the density function f from a transfer matrix R and an observed density g.
CherenkovDeconvolution.Binnings.Binning (Type)
abstract type Binning
Supertype of all binning strategies for observable features.
CherenkovDeconvolution.Binnings.BinningDiscretizer (Type)
abstract type BinningDiscretizer
Supertype of any clustering-based discretizer mapping from an n-dimensional space to a single cluster index dimension.
CherenkovDeconvolution.Binnings.bins (Function)
bins(d::T) where T <: BinningDiscretizer
Return the bin indices of d.
Discretizers.encode (Function)
encode(d::TreeDiscretizer, X_obs)
Discretize X_obs using the leaf indices in the decision tree of d as discrete values.
encode(d::KMeansDiscretizer, X_obs)
Discretize X_obs using the cluster indices of d as discrete values.
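For the KMeans case, encode amounts to a nearest-centroid lookup. The sketch makes two assumptions:

- a fitted discretizer can be summarized by a J × n_features centroid matrix,
- X_obs stores one example per row.

encode_nearest is a hypothetical name and does not use the package's KMeansDiscretizer.

```julia
# Hypothetical sketch: map each row of X_obs to the index of its nearest
# centroid (squared Euclidean distance).
# centroids: J × n_features, X_obs: n_samples × n_features
function encode_nearest(centroids::AbstractMatrix, X_obs::AbstractMatrix)
    return [argmin([sum(abs2, X_obs[i, :] .- centroids[j, :]) for j in axes(centroids, 1)])
            for i in axes(X_obs, 1)]
end
```

The resulting integer vector plays the role of the discretized observations from which a pdf g and a response matrix R can be estimated.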
CherenkovDeconvolution.Stepsizes.Stepsize (Type)
abstract type Stepsize end
Abstract supertype for step sizes in deconvolution.
See also: stepsize.
CherenkovDeconvolution.OptimizedStepsizes.OptimizedStepsize (Type)
OptimizedStepsize(objective, decay)
A step size that is optimized over an objective function. If decay=true, then the step sizes never increase.
See also: RunStepsize, LsqStepsize.
CherenkovDeconvolution.Stepsizes.initialize_prefit! (Function)
initialize_prefit!(s, X_trn, y_trn)
Prepare the stepsize strategy s with the training set (X_trn, y_trn).
See also: initialize_deconvolve!.
CherenkovDeconvolution.Stepsizes.initialize_deconvolve! (Function)
initialize_deconvolve!(s, X_obs)
Prepare the stepsize strategy s with the observed features in X_obs.
See also: initialize_prefit!.
CherenkovDeconvolution.Stepsizes.value (Function)
value(s, k, p, f, a)
Use the Stepsize object s to compute a step size for iteration number k with the search direction p, the previous estimate f, and the previous step size a.
See also: ConstantStepsize, RunStepsize, LsqStepsize, ExpDecayStepsize, MulDecayStepsize.
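From a developer's point of view, an iterative method asks its step size object for a value in every iteration and then moves along the search direction. Everything in this sketch is hypothetical:

- the names StepsizeSketch, CappedStepsize and iterate_sketch,
- taking p as the difference to a fixed target.

It only mirrors the documented argument order value(s, k, p, f, a).

```julia
# Hypothetical stand-ins that mirror the signature value(s, k, p, f, a).
abstract type StepsizeSketch end

# Hypothetical strategy: cap the largest per-bin change at `max_change`.
struct CappedStepsize <: StepsizeSketch
    max_change::Float64
end
value(s::CappedStepsize, k, p, f, a) = min(1.0, s.max_change / maximum(abs, p))

function iterate_sketch(s::StepsizeSketch, f_0, f_target; K = 3)
    f, a = copy(f_0), 1.0
    for k in 1:K
        p = f_target .- f              # search direction of iteration k
        a = value(s, k, p, f, a)       # ask the strategy for a step size
        f = f .+ a .* p                # take the step
    end
    return f
end
```

The example uses f_0 = [0.5, 0.5], a target of [0.3, 0.7] and max_change = 0.1. The first step is then capped at 0.5, giving [0.4, 0.6], and the following steps are no longer limited by the cap.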
CherenkovDeconvolution.Methods.check_prior (Function)
check_prior(f_0, n_bins)
Throw meaningful exceptions if the input prior of a deconvolution run is defective.
CherenkovDeconvolution.Methods.check_arguments (Function)
check_arguments(X_trn, y_trn)
Throw meaningful exceptions if the input data of a deconvolution run is defective.
CherenkovDeconvolution.Methods.LoneClassException (Type)
LoneClassException(label)
An exception thrown by check_arguments when only one class is in the training set.
See also: recover_estimate
CherenkovDeconvolution.Methods.recover_estimate (Function)
recover_estimate(x::LoneClassException, n_bins=1)
Recover a trivial deconvolution result from x, in which all bins are zero, except for the one that occurred in the training set.
CherenkovDeconvolution.Methods.LabelSanitizer (Type)
LabelSanitizer(y_trn, n_bins=expected_n_bins_y(y_trn))
A sanitizer that
- encodes labels and priors so that none of the resulting bins is empty.
- decodes deconvolution results to recover the original (possibly empty) bins.
See also: encode_labels, encode_prior, decode_estimate.
CherenkovDeconvolution.Methods.encode_labels (Function)
encode_labels(s::LabelSanitizer, y_trn)
Encode the labels y_trn so that all values from 1 to max(y_trn) occur.
See also: encode_prior, decode_estimate.
CherenkovDeconvolution.Methods.encode_prior (Function)
encode_prior(s::LabelSanitizer, f_0)
Encode the prior f_0 to be consistent with the encoded labels.
See also: encode_labels, decode_estimate.
CherenkovDeconvolution.Methods.decode_estimate (Function)
decode_estimate(s::LabelSanitizer, f)
Recover the original bins in a deconvolution result f after encoding the labels.
See also: encode_labels, encode_prior.
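The encode/decode round trip of LabelSanitizer can be sketched with a sorted list of the bins that actually occur in y_trn. The type SanitizerSketch and the constructor function sanitizer_sketch are hypothetical. The method names only mirror the documented functions, and this is not the package's implementation.

```julia
# Hypothetical sketch of label sanitizing: drop empty bins before
# deconvolution and re-insert them as zeros afterwards.
struct SanitizerSketch
    present::Vector{Int}   # original bins that occur in y_trn, sorted
    n_bins::Int            # number of bins in the original encoding
end
sanitizer_sketch(y_trn, n_bins = maximum(y_trn)) = SanitizerSketch(sort(unique(y_trn)), n_bins)

# Map each label to its rank among the occurring bins, so 1:length(present) all occur.
encode_labels(s::SanitizerSketch, y_trn) = [searchsortedfirst(s.present, y) for y in y_trn]

# Keep only the prior entries of occurring bins and renormalize them.
encode_prior(s::SanitizerSketch, f_0) = f_0[s.present] ./ sum(f_0[s.present])

# Scatter the reduced estimate back into the original bins; empty bins get zero.
function decode_estimate(s::SanitizerSketch, f)
    full = zeros(s.n_bins)
    full[s.present] = f
    return full
end
```

For y_trn = [1, 3, 3, 5], the encoded labels are [1, 2, 2, 3]. Decoding the estimate [0.2, 0.5, 0.3] yields [0.2, 0.0, 0.5, 0.0, 0.3].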