Multivariate optimisation of signal efficiency for given background efficiency, applying rectangular minimum and maximum requirements.
Also implemented is a "decorrelated/diagonalized cuts" approach, which improves on the uncorrelated-cuts ansatz by linearly transforming the input variables into a diagonal space, using the square root of the covariance matrix.
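As an illustration of the decorrelation step, the following minimal sketch builds the transformation for two variables. It assumes a symmetric positive-definite 2x2 covariance matrix and uses a closed-form matrix square root. All names (`Mat2`, `sqrtSPD`, `decorrelationMatrix`, `decorrelate`) are hypothetical and not part of TMVA, which handles an arbitrary number of variables.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat2 = std::array<std::array<double, 2>, 2>;
using Vec2 = std::array<double, 2>;

// Plain 2x2 matrix product.
Mat2 multiply(const Mat2& a, const Mat2& b) {
    Mat2 r{};  // value-initialised to zero
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Principal square root of a symmetric positive-definite 2x2 matrix:
// sqrt(M) = (M + s*I) / t, with s = sqrt(det M) and t = sqrt(tr M + 2 s).
Mat2 sqrtSPD(const Mat2& m) {
    const double det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
    assert(det > 0.0 && m[0][0] > 0.0);
    const double s = std::sqrt(det);
    const double t = std::sqrt(m[0][0] + m[1][1] + 2.0 * s);
    Mat2 r{};
    r[0][0] = (m[0][0] + s) / t;
    r[0][1] = m[0][1] / t;
    r[1][0] = m[1][0] / t;
    r[1][1] = (m[1][1] + s) / t;
    return r;
}

// Inverse of a non-singular 2x2 matrix.
Mat2 inverse(const Mat2& m) {
    const double det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
    assert(det != 0.0);
    Mat2 r{};
    r[0][0] =  m[1][1] / det;
    r[0][1] = -m[0][1] / det;
    r[1][0] = -m[1][0] / det;
    r[1][1] =  m[0][0] / det;
    return r;
}

// Transformation matrix C^(-1/2) built from the covariance matrix C.
Mat2 decorrelationMatrix(const Mat2& cov) {
    return inverse(sqrtSPD(cov));
}

// Apply x' = C^(-1/2) x to a single event.
Vec2 decorrelate(const Mat2& cov, const Vec2& x) {
    const Mat2 a = decorrelationMatrix(cov);
    Vec2 out{};
    out[0] = a[0][0] * x[0] + a[0][1] * x[1];
    out[1] = a[1][0] * x[0] + a[1][1] * x[1];
    return out;
}
```

In the transformed space the covariance matrix becomes the identity, so rectangular cuts applied there act on linearly uncorrelated variables.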
Other optimisation criteria, such as maximising the signal significance squared, S^2/(S+B), with S and B being the signal and background yields, correspond to a particular point on the optimised background rejection versus signal efficiency curve. Choosing this working point requires knowledge of the expected yields, which is not available in general. Note also that for rare signals, Poissonian statistics should be used, which modifies the significance criterion.
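To make the relation between the significance criterion and the efficiency curve concrete, here is a minimal sketch that picks the point of a given curve maximising S^2/(S+B). The expected yields `nSigExpected` and `nBkgExpected`, the `RocPoint` type and both function names are illustrative assumptions, not TMVA API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One point of the optimised curve: signal and background efficiency.
struct RocPoint {
    double effS;
    double effB;
};

// Significance squared, S^2 / (S + B); defined as 0 for empty yields.
double significanceSquared(double s, double b) {
    return (s + b > 0.0) ? s * s / (s + b) : 0.0;
}

// Index of the curve point that maximises S^2/(S+B) for the given
// expected yields before cuts (hypothetical inputs supplied by the user).
std::size_t bestWorkingPoint(const std::vector<RocPoint>& curve,
                             double nSigExpected, double nBkgExpected) {
    assert(!curve.empty());
    std::size_t best = 0;
    double bestValue = -1.0;
    for (std::size_t i = 0; i < curve.size(); ++i) {
        const double s = curve[i].effS * nSigExpected;
        const double b = curve[i].effB * nBkgExpected;
        const double value = significanceSquared(s, b);
        if (value > bestValue) {
            bestValue = value;
            best = i;
        }
    }
    return best;
}
```

The selected point moves along the same curve when the expected yields change, which is why the curve itself is optimised first and the working point is chosen afterwards.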
The selection of events inside a rectangular volume of the variable space is performed using a binary tree to sort the training events. This provides a significant reduction in computing time (up to several orders of magnitude, depending on the complexity of the problem at hand).
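The idea of counting events inside a rectangular volume with a binary tree can be sketched as follows. This is a simplified, unweighted illustration in which the splitting variable cycles with the tree depth; `EventTree` and its methods are hypothetical names and do not reproduce the interface of TMVA::BinarySearchTree.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Event = std::vector<double>;

struct Node {
    Event point;
    std::unique_ptr<Node> left;   // events with smaller value in the split variable
    std::unique_ptr<Node> right;  // events with larger or equal value
};

class EventTree {
public:
    explicit EventTree(std::size_t nvar) : nvar_(nvar) { assert(nvar > 0); }

    // Insert one event; the split variable is depth modulo nvar.
    void insert(const Event& ev) {
        assert(ev.size() == nvar_);
        std::unique_ptr<Node>* slot = &root_;
        std::size_t depth = 0;
        while (*slot) {
            const std::size_t d = depth % nvar_;
            slot = (ev[d] < (*slot)->point[d]) ? &(*slot)->left : &(*slot)->right;
            ++depth;
        }
        slot->reset(new Node());
        (*slot)->point = ev;
    }

    // Number of events with lo[i] <= x[i] <= hi[i] for all variables.
    std::size_t countInBox(const Event& lo, const Event& hi) const {
        assert(lo.size() == nvar_ && hi.size() == nvar_);
        return count(root_.get(), 0, lo, hi);
    }

private:
    std::size_t count(const Node* n, std::size_t depth,
                      const Event& lo, const Event& hi) const {
        if (!n) return 0;
        const std::size_t d = depth % nvar_;
        bool inside = true;
        for (std::size_t i = 0; i < nvar_; ++i)
            if (n->point[i] < lo[i] || n->point[i] > hi[i]) inside = false;
        std::size_t total = inside ? 1 : 0;
        // Descend only into subtrees that can overlap the box.
        if (lo[d] < n->point[d]) total += count(n->left.get(), depth + 1, lo, hi);
        if (hi[d] >= n->point[d]) total += count(n->right.get(), depth + 1, lo, hi);
        return total;
    }

    std::size_t nvar_;
    std::unique_ptr<Node> root_;
};
```

Subtrees that cannot intersect the box are skipped entirely, which is where the saving over a linear scan of all training events comes from.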
Technically, optimisation is achieved in TMVA by two methods: Monte Carlo (MC) sampling and a Genetic Algorithm (GA).
Attempts to use Minuit fits (Simplex or Migrad) instead have not shown superior results, and often failed due to convergence at local minima.
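A Monte Carlo sampling of cuts can be sketched as below: random cut boxes are drawn inside user-given ranges, efficiencies are obtained by event counting, and for each signal-efficiency bin the box with the lowest background efficiency is kept. This is a simplified illustration under those assumptions; `CutBox`, `WorkingPoint`, `efficiency` and `sampleCuts` are hypothetical names, and event weights are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Event = std::vector<double>;

struct CutBox {
    std::vector<double> min;
    std::vector<double> max;
};

struct WorkingPoint {
    bool filled = false;
    double effS = 0.0;
    double effB = 1.0;
    CutBox cuts;
};

// Fraction of events passing all min/max requirements (event counting).
double efficiency(const std::vector<Event>& events, const CutBox& c) {
    if (events.empty()) return 0.0;
    std::size_t pass = 0;
    for (const Event& ev : events) {
        bool ok = true;
        for (std::size_t i = 0; i < ev.size() && ok; ++i)
            ok = ev[i] >= c.min[i] && ev[i] <= c.max[i];
        if (ok) ++pass;
    }
    return static_cast<double>(pass) / static_cast<double>(events.size());
}

// For each signal-efficiency bin, keep the sampled box with the lowest
// background efficiency.
std::vector<WorkingPoint> sampleCuts(const std::vector<Event>& sig,
                                     const std::vector<Event>& bkg,
                                     const std::vector<double>& rangeMin,
                                     const std::vector<double>& rangeMax,
                                     int nRandCuts, std::size_t nBins,
                                     unsigned seed) {
    assert(rangeMin.size() == rangeMax.size() && nBins > 0);
    std::mt19937 rng(seed);
    std::vector<WorkingPoint> best(nBins);
    const std::size_t nvar = rangeMin.size();
    for (int n = 0; n < nRandCuts; ++n) {
        CutBox c{std::vector<double>(nvar), std::vector<double>(nvar)};
        for (std::size_t i = 0; i < nvar; ++i) {
            std::uniform_real_distribution<double> u(rangeMin[i], rangeMax[i]);
            const double a = u(rng);
            const double b = u(rng);
            c.min[i] = std::min(a, b);
            c.max[i] = std::max(a, b);
        }
        const double effS = efficiency(sig, c);
        const double effB = efficiency(bkg, c);
        const std::size_t bin =
            std::min(nBins - 1, static_cast<std::size_t>(effS * nBins));
        WorkingPoint& wp = best[bin];
        if (!wp.filled || effB < wp.effB) {
            wp.filled = true;
            wp.effS = effS;
            wp.effB = effB;
            wp.cuts = c;
        }
    }
    return best;
}
```

The returned vector is a coarse background-efficiency versus signal-efficiency curve; bins that no sampled box reached stay unfilled.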
The tests we have performed so far showed that in generic applications, the GA is superior to MC sampling, and hence GA is the default method. It is worthwhile trying both anyway.
Decorrelated (or "diagonalized") Cuts
See the class description of MethodLikelihood for a detailed explanation.
void | CreateVariablePDFs() |
void | GetEffsfromPDFs(Double_t* cutMin, Double_t* cutMax, Double_t& effS, Double_t& effB) |
void | GetEffsfromSelection(Double_t* cutMin, Double_t* cutMax, Double_t& effS, Double_t& effB) |
void | InitCuts() |
void | MatchCutsToPars(vector<Double_t>&, Double_t*, Double_t*) |
void | MatchCutsToPars(vector<Double_t>&, Double_t**, Double_t**, Int_t ibin) |
void | MatchParsToCuts(const vector<Double_t>&, Double_t*, Double_t*) |
void | MatchParsToCuts(Double_t*, Double_t*, Double_t*) |
Bool_t | SanityChecks() |
enum EFitMethodType { | kUseMonteCarlo | |
kUseGeneticAlgorithm | ||
kUseSimulatedAnnealing | ||
kUseMinuit | ||
kUseEventScan | ||
kUseMonteCarloEvents | ||
}; | ||
enum EEffMethod { | kUseEventSelection | |
kUsePDFs | ||
}; | ||
enum EFitParameters { | kNotEnforced | |
kForceMin | ||
kForceMax | ||
kForceSmart | ||
kForceVerySmart | ||
}; | ||
enum TMVA::MethodBase::EWeightFileType { | kROOT | |
kTEXT | ||
}; | ||
enum TMVA::MethodBase::ECutOrientation { | kNegative | |
kPositive | ||
}; | ||
enum TObject::EStatusBits { | kCanDelete | |
kMustCleanup | ||
kObjInCanvas | ||
kIsReferenced | ||
kHasUUID | ||
kCannotPick | ||
kNoContextMenu | ||
kInvalidObject | ||
}; | ||
enum TObject::[unnamed] { | kIsOnHeap | |
kNotDeleted | ||
kZombie | ||
kBitMask | ||
kSingleKey | ||
kOverwrite | ||
kWriteDelete | ||
}; | ||
TMVA::MsgLogger | TMVA::Configurable::fLogger | message logger |
static const Double_t | fgMaxAbsCutVal |
vector<TString>* | TMVA::MethodBase::fInputVars | vector of input variables used in MVA |
TMVA::MsgLogger | TMVA::MethodBase::fLogger | message logger |
Int_t | TMVA::MethodBase::fNbins | number of bins in representative histograms |
Int_t | TMVA::MethodBase::fNbinsH | number of bins in evaluation histograms |
TMVA::Ranking* | TMVA::MethodBase::fRanking | pointer to ranking object (created by derived classifiers) |
TString* | fAllVarsI | what to do with variables |
TMVA::BinarySearchTree* | fBinaryTreeB | |
TMVA::BinarySearchTree* | fBinaryTreeS | |
Double_t** | fCutMax | maximum requirement |
Double_t** | fCutMin | minimum requirement |
TMVA::MethodCuts::vector<Interval*> | fCutRange | allowed ranges for cut optimisation |
Double_t* | fCutRangeMax | maximum of allowed cut range |
Double_t* | fCutRangeMin | minimum of allowed cut range |
TH1* | fEffBvsSLocal | intermediate eff. background versus eff signal histo |
TMVA::MethodCuts::EEffMethod | fEffMethod | chosen efficiency calculation method |
TString | fEffMethodS | chosen efficiency calculation method (string) |
Double_t | fEffRef | reference efficiency |
TMVA::MethodCuts::EFitMethodType | fFitMethod | chosen fit method |
TString | fFitMethodS | chosen fit method (string) |
vector<EFitParameters>* | fFitParams | vector for series of fit methods |
vector<Double_t>* | fMeanB | means of variables (background) |
vector<Double_t>* | fMeanS | means of variables (signal) |
Int_t | fNRandCuts | number of random cut samplings |
Int_t | fNpar | number of parameters in fit (default: 2*Nvar) |
TRandom* | fRandom | random generator for MC optimisation method |
vector<Int_t>* | fRangeSign | used to match cuts to fit parameters (and vice versa) |
vector<Double_t>* | fRmsB | RMSs of variables (background) |
vector<Double_t>* | fRmsS | RMSs of variables (signal) |
Double_t | fTestSignalEff | used to test optimized signal efficiency |
Double_t* | fTmpCutMax | temporary maximum requirement |
Double_t* | fTmpCutMin | temporary minimum requirement |
vector<TH1*>* | fVarHistB | reference histograms (background) |
vector<TH1*>* | fVarHistB_smooth | smoothed reference histograms (background) |
vector<TH1*>* | fVarHistS | reference histograms (signal) |
vector<TH1*>* | fVarHistS_smooth | smoothed reference histograms (signal) |
vector<PDF*>* | fVarPdfB | reference PDFs (background) |
vector<PDF*>* | fVarPdfS | reference PDFs (signal) |
standard constructor; see below for the option string format
construction from weight file
define the options (their keywords) that can be set in the option string
known options:
Method <string> Minimization method
   available values are: MC Monte Carlo <default>
                         GA Genetic Algorithm
                         SA Simulated annealing
EffMethod <string> Efficiency selection method
   available values are: EffSel <default>
                         EffPDF
VarProp <string> Property of variable 1 for the MC method (taking precedence over the global setting). The same values as for the global option are available. Variables 1..10 can be set this way
CutRangeMin/Max <float> user-defined ranges in which cuts are varied
retrieve cut values for given signal efficiency; assumes the vectors have the correct size!
retrieve cut values for given signal efficiency
returns estimator for "cut fitness" used by GA
there are two requirements:
1) the signal efficiency must be equal to the required one in the
efficiency scan
2) the background efficiency must be as small as possible
the requirement 1) has priority over 2)
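One way to encode these two requirements in a single number is sketched below, with smaller values meaning better cuts. The tolerance on the signal efficiency and the exact penalty form are assumptions made for illustration; this is not the estimator used by TMVA, and `cutFitness` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Estimator to be minimised. Efficiencies are expected in [0, 1].
// - If the signal efficiency misses the required value by more than the
//   tolerance, the result is above 1 and grows with the deviation.
// - Otherwise the result is the background efficiency, which is at most 1.
// Any candidate satisfying requirement 1) therefore beats any that does not.
double cutFitness(double effS, double effB, double effSRequired, double tolerance) {
    assert(effS >= 0.0 && effS <= 1.0);
    assert(effB >= 0.0 && effB <= 1.0);
    const double deviation = std::fabs(effS - effSRequired);
    if (deviation > tolerance) return 1.0 + deviation;
    return effB;
}
```

Because violating candidates are still ordered by their distance to the required signal efficiency, an optimiser is guided towards the allowed region before it starts reducing the background efficiency.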
translates parameters into cuts
translate the cuts into parameters
compute signal and background efficiencies from PDFs for given cut sample
compute signal and background efficiencies from event counting for given cut sample
- overloaded function to create the background efficiency (rejection) versus signal efficiency plot (on the first call of this function)
- the function returns the signal efficiency at the background efficiency indicated in theString
"theString" must have two entries:
[0]: "Efficiency"
[1]: the value of background efficiency at which the signal efficiency is to be returned
get help message text
typical length of text line:
"|--------------------------------------------------------------|"
rarity distributions (signal or background (default) is uniform in [0,1])
{ return 0; }
the definition of fit parameters can be different from the actual cut requirements; these functions provide the matching
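A possible matching between fit parameters and cuts is sketched below. It assumes two parameters per variable, a position and a non-negative width, with a per-variable sign deciding whether the position is the lower or the upper edge. This parameterisation, the `rangeSign` vector and both function names are assumptions for illustration and are not confirmed by the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// pars = {position_0, width_0, position_1, width_1, ...}: 2*Nvar parameters.
// rangeSign[i] > 0: position is the minimum, the maximum is position + width.
// rangeSign[i] <= 0: position is the maximum, the minimum is position - width.
void parsToCuts(const std::vector<double>& pars, const std::vector<int>& rangeSign,
                std::vector<double>& cutMin, std::vector<double>& cutMax) {
    const std::size_t nvar = rangeSign.size();
    assert(pars.size() == 2 * nvar);
    cutMin.assign(nvar, 0.0);
    cutMax.assign(nvar, 0.0);
    for (std::size_t ivar = 0; ivar < nvar; ++ivar) {
        const double position = pars[2 * ivar];
        const double width = pars[2 * ivar + 1];
        if (rangeSign[ivar] > 0) {
            cutMin[ivar] = position;
            cutMax[ivar] = position + width;
        } else {
            cutMin[ivar] = position - width;
            cutMax[ivar] = position;
        }
    }
}

// Inverse mapping: recover (position, width) pairs from min/max cuts.
void cutsToPars(const std::vector<double>& cutMin, const std::vector<double>& cutMax,
                const std::vector<int>& rangeSign, std::vector<double>& pars) {
    const std::size_t nvar = rangeSign.size();
    assert(cutMin.size() == nvar && cutMax.size() == nvar);
    pars.assign(2 * nvar, 0.0);
    for (std::size_t ivar = 0; ivar < nvar; ++ivar) {
        pars[2 * ivar] = (rangeSign[ivar] > 0) ? cutMin[ivar] : cutMax[ivar];
        pars[2 * ivar + 1] = cutMax[ivar] - cutMin[ivar];
    }
}
```

With a width constrained to be non-negative, every parameter vector corresponds to a valid interval with minimum not above maximum, which a fitter varying the minimum and maximum independently would not guarantee.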