TUnfold solves the inverse problem chi**2 = 1/2 * (y-Ax)# V (y-Ax) + tau (L(x-x0))# L(x-x0) where # means that the matrix is transposed Monte Carlo input y: vector of measured quantities (dimension ny) V: inverse of covariance matrix for y (dimension ny x ny) in many cases V is diagonal and calculated from the errors of y A: migration matrix (dimension ny x nx) x: unknown underlying distribution (dimension nx) Regularisation tau: parameter, defining the regularisation strength L: matrix of regularisation conditions (dimension nl x nx) x0: bias distribution and chi**2 is minimized as a function of x This applies to a very large number of problems, where the measured distribution y is a linear superposition of several Monte Carlo shapes and the sum of these shapes gives the output distribution x Some random examples: (1) measure a cross-section as a function of, say, E_T(detector) and unfold it to obtain the underlying distribution E_T(generator) (2) measure a lifetime distribution and unfold the contributions from different flavours (3) measure the transverse mass and decay angle and unfold for the true mass distribution plus background References: Talk by V. Blobel, Terascale Statistics school https://indico.desy.de/contributionDisplay.py?contribId=23&confId=1149 References quoted in Blobel's talk: Per Chistian Hansen, Rank-Deficient and Discrete Ill-posed Problems, Siam (1998) Jari Kaipio and Erkki Somersalo, Statistical and Computational Inverse problems, Springer (2005) Implementation The result of the unfolding is calculated as follows: Lsquared = L#L regularisation conditions squared Einv = ((A# V A)+tau Lsquared) is the inverse covariance matrix of x E = inverse(Einv) is the covariance matrix of x x = E A V y + tau Lsquared x0 is the result Warning: The algorithm is based on "standard" matrix inversion, with the known limitations in numerical accuracy and computing cost for matrices with large dimensions. Thus the algorithm should not used for large dimensions of x and y nx should not be much larger than 200 ny should not be much larger than 1000 Example of using TUnfold: imagine a 2-dimensional histogram is filled, named A y-axis: generated quantity (e.g. 10 bins) x-axis: reconstructed quantity (e.g. 20 bin) The data are filled in a 1-dimensional histogram, named y Note1: ALWAYS choose a higher number of bins on the reconstructed side as compared to the generated size! Note2: the events which are generated but not reconstructed have to be added to the appropriate overflow bins of A code fragment (with histograms A and y filled): TUnfold unfold(A,TUnfold::kHistMapOutputHoriz); Double_t tau=1.E-4; Double_t biasScale=0.0; unfold.DoUnfold(tau,y,biasScale); TH1D *x=unfold.GetOutput("x","myVariable"); TH2D *rhoij=unfold.GetRhoIJ("correlation","myVariable"); will create histograms "x" and "correlation" from A and y. if tau is very large, the output is biased to the generated distribution if tau is very small, the output will show oscillations and large entries in the correlation matrix Proper choice of tau One of the most difficult questions is about the choice of tau. The most common method is the L-curve method: a two-dimensional curve is plotted x-axis: log10(chisquare) y-axis: log10(regularisation condition) In many cases this curve has an L-shape. The best choice of tau is in the kink of the L Within TUnfold a simple version of the L-curve analysis is included. It tests a given number of points in a predefined tau-range and searches for the maximum of the curvature in the L-curve (kink position). Example: scan tau and produce the L-curve plot Code fragment: assume A and y are filled TUnfold unfold(A,TUnfold::kHistMapOutputHoriz); unfold.SetInput(y); Int_t nScan=30; Double_t tauMin=1.E-10; Double_t tauMax=1.0; Int_t iBest; TSpline *logTauX,*logTauY; TGraph *lCurve; iBest=unfold.ScanLcurve(nScan,tauMin,tauMax,&lCurve); std::cout<<"tau="<<unfold.GetTau()<<"\n"; TH1D *x=unfold.GetOutput("x","myVariable"); TH2D *rhoij=unfold.GetRhoIJ("correlation","myVariable"); This creates logTauX: the L-curve's x-coordinate as a function of log(tau) logTauY: the L-curve's y-coordinate as a function of log(tau) lCurve: a graph of the L-curve x,rhoij: unfolding result for best choice of tau iBest: the coordinate/spline knot number with the best choice of tau Note: always check the L curve after unfolding. The algorithm is not perfect Bin averaging of the output Sometimes it is useful to unfold for a fine binning in x and calculate the final result with a smaller number of bins. The advantage is a reduction in the correlation coefficients if bins are averaged. For this type of averaging the full error matrix has to be used. There are methods in TUnfold to support this type of calculation Example: The vector x has dimension 49, it consists of 7x7 bins in two variables (Pt,Eta) The unfolding result is to be presented as one-dimensional projections in (Pt) and (Eta) The bins of x are mapped as: bins 1..7 the first Eta bin bins 2..14 the second Eta bin bins 1,8,15,... the first Pt bin code fragment: TUnfold unfold(A,TUnfold::kHistMapOutputHoriz); Double_t tau=1.E-4; Double_t biasScale=0.0; unfold.DoUnfold(tau,y,biasScale); Int_t binMapEta[49+2]; Int_t binMapPt[49+2]; // overflow and underflow bins are not used binMapEta[0]=-1; binMapEta[49+1]=-1; binMapPt[0]=-1; binMapPt[49+1]=-1; for(Int_t i=1;i<=49;i++) { // all bins (i) with the same (i-1)/7 are added binMapEta[i] = (i-1)/7 +1; // all bins (i) with the same (i-1)%7 are added binMapPt[i] = (i-1)%7 +1; } TH1D *etaHist=new TH1D("eta(unfolded)",";eta",7,etamin,etamax); TH1D *etaCorr=new TH2D("eta(unfolded)",";eta;eta",7,etamin,etamax,7,etamin,etamax); TH1D *ptHist=new TH1D("pt(unfolded)",";pt",7,ptmin,ptmax); TH1D *ptCorr=new TH2D("pt(unfolded)",";pt;pt",7,ptmin,ptmax,7,ptmin,ptmax); unfold.GetOutput(etaHist,binMapEta); unfold.GetRhoIJ(etaCorrt,binMapEta); unfold.GetOutput(ptHist,binMapPt); unfold.GetRhoIJ(ptCorrt,binMapPt); Alternative Regularisation conditions Regularisation is needed for most unfolding problems, in order to avoid large oscillations and large correlations on the output bins. It means that some extra conditions are applied on the output bins Within TUnfold these conditions are posed on the difference (x-x0), where x: unfolding output x0: the bias distribution, by default calculated from the input matrix A. There is a method SetBias() to change the bias distribution. The 3rd argument to DoUnfold() is a scale factor applied to the bias bias_default[j] = sum_i A[i][j] x0[j] = scaleBias*bias[j] The scale factor can be used to (a) completely suppress the bias by setting it to zero (b) compensate differences in the normalisation between data and Monte Carlo If the regularisation is strong, i.e. large parameter tau, then the distribution x or its derivatives will look like the bias distribution. If the parameter tau is small, the distribution x is independent of the bias. Three basic types of regularisation are implemented in TUnfold condition regularisation kRegModeNone none kRegModeSize minimize the size of (x-x0) kRegModeDerivative minimize the 1st derivative of (x-x0) kRegModeCurvature minimize the 2nd derivative of (x-x0) kRegModeSize is the regularisation scheme which usually is found in literature. In addition, the bias usually is not present (bias scale factor is zero). The non-standard regularisation schemes kRegModeDerivative and kRegModeCurvature have the nice feature that they create correlations between x-bins, whereas the non-regularized unfolding tends to create negative correlations between bins. For these regularisation schemes the parameter tau could be tuned such that the correlations are smallest, as an alternative to the L-curve method. If kRegModeSize is chosen or if x is a smooth function through all bins, the regularisation condition can be set on all bins together by giving the appropriate argument in the constructor (see examples above). If x is composed of independent groups of bins (for example, signal and background binning in two variables), it may be necessary to set regularisation conditions for the individual groups of bins. In this case, give kRegModeNone in the constructor and specify the bin grouping with calls to RegularizeBins() specify a 1-dimensional group of bins RegularizeBins2D() specify a 2-dimensional group of bins For ultimate flexibility, the regularisation condition can be set on each bin individually -> give kRegModeNone in the constructor and use RegularizeSize() regularize one bin RegularizeDerivative() regularize the slope given by two bins RegularizeCurvature() regularize the curvature given by three bins
TUnfold(const TUnfold&) | |
TUnfold(TH2 *const hist_A, TUnfold::EHistMap histmap, TUnfold::ERegMode regmode = kRegModeSize) | |
virtual | ~TUnfold() |
void | TObject::AbstractMethod(const char* method) const |
virtual void | TObject::AppendPad(Option_t* option = "") |
virtual void | TObject::Browse(TBrowser* b) |
static TClass* | Class() |
virtual const char* | TObject::ClassName() const |
virtual void | TObject::Clear(Option_t* = "") |
virtual TObject* | TObject::Clone(const char* newname = "") const |
virtual Int_t | TObject::Compare(const TObject* obj) const |
virtual void | TObject::Copy(TObject& object) const |
virtual void | TObject::Delete(Option_t* option = "")MENU |
virtual Int_t | TObject::DistancetoPrimitive(Int_t px, Int_t py) |
virtual Double_t | DoUnfold(Double_t const& tau) |
Double_t | DoUnfold(Double_t const& tau, TH1 *const hist_y, Double_t const& scaleBias = 0.0) |
virtual void | TObject::Draw(Option_t* option = "") |
virtual void | TObject::DrawClass() constMENU |
virtual TObject* | TObject::DrawClone(Option_t* option = "") constMENU |
virtual void | TObject::Dump() constMENU |
virtual void | TObject::Error(const char* method, const char* msgfmt) const |
virtual void | TObject::Execute(const char* method, const char* params, Int_t* error = 0) |
virtual void | TObject::Execute(TMethod* method, TObjArray* params, Int_t* error = 0) |
virtual void | TObject::ExecuteEvent(Int_t event, Int_t px, Int_t py) |
virtual void | TObject::Fatal(const char* method, const char* msgfmt) const |
virtual TObject* | TObject::FindObject(const char* name) const |
virtual TObject* | TObject::FindObject(const TObject* obj) const |
TH1D* | GetBias(char *const name, char *const title, Double_t x0 = 0.0, Double_t x1 = 0.0) const |
const Double_t& | GetChi2A() const |
const Double_t& | GetChi2L() const |
virtual Option_t* | TObject::GetDrawOption() const |
static Long_t | TObject::GetDtorOnly() |
void | GetEmatrix(TH2* ematrix, Int_t *const binMap = 0) const |
TH2D* | GetEmatrix(char *const name, char *const title, Double_t x0 = 0.0, Double_t x1 = 0.0) const |
TH1D* | GetFoldedOutput(char *const name, char *const title, Double_t y0 = 0.0, Double_t y1 = 0.0) const |
virtual const char* | TObject::GetIconName() const |
TH1D* | GetInput(char *const name, char *const title, Double_t y0 = 0.0, Double_t y1 = 0.0) const |
Double_t | GetLcurveX() const |
Double_t | GetLcurveY() const |
TH2D* | GetLsquared(char *const name, char *const title, Double_t x0 = 0.0, Double_t x1 = 0.0) const |
virtual const char* | TObject::GetName() const |
virtual char* | TObject::GetObjectInfo(Int_t px, Int_t py) const |
static Bool_t | TObject::GetObjectStat() |
virtual Option_t* | TObject::GetOption() const |
void | GetOutput(TH1* output, Int_t *const binMap = 0) const |
TH1D* | GetOutput(char *const name, char *const title, Double_t x0 = 0.0, Double_t x1 = 0.0) const |
const Double_t& | GetRhoAvg() const |
Double_t | GetRhoI(TH1* rhoi, TH2* ematrixinv = 0, Int_t *const binMap = 0) const |
TH1D* | GetRhoI(char *const name, char *const title, Double_t x0 = 0.0, Double_t x1 = 0.0) const |
void | GetRhoIJ(TH2* rhoij, Int_t *const binMap = 0) const |
TH2D* | GetRhoIJ(char *const name, char *const title, Double_t x0 = 0.0, Double_t x1 = 0.0) const |
const Double_t& | GetRhoMax() const |
const Double_t& | GetTau() const |
virtual const char* | TObject::GetTitle() const |
virtual UInt_t | TObject::GetUniqueID() const |
virtual Bool_t | TObject::HandleTimer(TTimer* timer) |
virtual ULong_t | TObject::Hash() const |
virtual void | TObject::Info(const char* method, const char* msgfmt) const |
virtual Bool_t | TObject::InheritsFrom(const char* classname) const |
virtual Bool_t | TObject::InheritsFrom(const TClass* cl) const |
virtual void | TObject::Inspect() constMENU |
void | TObject::InvertBit(UInt_t f) |
virtual TClass* | IsA() const |
virtual Bool_t | TObject::IsEqual(const TObject* obj) const |
virtual Bool_t | TObject::IsFolder() const |
Bool_t | TObject::IsOnHeap() const |
virtual Bool_t | TObject::IsSortable() const |
Bool_t | TObject::IsZombie() const |
virtual void | TObject::ls(Option_t* option = "") const |
void | TObject::MayNotUse(const char* method) const |
virtual Bool_t | TObject::Notify() |
static void | TObject::operator delete(void* ptr) |
static void | TObject::operator delete(void* ptr, void* vp) |
static void | TObject::operator delete[](void* ptr) |
static void | TObject::operator delete[](void* ptr, void* vp) |
void* | TObject::operator new(size_t sz) |
void* | TObject::operator new(size_t sz, void* vp) |
void* | TObject::operator new[](size_t sz) |
void* | TObject::operator new[](size_t sz, void* vp) |
TUnfold& | operator=(const TUnfold&) |
virtual void | TObject::Paint(Option_t* option = "") |
virtual void | TObject::Pop() |
virtual void | TObject::Print(Option_t* option = "") const |
virtual Int_t | TObject::Read(const char* name) |
virtual void | TObject::RecursiveRemove(TObject* obj) |
void | RegularizeBins(int start, int step, int nbin, TUnfold::ERegMode regmode) |
void | RegularizeBins2D(int start_bin, int step1, int nbin1, int step2, int nbin2, TUnfold::ERegMode regmode) |
void | RegularizeCurvature(int left_bin, int center_bin, int right_bin, Double_t const& scale_left = 1.0, Double_t const& scale_right = 1.0) |
void | RegularizeDerivative(int left_bin, int right_bin, Double_t const& scale = 1.0) |
void | RegularizeSize(int bin, Double_t const& scale = 1.0) |
void | TObject::ResetBit(UInt_t f) |
virtual void | TObject::SaveAs(const char* filename = "", Option_t* option = "") constMENU |
virtual void | TObject::SavePrimitive(basic_ostream<char,char_traits<char> >& out, Option_t* option = "") |
virtual Int_t | ScanLcurve(Int_t nPoint, Double_t const& tauMin, Double_t const& tauMax, TGraph** lCurve, TSpline** logTauX = 0, TSpline** logTauY = 0) |
void | SetBias(TH1 *const bias) |
void | TObject::SetBit(UInt_t f) |
void | TObject::SetBit(UInt_t f, Bool_t set) |
virtual void | TObject::SetDrawOption(Option_t* option = "")MENU |
static void | TObject::SetDtorOnly(void* obj) |
void | SetInput(TH1 *const hist_y, Double_t const& scaleBias = 0.0) |
static void | TObject::SetObjectStat(Bool_t stat) |
virtual void | TObject::SetUniqueID(UInt_t uid) |
virtual void | ShowMembers(TMemberInspector& insp, char* parent) |
virtual void | Streamer(TBuffer& b) |
void | StreamerNVirtual(TBuffer& b) |
virtual void | TObject::SysError(const char* method, const char* msgfmt) const |
Bool_t | TObject::TestBit(UInt_t f) const |
Int_t | TObject::TestBits(UInt_t f) const |
virtual void | TObject::UseCurrentStyle() |
virtual void | TObject::Warning(const char* method, const char* msgfmt) const |
virtual Int_t | TObject::Write(const char* name = 0, Int_t option = 0, Int_t bufsize = 0) |
virtual Int_t | TObject::Write(const char* name = 0, Int_t option = 0, Int_t bufsize = 0) const |
TUnfold() | |
virtual void | CalculateChi2Rho() |
static TMatrixDSparse* | CreateSparseMatrix(Int_t nr, Int_t nc, Int_t* row, Int_t* col, Double_t *const data) |
virtual void | TObject::DoError(int level, const char* location, const char* fmt, va_list va) const |
virtual Double_t | DoUnfold() |
Int_t | GetNx() const |
Int_t | GetNy() const |
void | TObject::MakeZombie() |
void | ClearTUnfold() |
enum EHistMap { | kHistMapOutputHoriz | |
kHistMapOutputVert | ||
}; | ||
enum ERegMode { | kRegModeNone | |
kRegModeSize | ||
kRegModeDerivative | ||
kRegModeCurvature | ||
}; | ||
enum TObject::EStatusBits { | kCanDelete | |
kMustCleanup | ||
kObjInCanvas | ||
kIsReferenced | ||
kHasUUID | ||
kCannotPick | ||
kNoContextMenu | ||
kInvalidObject | ||
}; | ||
enum TObject::[unnamed] { | kIsOnHeap | |
kNotDeleted | ||
kZombie | ||
kBitMask | ||
kSingleKey | ||
kOverwrite | ||
kWriteDelete | ||
}; |
TMatrixDSparse* | fA | Input: matrix |
TMatrixDSparse* | fAx | Result: Ax |
Double_t | fBiasScale | Input: scale factor for the bias |
Double_t | fChi2A | Result: chi**2 contribution from (y-Ax)V(y-Ax) |
Double_t | fChi2L | Result: chi**2 contribution from tau(x-s*x0)Lsquared(x-s*x0) |
TMatrixD* | fE | Result: error matrix |
TMatrixDSparse* | fEinv | Result: inverse error matrix |
TArrayI | fHistToX | Input: histogram bins -> matrix indices |
TMatrixDSparse* | fLsquared | Input: regularisation conditions squared |
Double_t | fRhoAvg | Result: average global correlation |
Double_t | fRhoMax | Result: maximum global correlation |
Double_t | fTau | Input: regularisation parameter |
TMatrixDSparse* | fV | Input: covariance matrix for y |
TMatrixD* | fX | Result: x |
TMatrixD* | fX0 | Input: x0 |
TArrayI | fXToHist | Input: matrix indices -> histogram bins |
TMatrixD* | fY | Input: y |
main unfolding algorithm. Declared virtual, because other algorithms could be implemented Purpose: unfold y -> x Data members required: fA: matrix to relate x and y fY: measured data points fX0: bias on x fBiasScale: scale factor for fX0 fV: inverse of covariance matrix for y fLsquared: regularisation conditions fTau: regularisation strength Data members modified: fEinv: inverse of the covariance matrix of x fE: covariance matrix of x fX: unfolded data points fAx: estimate of y from x fChi2A: contribution to chi**2 from y-fAx fChi2L: contribution to chi**2 from L*(x-x0) fRhoMax: maximum global correlation coefficient fRhoAvg: average global correlation coefficient return code: fRhoMax if(fRhoMax>=1.0) then the unfolding has failed!
Create a sparse matrix, to set up fA or fV. nr,nc: number of rows and columns row: row index array col: column index array data: non-zero matrix elements Return value: a new matrix This implementation cirumvents certain problems with TMatrixDSparse constructors. Eventually calls to this method should be replaced by something like new TMatrixDSparse( some_arguments )
set up unfolding matrix and initial regularisation scheme hist_A: matrix that describes the migrations histmap: mapping of the histogram axes to unfolding output regmode: global regularisation mode data members initialized to something different from zero: fA: filled from hist_A fX0: filled from hist_A fLsquared: filled depending on the regularisation scheme Treatment of overflow bins Bins where the unfolding input (Detector level) is in overflow are used for the efficiency correction. They have to be filled properly! Bins where the unfolding output (Generator level) is in overflow are treated as a part of the generator level distribution. I.e. the unfolding output could have non-zero overflow bins if the input matrix does have such bins.
add regularisation on the size of bin i bin: bin number scale: size of the regularisation modifies data member fLsquared
add regularisation on the difference of two bins left_bin: 1st bin right_bin: 2nd bin scale: size of the regularisation modifies data member fLsquared
add regularisation on the curvature through 3 bins (2nd derivative) left_bin: 1st bin center_bin: 2nd bin right_bin: 3rd bin scale_left: scale factor on center-left difference scale_right: scale factor on right-center difference modifies data member fLsquared
set regulatisation on a 1-dimensional curve start: first bin step: distance between neighbouring bins nbin: total number of bins regmode: regularisation mode modifies data member fLsquared
set regularisation on a 2-dimensional grid of bins start: first bin step1: distance between bins in 1st direction nbin1: number of bins in 1st direction step2: distance between bins in 2nd direction nbin2: number of bins in 2nd direction modifies data member fLsquared
Do unfolding of an input histogram tau_reg: regularisation parameter input: input distribution with errors scaleBias: scale factor applied to the bias Data members required: fA, fX0, fLsquared Data members modified: those documented in SetInput() and those documented in DoUnfold(Double_t) Return value: maximum global correlation coefficient NOTE!!! return value >=1.0 means error, and the result is junk Overflow bins of the input distribution are ignored!
Define the input data for subsequent calls to DoUnfold(Double_t) input: input distribution with errors scaleBias: scale factor applied to the bias Data members modified: fY, fV, fBiasScale
Unfold with given value of regularisation parameter tau tau: new tau parameter required data members: fA: matrix to relate x and y fY: measured data points fX0: bias on x fBiasScale: scale factor for fX0 fV: inverse of covariance matrix for y fLsquared: regularisation conditions modified data members: fTau and those documented in DoUnfold(void)
scan the L curve
nPoint: number of points to scan
tauMin: smallest tau value to study
tauMax: largest tau value to study
lCurve: the L curve as graph
logTauX: output spline of x-coordinates vs tau for the L curve
logTauY: output spline of y-coordinates vs tau for the L curve
return value: the coordinate number (0..nPoint-1) with the "best" choice
of tau
retreive unfolding result as histogram name: name of the histogram title: title of the histogram x0,x1: lower/upper edge of histogram. if (x0>=x1) then x0=0 and x1=nbin are used
retreive bias as histogram name: name of the histogram title: title of the histogram x0,x1: lower/upper edge of histogram. if (x0>=x1) then x0=0 and x1=nbin are used
retreive unfolding result folded back by the matrix name: name of the histogram title: title of the histogram y0,y1: lower/upper edge of histogram. if (y0>=y1) then y0=0 and y1=nbin are used
retreive input distribution name: name of the histogram title: title of the histogram y0,y1: lower/upper edge of histogram. if (y0>=y1) then y0=0 and y1=nbin are used
retreive full matrix of correlation coefficients name: name of the histogram title: title of the histogram x0,x1: lower/upper edge of histogram. if (x0>=x1) then x0=0 and x1=nbin are used
retreive full error matrix name: name of the histogram title: title of the histogram x0,x1: lower/upper edge of histogram. if (x0>=x1) then x0=0 and x1=nbin are used
retreive matrix of global correlation coefficients name: name of the histogram title: title of the histogram x0,x1: lower/upper edge of histogram. if (x0>=x1) then x0=0 and x1=nbin are used
retreive ix of regularisation conditions squared name: name of the histogram title: title of the histogram x0,x1: lower/upper edge of histogram. if (x0>=x1) then x0=0 and x1=nbin are used
return maximum global correlation Note: return value>1.0 means the unfolding has failed
get output distribution, cumulated over several bins output: output histogram binMap: for each bin of the original output distribution specify the destination bin. A value of -1 means that the bin is discarded. 0 means underflow bin, 1 first bin, ... binMap[0] : destination of underflow bin binMap[1] : destination of first bin
get output error matrix, cumulated over several bins ematrix: error matrix histogram binMap: for each bin of the original output distribution specify the destination bin. A value of -1 means that the bin is discarded. 0 means underflow bin, 1 first bin, ... binMap[0] : destination of underflow bin binMap[1] : destination of first bin
get global correlation coefficients and inverted error matrix, cumulated over several bins rhoi: global correlation histogram ematrixinv: inverse of error matrix (if pointer==0 it is not returned) binMap: for each bin of the original output distribution specify the destination bin. A value of -1 means that the bin is discarded. 0 means underflow bin, 1 first bin, ... binMap[0] : destination of underflow bin binMap[1] : destination of first bin return value: average global correlation
get correlation coefficient matrix, cumulated over several bins rhoij: correlation coefficient matrix histogram binMap: for each bin of the original output distribution specify the destination bin. A value of -1 means that the bin is discarded. 0 means underflow bin, 1 first bin, ... binMap[0] : destination of underflow bin binMap[1] : destination of first bin