TMultiLayerPerceptron This class describes a neural network. There are facilities to train the network and use the output. The input layer is made of inactive neurons (returning the optionaly normalized input) and output neurons are linear. The type of hidden neurons is free, the default being sigmoids. (One should still try to pass normalized inputs, e.g. between [0.,1]) The basic input is a TTree and two (training and test) TEventLists. Input and output neurons are assigned a value computed for each event with the same possibilities as for TTree::Draw(). Events may be weighted individualy or via TTree::SetWeight(). 6 learning methods are available: kStochastic, kBatch, kSteepestDescent, kRibierePolak, kFletcherReeves and kBFGS. This implementation, written by C. Delaere, is *inspired* from the mlpfit package from J.Schwindling et al. with some extensions: * the algorithms are globally the same * in TMultilayerPerceptron, there is no limitation on the number of layers/neurons, while MLPFIT was limited to 2 hidden layers * TMultilayerPerceptron allows you to save the network in a root file, and provides more export functionnalities * TMultilayerPerceptron gives more flexibility regarding the normalization of inputs/outputs * TMultilayerPerceptron provides, thanks to Andrea Bocci, the possibility to use cross-entropy errors, which allows to train a network for pattern classification based on Bayesian posterior probability.
Neural Networks are more and more used in various fields for data analysis and classification, both for research and commercial institutions. Some randomly choosen examples are:
image analysis
financial movements predictions and analysis
sales forecast and product shipping optimisation
in particles physics: mainly for classification tasks (signal over background discrimination)
More than 50% of neural networks are multilayer perceptrons. This implementation of multilayer perceptrons is inspired from the MLPfit package originaly written by Jerome Schwindling. MLPfit remains one of the fastest tool for neural networks studies, and this ROOT add-on will not try to compete on that. A clear and flexible Object Oriented implementation has been choosen over a faster but more difficult to maintain code. Nevertheless, the time penalty does not exceed a factor 2.
The multilayer perceptron is a simple feed-forward network with the following structure:
It is made of neurons characterized by a bias and weighted links between them (let's call those links synapses). The input neurons receive the inputs, normalize them and forward them to the first hidden layer.
Each neuron in any subsequent layer first computes a linear combination of the outputs of the previous layer. The output of the neuron is then function of that combination with f being linear for output neurons or a sigmoid for hidden layers. This is useful because of two theorems:
A linear combination of sigmoids can approximate any continuous function.
Trained with output = 1 for the signal and 0 for the background, the approximated function of inputs X is the probability of signal, knowing X.
The aim of all learning methods is to minimize the total error on a set of weighted examples. The error is defined as the sum in quadrature, devided by two, of the error on each individual output neuron.
In all methods implemented, one needs to compute the first derivative of that error with respect to the weights. Exploiting the well-known properties of the derivative, especialy the derivative of compound functions, one can write:
for a neuton: product of the local derivative with the weighted sum on the outputs of the derivatives.
for a synapse: product of the input with the local derivative of the output neuron.
This computation is called back-propagation of the errors. A loop over all examples is called an epoch.
Six learning methods are implemented.
Stochastic minimization: This is the most trivial learning method. This is the Robbins-Monro stochastic approximation applied to multilayer perceptrons. The weights are updated after each example according to the formula:
$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)$
with
$\Delta w_{ij}(t) = - \eta(\d e_p / \d w_{ij} + \delta) + \epsilon \Deltaw_{ij}(t-1)$
The parameters for this method are Eta, EtaDecay, Delta and Epsilon.
Steepest descent with fixed step size (batch learning): It is the same as the stochastic minimization, but the weights are updated after considering all the examples, with the total derivative dEdw. The parameters for this method are Eta, EtaDecay, Delta and Epsilon.
Steepest descent algorithm: Weights are set to the minimum along the line defined by the gradient. The only parameter for this method is Tau. Lower tau = higher precision = slower search. A value Tau = 3 seems reasonable.
Conjugate gradients with the Polak-Ribiere updating formula: Weights are set to the minimum along the line defined by the conjugate gradient. Parameters are Tau and Reset, which defines the epochs where the direction is reset to the steepes descent.
Conjugate gradients with the Fletcher-Reeves updating formula: Weights are set to the minimum along the line defined by the conjugate gradient. Parameters are Tau and Reset, which defines the epochs where the direction is reset to the steepes descent.
Broyden, Fletcher, Goldfarb, Shanno (BFGS) method: Implies the computation of a NxN matrix computation, but seems more powerful at least for less than 300 weights. Parameters are Tau and Reset, which defines the epochs where the direction is reset to the steepes descent.
TMLP is build from 3 classes: TNeuron, TSynapse and TMultiLayerPerceptron. Only TMultiLayerPerceptron should be used explicitely by the user.
TMultiLayerPerceptron will take examples from a TTree given in the constructor. The network is described by a simple string: The input/output layers are defined by giving the expression for each neuron, separated by comas. Hidden layers are just described by the number of neurons. The layers are separated by colons. In addition, input/output layer formulas can be preceded by '@' (e.g "@out") if one wants to also normalize the data from the TTree. Input and outputs are taken from the TTree given as second argument. Expressions are evaluated as for TTree::Draw(), arrays are expended in distinct neurons, one for each index. This can only be done for fixed-size arrays. If the formula ends with "!", softmax functions are used for the output layer. One defines the training and test datasets by TEventLists.
Example: TMultiLayerPerceptron("x,y:10:5:f",inputTree);
Both the TTree and the TEventLists can be defined in the constructor, or later with the suited setter method. The lists used for training and test can be defined either explicitely, or via a string containing the formula to be used to define them, exactly as for a TCut.
The learning method is defined using the TMultiLayerPerceptron::SetLearningMethod() . Learning methods are :
TMultiLayerPerceptron::kStochastic,
TMultiLayerPerceptron::kBatch,
TMultiLayerPerceptron::kSteepestDescent,
TMultiLayerPerceptron::kRibierePolak,
TMultiLayerPerceptron::kFletcherReeves,
TMultiLayerPerceptron::kBFGS
A weight can be assigned to events, either in the constructor, either with TMultiLayerPerceptron::SetEventWeight(). In addition, the TTree weight is taken into account.
Finally, one starts the training with TMultiLayerPerceptron::Train(Int_t nepoch, Option_t* options). The first argument is the number of epochs while option is a string that can contain: "text" (simple text output) , "graph" (evoluting graphical training curves), "update=X" (step for the text/graph output update) or "+" (will skip the randomisation and start from the previous values). All combinations are available.
Example: net.Train(100,"text, graph, update=10").
When the neural net is trained, it can be used directly ( TMultiLayerPerceptron::Evaluate() ) or exported to a standalone C++ code ( TMultiLayerPerceptron::Export() ).
Finaly, note that even if this implementation is inspired from the mlpfit code,
the feature lists are not exactly matching:
mlpfit hybrid learning method is not implemented output neurons can be normalized, this is not the case for mlpfit the neural net is exported in C++, FORTRAN or PYTHON the drawResult() method allows a fast check of the learning procedure
In addition, the paw version of mlpfit had additional limitations on the number of neurons, hidden layers and inputs/outputs that does not apply to TMultiLayerPerceptron.
TMultiLayerPerceptron() | |
TMultiLayerPerceptron(const char* layout, TTree* data = 0, const char* training = "Entry$%2==0", const char* test = "", TNeuron::ENeuronType type = TNeuron::kSigmoid, const char* extF = "", const char* extD = "") | |
TMultiLayerPerceptron(const char* layout, TTree* data, TEventList* training, TEventList* test, TNeuron::ENeuronType type = TNeuron::kSigmoid, const char* extF = "", const char* extD = "") | |
TMultiLayerPerceptron(const char* layout, const char* weight, TTree* data = 0, const char* training = "Entry$%2==0", const char* test = "", TNeuron::ENeuronType type = TNeuron::kSigmoid, const char* extF = "", const char* extD = "") | |
TMultiLayerPerceptron(const char* layout, const char* weight, TTree* data, TEventList* training, TEventList* test, TNeuron::ENeuronType type = TNeuron::kSigmoid, const char* extF = "", const char* extD = "") | |
virtual | ~TMultiLayerPerceptron() |
void | TObject::AbstractMethod(const char* method) const |
virtual void | TObject::AppendPad(Option_t* option = "") |
virtual void | TObject::Browse(TBrowser* b) |
static TClass* | Class() |
virtual const char* | TObject::ClassName() const |
virtual void | TObject::Clear(Option_t* = "") |
virtual TObject* | TObject::Clone(const char* newname = "") const |
virtual Int_t | TObject::Compare(const TObject* obj) const |
void | ComputeDEDw() const |
virtual void | TObject::Copy(TObject& object) const |
virtual void | TObject::Delete(Option_t* option = "")MENU |
virtual Int_t | TObject::DistancetoPrimitive(Int_t px, Int_t py) |
virtual void | Draw(Option_t* option = "") |
virtual void | TObject::DrawClass() constMENU |
virtual TObject* | TObject::DrawClone(Option_t* option = "") constMENU |
void | DrawResult(Int_t index = 0, Option_t* option = "test") const |
virtual void | TObject::Dump() constMENU |
void | DumpWeights(Option_t* filename = "-") const |
virtual void | TObject::Error(const char* method, const char* msgfmt) const |
Double_t | Evaluate(Int_t index, Double_t* params) const |
virtual void | TObject::Execute(const char* method, const char* params, Int_t* error = 0) |
virtual void | TObject::Execute(TMethod* method, TObjArray* params, Int_t* error = 0) |
virtual void | TObject::ExecuteEvent(Int_t event, Int_t px, Int_t py) |
void | Export(Option_t* filename = "NNfunction", Option_t* language = "C++") const |
virtual void | TObject::Fatal(const char* method, const char* msgfmt) const |
virtual TObject* | TObject::FindObject(const char* name) const |
virtual TObject* | TObject::FindObject(const TObject* obj) const |
Double_t | GetDelta() const |
virtual Option_t* | TObject::GetDrawOption() const |
static Long_t | TObject::GetDtorOnly() |
Double_t | GetEpsilon() const |
Double_t | GetError(Int_t event) const |
Double_t | GetError(TMultiLayerPerceptron::EDataSet set) const |
Double_t | GetEta() const |
Double_t | GetEtaDecay() const |
virtual const char* | TObject::GetIconName() const |
virtual const char* | TObject::GetName() const |
virtual char* | TObject::GetObjectInfo(Int_t px, Int_t py) const |
static Bool_t | TObject::GetObjectStat() |
virtual Option_t* | TObject::GetOption() const |
Int_t | GetReset() const |
TString | GetStructure() const |
Double_t | GetTau() const |
virtual const char* | TObject::GetTitle() const |
TNeuron::ENeuronType | GetType() const |
virtual UInt_t | TObject::GetUniqueID() const |
virtual Bool_t | TObject::HandleTimer(TTimer* timer) |
virtual ULong_t | TObject::Hash() const |
virtual void | TObject::Info(const char* method, const char* msgfmt) const |
virtual Bool_t | TObject::InheritsFrom(const char* classname) const |
virtual Bool_t | TObject::InheritsFrom(const TClass* cl) const |
virtual void | TObject::Inspect() constMENU |
void | TObject::InvertBit(UInt_t f) |
virtual TClass* | IsA() const |
virtual Bool_t | TObject::IsEqual(const TObject* obj) const |
virtual Bool_t | TObject::IsFolder() const |
Bool_t | TObject::IsOnHeap() const |
virtual Bool_t | TObject::IsSortable() const |
Bool_t | TObject::IsZombie() const |
void | LoadWeights(Option_t* filename = "") |
virtual void | TObject::ls(Option_t* option = "") const |
void | TObject::MayNotUse(const char* method) const |
virtual Bool_t | TObject::Notify() |
static void | TObject::operator delete(void* ptr) |
static void | TObject::operator delete(void* ptr, void* vp) |
static void | TObject::operator delete[](void* ptr) |
static void | TObject::operator delete[](void* ptr, void* vp) |
void* | TObject::operator new(size_t sz) |
void* | TObject::operator new(size_t sz, void* vp) |
void* | TObject::operator new[](size_t sz) |
void* | TObject::operator new[](size_t sz, void* vp) |
virtual void | TObject::Paint(Option_t* option = "") |
virtual void | TObject::Pop() |
virtual void | TObject::Print(Option_t* option = "") const |
void | Randomize() const |
virtual Int_t | TObject::Read(const char* name) |
virtual void | TObject::RecursiveRemove(TObject* obj) |
void | TObject::ResetBit(UInt_t f) |
Double_t | Result(Int_t event, Int_t index = 0) const |
virtual void | TObject::SaveAs(const char* filename = "", Option_t* option = "") constMENU |
virtual void | TObject::SavePrimitive(basic_ostream<char,char_traits<char> >& out, Option_t* option = "") |
void | TObject::SetBit(UInt_t f) |
void | TObject::SetBit(UInt_t f, Bool_t set) |
void | SetData(TTree*) |
void | SetDelta(Double_t delta) |
virtual void | TObject::SetDrawOption(Option_t* option = "")MENU |
static void | TObject::SetDtorOnly(void* obj) |
void | SetEpsilon(Double_t eps) |
void | SetEta(Double_t eta) |
void | SetEtaDecay(Double_t ed) |
void | SetEventWeight(const char*) |
void | SetLearningMethod(TMultiLayerPerceptron::ELearningMethod method) |
static void | TObject::SetObjectStat(Bool_t stat) |
void | SetReset(Int_t reset) |
void | SetTau(Double_t tau) |
void | SetTestDataSet(TEventList* test) |
void | SetTestDataSet(const char* test) |
void | SetTrainingDataSet(TEventList* train) |
void | SetTrainingDataSet(const char* train) |
virtual void | TObject::SetUniqueID(UInt_t uid) |
virtual void | ShowMembers(TMemberInspector& insp, char* parent) |
virtual void | Streamer(TBuffer& b) |
void | StreamerNVirtual(TBuffer& b) |
virtual void | TObject::SysError(const char* method, const char* msgfmt) const |
Bool_t | TObject::TestBit(UInt_t f) const |
Int_t | TObject::TestBits(UInt_t f) const |
void | Train(Int_t nEpoch, Option_t* option = "text") |
virtual void | TObject::UseCurrentStyle() |
virtual void | TObject::Warning(const char* method, const char* msgfmt) const |
virtual Int_t | TObject::Write(const char* name = 0, Int_t option = 0, Int_t bufsize = 0) |
virtual Int_t | TObject::Write(const char* name = 0, Int_t option = 0, Int_t bufsize = 0) const |
void | AttachData() |
void | BFGSDir(TMatrixD&, Double_t*) |
void | BuildNetwork() |
void | ConjugateGradientsDir(Double_t*, Double_t) |
Double_t | DerivDir(Double_t*) |
virtual void | TObject::DoError(int level, const char* location, const char* fmt, va_list va) const |
bool | GetBFGSH(TMatrixD&, TMatrixD&, TMatrixD&) |
Double_t | GetCrossEntropy() const |
Double_t | GetCrossEntropyBinary() const |
void | GetEntry(Int_t) const |
Double_t | GetSumSquareError() const |
Bool_t | LineSearch(Double_t*, Double_t*) |
void | TObject::MakeZombie() |
void | MLP_Batch(Double_t*) |
void | MLP_Stochastic(Double_t*) |
void | SetGammaDelta(TMatrixD&, TMatrixD&, Double_t*) |
void | SteepestDir(Double_t*) |
TMultiLayerPerceptron(const TMultiLayerPerceptron&) | |
void | BuildFirstLayer(TString&) |
void | BuildHiddenLayers(TString&) |
void | BuildLastLayer(TString&, Int_t) |
void | ExpandStructure() |
void | MLP_Line(Double_t*, Double_t*, Double_t) |
TMultiLayerPerceptron& | operator=(const TMultiLayerPerceptron&) |
void | Shuffle(Int_t*, Int_t) const |
enum ELearningMethod { | kStochastic | |
kBatch | ||
kSteepestDescent | ||
kRibierePolak | ||
kFletcherReeves | ||
kBFGS | ||
}; | ||
enum EDataSet { | kTraining | |
kTest | ||
}; | ||
enum TObject::EStatusBits { | kCanDelete | |
kMustCleanup | ||
kObjInCanvas | ||
kIsReferenced | ||
kHasUUID | ||
kCannotPick | ||
kNoContextMenu | ||
kInvalidObject | ||
}; | ||
enum TObject::[unnamed] { | kIsOnHeap | |
kNotDeleted | ||
kZombie | ||
kBitMask | ||
kSingleKey | ||
kOverwrite | ||
kWriteDelete | ||
}; |
Int_t | fCurrentTree | ! index of the current tree in a chain |
Double_t | fCurrentTreeWeight | ! weight of the current tree in a chain |
TTree* | fData | ! pointer to the tree used as datasource |
Double_t | fDelta | ! Delta - used in stochastic minimisation - Default=0. |
Double_t | fEpsilon | ! Epsilon - used in stochastic minimisation - Default=0. |
Double_t | fEta | ! Eta - used in stochastic minimisation - Default=0.1 |
Double_t | fEtaDecay | ! EtaDecay - Eta *= EtaDecay at each epoch - Default=1. |
TTreeFormula* | fEventWeight | ! formula representing the event weight |
TObjArray | fFirstLayer | Collection of the input neurons; subset of fNetwork |
Double_t | fLastAlpha | ! internal parameter used in line search |
TObjArray | fLastLayer | Collection of the output neurons; subset of fNetwork |
TMultiLayerPerceptron::ELearningMethod | fLearningMethod | ! The Learning Method |
TTreeFormulaManager* | fManager | ! TTreeFormulaManager for the weight and neurons |
TObjArray | fNetwork | Collection of all the neurons in the network |
TNeuron::ENeuronType | fOutType | Type of output neurons |
Int_t | fReset | ! number of epochs between two resets of the search direction to the steepest descent - Default=50 |
TString | fStructure | String containing the network structure |
TObjArray | fSynapses | Collection of all the synapses in the network |
Double_t | fTau | ! Tau - used in line search - Default=3. |
TEventList* | fTest | ! EventList defining the events in the test dataset |
Bool_t | fTestOwner | ! internal flag whether one has to delete fTest or not |
TEventList* | fTraining | ! EventList defining the events in the training dataset |
Bool_t | fTrainingOwner | ! internal flag whether one has to delete fTraining or not |
TNeuron::ENeuronType | fType | Type of hidden neurons |
TString | fWeight | String containing the event weight |
TString | fextD | String containing the derivative name |
TString | fextF | String containing the function name |
The network is described by a simple string: The input/output layers are defined by giving the branch names separated by comas. Hidden layers are just described by the number of neurons. The layers are separated by colons. Ex: "x,y:10:5:f" The output can be prepended by '@' if the variable has to be normalized. The output can be followed by '!' to use Softmax neurons for the output layer only. Ex: "x,y:10:5:c1,c2,c3!" Input and outputs are taken from the TTree given as second argument. training and test are the two TEventLists defining events to be used during the neural net training. Both the TTree and the TEventLists can be defined in the constructor, or later with the suited setter method.
The network is described by a simple string: The input/output layers are defined by giving the branch names separated by comas. Hidden layers are just described by the number of neurons. The layers are separated by colons. Ex: "x,y:10:5:f" The output can be prepended by '@' if the variable has to be normalized. The output can be followed by '!' to use Softmax neurons for the output layer only. Ex: "x,y:10:5:c1,c2,c3!" Input and outputs are taken from the TTree given as second argument. training and test are the two TEventLists defining events to be used during the neural net training. Both the TTree and the TEventLists can be defined in the constructor, or later with the suited setter method.
The network is described by a simple string: The input/output layers are defined by giving the branch names separated by comas. Hidden layers are just described by the number of neurons. The layers are separated by colons. Ex: "x,y:10:5:f" The output can be prepended by '@' if the variable has to be normalized. The output can be followed by '!' to use Softmax neurons for the output layer only. Ex: "x,y:10:5:c1,c2,c3!" Input and outputs are taken from the TTree given as second argument. training and test are two cuts (see TTreeFormula) defining events to be used during the neural net training and testing. Example: "Entry$%2", "(Entry$+1)%2". Both the TTree and the cut can be defined in the constructor, or later with the suited setter method.
The network is described by a simple string: The input/output layers are defined by giving the branch names separated by comas. Hidden layers are just described by the number of neurons. The layers are separated by colons. Ex: "x,y:10:5:f" The output can be prepended by '@' if the variable has to be normalized. The output can be followed by '!' to use Softmax neurons for the output layer only. Ex: "x,y:10:5:c1,c2,c3!" Input and outputs are taken from the TTree given as second argument. training and test are two cuts (see TTreeFormula) defining events to be used during the neural net training and testing. Example: "Entry$%2", "(Entry$+1)%2". Both the TTree and the cut can be defined in the constructor, or later with the suited setter method.
Sets the Training dataset. Those events will be used for the minimization
Sets the Test dataset. Those events will not be used for the minimization but for control
Sets the Training dataset. Those events will be used for the minimization. Note that the tree must be already defined.
Sets the Test dataset. Those events will not be used for the minimization but for control. Note that the tree must be already defined.
Sets the learning method. Available methods are: kStochastic, kBatch, kSteepestDescent, kRibierePolak, kFletcherReeves and kBFGS. (look at the constructor for the complete description of learning methods and parameters)
Sets Eta - used in stochastic minimisation (look at the constructor for the complete description of learning methods and parameters)
Sets Epsilon - used in stochastic minimisation (look at the constructor for the complete description of learning methods and parameters)
Sets Delta - used in stochastic minimisation (look at the constructor for the complete description of learning methods and parameters)
Sets EtaDecay - Eta *= EtaDecay at each epoch (look at the constructor for the complete description of learning methods and parameters)
Sets Tau - used in line search (look at the constructor for the complete description of learning methods and parameters)
Sets number of epochs between two resets of the search direction to the steepest descent. (look at the constructor for the complete description of learning methods and parameters)
Train the network. nEpoch is the number of iterations. option can contain: - "text" (simple text output) - "graph" (evoluting graphical training curves) - "update=X" (step for the text/graph output update) - "+" will skip the randomisation and start from the previous values. - "current" (draw in the current canvas) All combinations are available.
Computes the output for a given event. Look at the output neuron designed by index.
Compute the DEDw = sum on all training events of dedw for each weight normalized by the number of events.
Connects the TTree to Neurons in input and output layers. The formulas associated to each neuron are created and reported to the network formula manager. By default, the branch is not normalised since this would degrade performance for classification jobs. Normalisation can be requested by putting '@' in front of the formula.
Instanciates the neurons in input Inputs are normalised and the type is set to kOff (simple forward of the formula value)
Builds the output layer
Neurons are linear combinations of input, by defaul.
If the structure ends with "!", neurons are set up for classification,
ie. with a sigmoid (1 neuron) or softmax (more neurons) activation function.
Draws the neural net output It produces an histogram with the output for the two datasets. Index is the number of the desired output neuron. "option" can contain: - test or train to select a dataset - comp to produce a X-Y comparison plot - nocanv to not create a new TCanvas for the plot
Dumps the weights to a text file.
Set filename to "-" (default) to dump to the standard output
Loads the weights from a text file conforming to the format defined by DumpWeights.
Returns the Neural Net for a given set of input parameters #parameters must equal #input neurons
One step for the stochastic method buffer should contain the previous dw vector and will be updated
One step for the batch (stochastic) method. DEDw should have been updated before calling this.
Sets the weights to a point along a line Weights are set to [origin + (dist * dir)].
Search along the line defined by direction. buffer is not used but is updated with the new dw so that it can be used by a later stochastic step. It returns true if the line search fails.
Sets the search direction to conjugate gradient direction beta should be: ||g_{(t+1)}||^2 / ||g_{(t)}||^2 (Fletcher-Reeves) g_{(t+1)} (g_{(t+1)}-g_{(t)}) / ||g_{(t)}||^2 (Ribiere-Polak)
Computes the hessian matrix using the BFGS update algorithm. from gamma (g_{(t+1)}-g_{(t)}) and delta (w_{(t+1)}-w_{(t)}). It returns true if such a direction could not be found (if gamma and delta are orthogonal).
Sets the gamma (g_{(t+1)}-g_{(t)}) and delta (w_{(t+1)}-w_{(t)}) vectors Gamma is computed here, so ComputeDEDw cannot have been called before, and delta is a direct translation of buffer into a TMatrixD.
scalar product between gradient and direction = derivative along direction
Computes the direction for the BFGS algorithm as the product between the Hessian estimate (bfgsh) and the dir.
Draws the network structure. Neurons are depicted by a blue disk, and synapses by lines connecting neurons. The line width is proportionnal to the weight.