Caffe
caffe::LSTMLayer< Dtype > Class Template Reference

Processes sequential inputs using a "Long Short-Term Memory" (LSTM) [1] style recurrent neural network (RNN). Implemented by unrolling the LSTM computation through time.

#include <lstm_layer.hpp>

Inheritance: caffe::LSTMLayer< Dtype > derives from caffe::RecurrentLayer< Dtype >.

Public Member Functions

 LSTMLayer (const LayerParameter &param)
 
virtual const char * type () const
 
- Public Member Functions inherited from caffe::RecurrentLayer< Dtype >
 RecurrentLayer (const LayerParameter &param)
 
virtual void LayerSetUp (const vector< Blob< Dtype > * > &bottom, const vector< Blob< Dtype > * > &top)
 
virtual void Reshape (const vector< Blob< Dtype > * > &bottom, const vector< Blob< Dtype > * > &top)
 
virtual void Reset ()
 
virtual int MinBottomBlobs () const
 
virtual int MaxBottomBlobs () const
 
virtual int ExactNumTopBlobs () const
 
virtual bool AllowForceBackward (const int bottom_index) const
 

Protected Member Functions

virtual void FillUnrolledNet (NetParameter *net_param) const
 Fills net_param with the recurrent network architecture. Subclasses should define this – see RNNLayer and LSTMLayer for examples.
 
virtual void RecurrentInputBlobNames (vector< string > *names) const
 Fills names with the names of the 0th timestep recurrent input Blobs. Subclasses should define this – see RNNLayer and LSTMLayer for examples.
 
virtual void RecurrentOutputBlobNames (vector< string > *names) const
 Fills names with the names of the Tth timestep recurrent output Blobs. Subclasses should define this – see RNNLayer and LSTMLayer for examples.
 
virtual void RecurrentInputShapes (vector< BlobShape > *shapes) const
 Fills shapes with the shapes of the recurrent input Blobs. Subclasses should define this – see RNNLayer and LSTMLayer for examples.
 
virtual void OutputBlobNames (vector< string > *names) const
 Fills names with the names of the output blobs, concatenated across all timesteps. Should return a name for each top Blob. Subclasses should define this – see RNNLayer and LSTMLayer for examples.
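
These hooks are easiest to read alongside a concrete subclass. Below is a hedged sketch (not the verbatim Caffe source) of how LSTMLayer might implement the naming hooks; the blob names h_0, c_0, h_T, c_T, and h are illustrative assumptions, not confirmed by this page.

#include <string>
#include <vector>

#include "lstm_layer.hpp"

namespace caffe {

using std::string;
using std::vector;

// Sketch: the recurrent inputs entering timestep 0 are the previous
// hidden state and the memory cell. The names are assumptions.
template <typename Dtype>
void LSTMLayer<Dtype>::RecurrentInputBlobNames(vector<string>* names) const {
  names->resize(2);
  (*names)[0] = "h_0";  // initial hidden state
  (*names)[1] = "c_0";  // initial cell state
}

// Sketch: the recurrent outputs leaving the final timestep T.
template <typename Dtype>
void LSTMLayer<Dtype>::RecurrentOutputBlobNames(vector<string>* names) const {
  names->resize(2);
  (*names)[0] = "h_T";  // final hidden state
  (*names)[1] = "c_T";  // final cell state
}

// Sketch: a single top blob holding the hidden states of all timesteps,
// concatenated along the time axis.
template <typename Dtype>
void LSTMLayer<Dtype>::OutputBlobNames(vector<string>* names) const {
  names->resize(1);
  (*names)[0] = "h";
}

}  // namespace caffe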
 
- Protected Member Functions inherited from caffe::RecurrentLayer< Dtype >
virtual void Forward_cpu (const vector< Blob< Dtype > * > &bottom, const vector< Blob< Dtype > * > &top)
 
virtual void Forward_gpu (const vector< Blob< Dtype > * > &bottom, const vector< Blob< Dtype > * > &top)
 
virtual void Backward_cpu (const vector< Blob< Dtype > * > &top, const vector< bool > &propagate_down, const vector< Blob< Dtype > * > &bottom)
 

Additional Inherited Members

- Protected Attributes inherited from caffe::RecurrentLayer< Dtype >
shared_ptr< Net< Dtype > > unrolled_net_
 A Net to implement the recurrent functionality.
 
int N_
 The number of independent streams to process simultaneously.
 
int T_
 The number of timesteps in the layer's input, and the number of timesteps over which to backpropagate through time.
 
bool static_input_
 Whether the layer has a "static" input copied across all timesteps.
 
int last_layer_index_
 The last layer to run in the network. (Any later layers are losses added to force the recurrent net to do backprop.)
 
bool expose_hidden_
 Whether the layer's hidden states at the first and last timesteps are exposed as layer inputs and outputs, respectively.
 
vector< Blob< Dtype > * > recur_input_blobs_
 
vector< Blob< Dtype > * > recur_output_blobs_
 
vector< Blob< Dtype > * > output_blobs_
 
Blob< Dtype > * x_input_blob_
 
Blob< Dtype > * x_static_input_blob_
 
Blob< Dtype > * cont_input_blob_
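
These attributes imply a time-major layout for the layer's inputs: data blobs of shape (T x N x ...), plus one continuation flag per (timestep, stream) pair. The sketch below assumes the convention that a flag of 0 marks the start of a new sequence within a stream (an assumption; this page does not spell out the cont semantics).

#include <cstdio>
#include <vector>

int main() {
  const int T = 4, N = 2, D = 3;          // timesteps, streams, features
  std::vector<float> x(T * N * D, 1.0f);  // time-major input, shape (T, N, D)
  std::vector<float> cont(T * N, 1.0f);   // continuation flags, shape (T, N)
  cont[0 * N + 0] = 0.0f;                 // stream 0 starts a sequence at t = 0
  cont[2 * N + 1] = 0.0f;                 // stream 1 starts a new sequence at t = 2

  // Wherever cont is 0, a recurrent net would reset that stream's
  // hidden and cell state before processing the timestep.
  for (int t = 0; t < T; ++t)
    for (int n = 0; n < N; ++n)
      if (cont[t * N + n] == 0.0f)
        std::printf("stream %d: hidden state would be reset at t = %d\n", n, t);
  return 0;
}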
 

Detailed Description

template<typename Dtype>
class caffe::LSTMLayer< Dtype >

Processes sequential inputs using a "Long Short-Term Memory" (LSTM) [1] style recurrent neural network (RNN). Implemented by unrolling the LSTM computation through time.

The specific architecture used in this implementation is as described in "Learning to Execute" [2], reproduced below:

    i_t := \sigma( W_{hi} h_{t-1} + W_{xi} x_t + b_i )
    f_t := \sigma( W_{hf} h_{t-1} + W_{xf} x_t + b_f )
    o_t := \sigma( W_{ho} h_{t-1} + W_{xo} x_t + b_o )
    g_t := \tanh( W_{hg} h_{t-1} + W_{xg} x_t + b_g )
    c_t := (f_t \odot c_{t-1}) + (i_t \odot g_t)
    h_t := o_t \odot \tanh( c_t )

where \sigma is the logistic sigmoid and \odot denotes elementwise (Hadamard) multiplication. In the implementation, the i, f, o, and g computations are performed as a single inner product.
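To make the recurrence concrete, here is a minimal single-timestep sketch in C++. It is not Caffe's implementation, which unrolls the computation through time into an internal Net rather than looping like this; all function and variable names below are assumptions for illustration. The arithmetic follows the equations above, including the fused inner product for i, f, o, and g.

// Minimal, self-contained sketch of one LSTM timestep for a single stream.
#include <cmath>
#include <cstdio>
#include <vector>

static double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// x has size D; h and c have size H and are updated in place. W_h packs the
// four H x H recurrent weight matrices (i, f, o, g) and W_x the four H x D
// input weight matrices, mirroring the single fused inner product.
void LstmStep(const std::vector<double>& W_h,   // (4H) x H, row-major
              const std::vector<double>& W_x,   // (4H) x D, row-major
              const std::vector<double>& b,     // 4H
              const std::vector<double>& x,     // D
              std::vector<double>* h,           // H
              std::vector<double>* c) {         // H
  const size_t H = h->size(), D = x.size();
  std::vector<double> pre(4 * H, 0.0);          // fused i, f, o, g pre-activations
  for (size_t r = 0; r < 4 * H; ++r) {
    double z = b[r];
    for (size_t j = 0; j < H; ++j) z += W_h[r * H + j] * (*h)[j];
    for (size_t j = 0; j < D; ++j) z += W_x[r * D + j] * x[j];
    pre[r] = z;
  }
  for (size_t j = 0; j < H; ++j) {
    const double i = sigmoid(pre[j]);             // input gate
    const double f = sigmoid(pre[H + j]);         // forget gate
    const double o = sigmoid(pre[2 * H + j]);     // output gate
    const double g = std::tanh(pre[3 * H + j]);   // candidate update
    (*c)[j] = f * (*c)[j] + i * g;                // c_t = f .* c_{t-1} + i .* g
    (*h)[j] = o * std::tanh((*c)[j]);             // h_t = o .* tanh(c_t)
  }
}

int main() {
  const size_t D = 2, H = 3;
  std::vector<double> W_h(4 * H * H, 0.1), W_x(4 * H * D, 0.2), b(4 * H, 0.1);
  std::vector<double> x = {1.0, -1.0}, h(H, 0.0), c(H, 0.0);
  LstmStep(W_h, W_x, b, x, &h, &c);
  for (size_t j = 0; j < H; ++j) std::printf("h[%zu] = %f\n", j, h[j]);
  return 0;
}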

Notably, this implementation lacks the "diagonal" gates (peephole connections) used in the LSTM architectures described by Alex Graves [3] and others.

[1] Hochreiter, Sepp, and Schmidhuber, Jürgen. "Long short-term memory." Neural Computation 9, no. 8 (1997): 1735-1780.

[2] Zaremba, Wojciech, and Sutskever, Ilya. "Learning to execute." arXiv preprint arXiv:1410.4615 (2014).

[3] Graves, Alex. "Generating sequences with recurrent neural networks." arXiv preprint arXiv:1308.0850 (2013).

