Guy Jacob authored
The PyTorch trace mechanism doesn't "see" torch.nn.ModuleList modules
(since they don't have a forward function). As a result, the mapping
from module names at the Python model definition level to the
scope-names at the trace level is not 1:1. This makes it impossible for
us to map back from SummaryGraph ops to their respective nn.Modules,
which is required for flows like BatchNorm folding and stats fusion in
post-training quantization.

In #313 we handled this issue specifically in DistillerLSTM, but it
makes much more sense to have a generic and automatic solution for this
issue, which doesn't require the user to modify the model. This is such
a solution.
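To make the problem concrete, here is a minimal, self-contained sketch (not Distiller code; the toy model and names are made up for illustration) of sub-modules held in a torch.nn.ModuleList, whose Python-level names ('layers.0', 'layers.1') are not guaranteed to be recoverable from the scope names recorded when the model is traced:

import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Two sub-modules held in an nn.ModuleList and invoked from a loop
        self.layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(2)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

model = ToyModel()

# Python-level module names include the list and its indices:
#   ['layers', 'layers.0', 'layers.1']
print([name for name, _ in model.named_modules() if name])

# Tracing succeeds, but because nn.ModuleList has no forward() of its own,
# the scope names recorded in the trace do not necessarily reflect the
# 'layers.0' / 'layers.1' hierarchy, so trace ops cannot always be mapped
# back to their nn.Modules (the problem described above).
traced = torch.jit.trace(model, torch.randn(1, 8))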
    
* Implemented DistillerModuleList, a replacement for nn.ModuleList
  which results in full and unique scope-names
* See documentation for this class in summary_graph.py for extensive
  details on the issue and solution
* When generating a SummaryGraph, the model is scanned and all instances
  of torch.nn.ModuleList are replaced with DistillerModuleList (see the
  sketch after this list)
* Add tests for new functionality
* Partially revert changes made to DistillerLSTM in commit 43548deb:
  Keep the refactored _create_cells_list function, but have it create
  a standard torch.nn.ModuleList (since the ModuleList issue is handled
  automatically now, there's no need to confuse users with ad-hoc list
  implementations)
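
The substitution is meant to be transparent to the user. The following is a hedged sketch of building a SummaryGraph; it assumes distiller.SummaryGraph and distiller.models.create_model as they appear in the repo, and the model/dataset choice is purely illustrative:

import torch
import distiller
from distiller.models import create_model

# Illustrative model choice; any model containing nn.ModuleList works the same way.
model = create_model(pretrained=False, dataset='cifar10', arch='resnet20_cifar')
dummy_input = torch.randn(1, 3, 32, 32)

# While the graph is built, instances of torch.nn.ModuleList inside the model
# are replaced with DistillerModuleList, so the resulting scope names can be
# mapped back 1:1 to the model's nn.Modules.
sgraph = distiller.SummaryGraph(model, dummy_input)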

Word-level language modeling RNN

This example trains a multi-layer RNN (Elman, GRU, or LSTM) on a language modeling task. By default, the training script uses the Wikitext-2 dataset provided with the example. The trained model can then be used by the generate script to generate new text.

python main.py --cuda --epochs 6        # Train an LSTM on Wikitext-2 with CUDA, reaching perplexity of 117.61
python main.py --cuda --epochs 6 --tied # Train a tied LSTM on Wikitext-2 with CUDA, reaching perplexity of 110.44
python main.py --cuda --tied            # Train a tied LSTM on Wikitext-2 with CUDA for 40 epochs, reaching perplexity of 87.17
python generate.py                      # Generate samples from the trained LSTM model.

The model uses the nn.RNN module (and its sister modules nn.GRU and nn.LSTM), which will automatically use the cuDNN backend if run on CUDA with cuDNN installed.
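
As a rough illustration (the sizes below are arbitrary, not the defaults of main.py), constructing the recurrent layer and moving it to a CUDA device is all that is needed for the cuDNN kernels to be used:

import torch
import torch.nn as nn

# Arbitrary sizes for illustration: input_size=200, hidden_size=200, 2 layers
rnn = nn.LSTM(200, 200, num_layers=2, dropout=0.5)

if torch.cuda.is_available():
    rnn = rnn.cuda()            # with cuDNN installed, nn.LSTM/GRU/RNN on CUDA
                                # dispatch to the cuDNN implementation

seq = torch.randn(35, 20, 200)  # (seq_len, batch, emsize)
if torch.cuda.is_available():
    seq = seq.cuda()
output, (h_n, c_n) = rnn(seq)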

During training, if a keyboard interrupt (Ctrl-C) is received, training is stopped and the current model is evaluated against the test dataset.

The main.py script accepts the following arguments:

optional arguments:
  -h, --help         show this help message and exit
  --data DATA        location of the data corpus
  --model MODEL      type of recurrent net (RNN_TANH, RNN_RELU, LSTM, GRU)
  --emsize EMSIZE    size of word embeddings
  --nhid NHID        number of hidden units per layer
  --nlayers NLAYERS  number of layers
  --lr LR            initial learning rate
  --clip CLIP        gradient clipping
  --epochs EPOCHS    upper epoch limit
  --batch-size N     batch size
  --bptt BPTT        sequence length
  --dropout DROPOUT  dropout applied to layers (0 = no dropout)
  --decay DECAY      learning rate decay per epoch
  --tied             tie the word embedding and softmax weights
  --seed SEED        random seed
  --cuda             use CUDA
  --log-interval N   report interval
  --save SAVE        path to save the final model

With these arguments, a variety of models can be tested. As an example, the following arguments produce slower but better models:

python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40           # Test perplexity of 80.97
python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied    # Test perplexity of 75.96
python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40        # Test perplexity of 77.42
python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied # Test perplexity of 72.30

Perplexities on PTB are equal to or better than those in Recurrent Neural Network Regularization (Zaremba et al., 2014) and are similar to Using the Output Embedding to Improve Language Models (Press & Wolf, 2016) and Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling (Inan et al., 2016), though both of the latter papers have improved perplexities by using a form of recurrent dropout (variational dropout).