Deep Learning: A Distilled Guide
TIL: Ilya Sutskever gave John Carmack this reading list of roughly 30 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today.’
- The Annotated Transformer. Sasha Rush, et al. [Blog] [GitHub]
- The First Law of Complexodynamics. Scott Aaronson. [Blog]
- The Unreasonable Effectiveness of Recurrent Neural Networks. Andrej Karpathy. [Blog]
- Understanding LSTM Networks. Christopher Olah. [Blog]
- Recurrent Neural Network Regularization. Wojciech Zaremba, et al. [arXiv]
- Keeping Neural Networks Simple by Minimizing the Description Length of the Weights. Geoffrey E. Hinton and Drew van Camp. [pdf]
- Pointer Networks. Oriol Vinyals, et al. [arXiv]
- ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, et al. [pdf]
- Order Matters: Sequence to sequence for sets. Oriol Vinyals, et al. [arXiv]
- GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism. Yanping Huang, et al. [arXiv]
- Deep Residual Learning for Image Recognition. Kaiming He, et al. [Presentation] [arXiv]
- Multi-Scale Context Aggregation by Dilated Convolutions. Fisher Yu and Vladlen Koltun. [arXiv]
- Neural Message Passing for Quantum Chemistry. Justin Gilmer, et al. [arXiv]
- Attention Is All You Need. Ashish Vaswani, et al. [arXiv]
- Neural Machine Translation by Jointly Learning to Align and Translate. Dzmitry Bahdanau, et al. [arXiv]
- Identity Mappings in Deep Residual Networks. Kaiming He, et al. [arXiv]
- A simple neural network module for relational reasoning. Adam Santoro, et al. [arXiv]
- Variational Lossy Autoencoder. Xi Chen, et al. [arXiv]
- Relational recurrent neural networks. Adam Santoro, et al. [arXiv]
- Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton. Scott Aaronson, et al. [arXiv]
- Neural Turing Machines. Alex Graves, et al. [arXiv]
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. Dario Amodei, et al. [arXiv]
- Scaling Laws for Neural Language Models. Jared Kaplan, et al. [arXiv]
- A Tutorial Introduction to the Minimum Description Length Principle. Peter Grünwald. [pdf]
- Machine Super Intelligence. Shane Legg. [Presentation]
- Kolmogorov Complexity and Algorithmic Randomness. A. Shen, V. A. Uspensky, and N. Vereshchagin. [pdf] Chapter 14.
- CS231n: Convolutional Neural Networks for Visual Recognition. [CS231n Home] [Course Notes]