Natural Language Processing with Deep Learning
Published:
I recently completed Stanford’s course on natural language processing. This course opened my eyes to different representaions of speech and linguistics.
Moreover, such novel models such as NMT begs the question - why is attention so important? To me, the question is grounded in rethinking neural networks as essentially function approximations. Neural networks’ ability to approximate different classes of functions depends on its architecture. A typical neural net is implemented as a chain of matrix multiplications and element-wise non-linearities, where elements of the input or feature vectors interact with each other only by addition.
Attention mechanisms compute a mask which is used to multiply features. This seemingly simple extension has profound implications: suddenly, the space of functions that can be well approximated by a neural net is vastly expanded. We extend the basic addition and element wise multiplication in hidden states to all states combining both approaches!
That’s a brief interlude to other approaches that I am currently trying to udnerstand and to articulate in my applied work.
Oh, and before I forget - the course artifacts are here.