Sequence Models
Cite
As stated in the copyright notice:
DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute these slides for commercial purposes. You may make copies of these slides and use or distribute them for educational purposes as long as you cite DeepLearning.AI as the source of the slides.
- Slides here are from DeepLearning.AI
Week 1
Assignment 1
RNN Architecture:
- Recurrent Neural Networks (RNNs) are very effective for Natural Language Processing and other sequence tasks because they have "memory".
- Example: an RNN where the input and output sequences have the same length (len(input) = len(output)).
- Each input \(x^{(t)}\) can be a one-hot vector, or just a value. E.g., a language with a 5000-word vocabulary could be one-hot encoded into a vector that has 5000 units, so \(x^{(t)}\) shape = (5000,).
- The activation a⟨t⟩ that is passed to the RNN from one time step to another is called a hidden state.
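For example, a one-hot input vector for that 5000-word vocabulary could be built like this (a minimal sketch; the word index used here is made up):

```python
import numpy as np

vocab_size = 5000       # the vocabulary size from the example above
word_index = 1234       # hypothetical index of a word in the vocabulary

x_t = np.zeros((vocab_size,))   # x<t> has shape (5000,)
x_t[word_index] = 1             # a single 1 at the word's index, zeros everywhere else
```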
RNN Cell:
- Think of the recurrent neural network as the repeated use of a single cell
- The following figure describes the operations for a single time step of an RNN cell.
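Since the figure is not reproduced here, below is a minimal NumPy sketch of one RNN time step; the dictionary-based parameters and the softmax helper are assumptions, not the assignment's exact code:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum(axis=0)

def rnn_cell_forward(xt, a_prev, parameters):
    """One RNN time step: compute the next hidden state and the prediction."""
    Wax, Waa, Wya = parameters["Wax"], parameters["Waa"], parameters["Wya"]
    ba, by = parameters["ba"], parameters["by"]

    # Hidden state: combine the previous hidden state and the current input
    a_next = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba)
    # Prediction for this time step
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    return a_next, yt_pred
```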
LSTM Cell:
- An LSTM is similar to an RNN in that they both use hidden states to pass along information, but an LSTM also uses a cell state, which is like a long-term memory, to help deal with the issue of vanishing gradients.
- An LSTM cell consists of a cell state (long-term memory) and a hidden state (short-term memory).
- Here is the cell in detail:
- A simple explanation of the flow through the cell:
- Similar to the simple RNN, a⟨t-1⟩ and x⟨t⟩ are the inputs (near the bottom left). They are used to calculate:
    - Forget Gate - a "mask" vector with values between 0 and 1, used on the other input c⟨t-1⟩
    - Update Gate - a "mask" vector with values between 0 and 1 that decides whether the Candidate value c̃⟨t⟩ can pass into the cell state c⟨t⟩
    - Candidate value - the value calculated from the previous activation and the current input, between -1 and 1
    - Output Gate - a "mask" vector that decides which values are passed into the output y and the next activation
- c⟨t-1⟩ is another new input here (left side):
    - The Forget Gate from above decides which of its values pass through
    - Adding the gated Candidate value then gives the next cell state c⟨t⟩
The equations for each piece (using the course's notation; \(\odot\) denotes element-wise multiplication):

- About the Forget Gate: \(\Gamma_f^{\langle t \rangle} = \sigma(W_f[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f)\)
- About the Candidate Value: \(\tilde{c}^{\langle t \rangle} = \tanh(W_c[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c)\)
- Update gate: \(\Gamma_u^{\langle t \rangle} = \sigma(W_u[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u)\)
- Output gate: \(\Gamma_o^{\langle t \rangle} = \sigma(W_o[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o)\)
- At last, the final cell state: \(c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} \odot c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} \odot \tilde{c}^{\langle t \rangle}\)
- Hidden state for the next cell: \(a^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} \odot \tanh(c^{\langle t \rangle})\)
- Prediction output: \(\hat{y}^{\langle t \rangle} = \mathrm{softmax}(W_y a^{\langle t \rangle} + b_y)\)
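A minimal NumPy sketch of one LSTM step following the equations above (the parameter names Wf/Wu/Wc/Wo and the concatenated input layout are assumptions, not the assignment's exact code):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_cell_forward(xt, a_prev, c_prev, parameters):
    """One LSTM time step: gates, new cell state, new hidden state."""
    # Stack the previous hidden state and the current input into one vector
    concat = np.concatenate([a_prev, xt], axis=0)

    gamma_f = sigmoid(np.dot(parameters["Wf"], concat) + parameters["bf"])   # Forget Gate
    gamma_u = sigmoid(np.dot(parameters["Wu"], concat) + parameters["bu"])   # Update Gate
    c_cand  = np.tanh(np.dot(parameters["Wc"], concat) + parameters["bc"])   # Candidate value
    gamma_o = sigmoid(np.dot(parameters["Wo"], concat) + parameters["bo"])   # Output Gate

    c_next = gamma_f * c_prev + gamma_u * c_cand   # cell state: keep old info + write new info
    a_next = gamma_o * np.tanh(c_next)             # hidden state passed to the next cell
    return a_next, c_next
```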
Assignment 2:
Parameter and variable shapes (vocabulary size 27 = 26 lowercase letters plus the newline character; 100 hidden units):

| Variable | Shape |
| --- | --- |
| x | (27, 1) |
| Wax | (100, 27) |
| Waa | (100, 100) |
| b | (100, 1) |
| a | (100, 1) |
| Wya | (27, 100) |
| by | (27, 1) |
| y | (27, 1) |
char_to_ix = { ch:i for i,ch in enumerate(chars) }
ix_to_char = { i:ch for i,ch in enumerate(chars) }
Functions to implement in this assignment:

- clip(gradients, maxValue) - clip every gradient to the range [-maxValue, maxValue] to avoid exploding gradients (see the sketch below)
- sample(parameters, char_to_ix, seed) - sample a sequence of characters according to the probability distribution output by the model
- optimize(X, Y, a_prev, parameters, learning_rate = 0.01) - run one step of training: forward propagation, backpropagation through time, gradient clipping, and a parameter update
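For example, clip could look roughly like this, assuming gradients is a dictionary of NumPy arrays with the keys listed below (a sketch, not the official solution):

```python
import numpy as np

def clip(gradients, maxValue):
    """Clip every gradient to the interval [-maxValue, maxValue], in place."""
    for name in ["dWax", "dWaa", "dWya", "db", "dby"]:   # assumed gradient keys
        np.clip(gradients[name], -maxValue, maxValue, out=gradients[name])
    return gradients
```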
def rnn_forward(X, Y, a_prev, parameters):
    """ Performs the forward propagation through the RNN and computes the cross-entropy loss.
    It returns the value of the loss as well as a "cache" storing values to be used in backpropagation."""
    ...
    return loss, cache
def rnn_backward(X, Y, parameters, cache):
    """ Performs the backward propagation through time to compute the gradients of the loss with respect
    to the parameters. It also returns all the hidden states."""
    ...
    return gradients, a
def update_parameters(parameters, gradients, learning_rate):
    """ Updates parameters using the Gradient Descent Update Rule."""
    ...
    return parameters
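Picking an index according to a probability distribution with np.random.choice — this is the mechanism sample uses to choose the next character: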
np.random.seed(0)
probs = np.array([0.1, 0.0, 0.7, 0.2])
idx = np.random.choice(range(len(probs)), p = probs)
Generating text:
- After the language model is trained, we can use it to generate pieces of text.
- The sample process picks the next output at random (instead of always taking the most likely one), to prevent generating the same text every time.
    - How to: the softmax output gives a vector of probabilities, and we choose the next word (or character) according to those probabilities. E.g., if the word at the i-th index has a probability of 16%, it is picked with a 16% chance. A sketch of this loop is shown below.
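A minimal sketch of such a character-level sampling loop, reusing the parameter names from the shape table above; the function name, max_length, and stopping on the newline character are assumptions, not the assignment's exact code:

```python
import numpy as np

def sample_indices(parameters, char_to_ix, max_length=50):
    """Generate character indices by repeatedly sampling from the model's softmax output."""
    vocab_size = len(char_to_ix)
    n_a = parameters["Waa"].shape[0]

    x = np.zeros((vocab_size, 1))    # start with an all-zero input (no character yet)
    a_prev = np.zeros((n_a, 1))      # initial hidden state
    newline_ix = char_to_ix["\n"]    # assumed end-of-sequence character
    indices = []

    while len(indices) < max_length:
        # One RNN step, then softmax probabilities over the vocabulary
        a = np.tanh(np.dot(parameters["Wax"], x) + np.dot(parameters["Waa"], a_prev) + parameters["b"])
        z = np.dot(parameters["Wya"], a) + parameters["by"]
        e = np.exp(z - np.max(z))
        probs = (e / e.sum()).ravel()

        # Choose the next character index with probability equal to its softmax score
        idx = np.random.choice(range(vocab_size), p=probs)
        indices.append(idx)
        if idx == newline_ix:
            break

        # Feed the sampled character back in as a one-hot vector
        x = np.zeros((vocab_size, 1))
        x[idx] = 1
        a_prev = a

    return indices
```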