Embeddings
read_embedding
def read_embedding(file_path, vector_dim)
Reads an embedding file in GloVe text format into a dictionary mapping tokens to vectors.
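A minimal sketch of how such a reader could work. It assumes the usual GloVe text layout, where each line is a token followed by exactly `vector_dim` space-separated floats, and it assumes that `vector_dim` is used to split each line from the right so that tokens which themselves contain spaces stay intact. Vectors are returned as plain Python lists rather than NumPy arrays to keep the example dependency-free. This is an illustration under those assumptions, not the library's actual implementation.

```python
def read_embedding(file_path, vector_dim):
    """Sketch: parse a GloVe-style text file into {token: [float, ...]}.

    Assumption: each line is "<token> v1 v2 ... v<vector_dim>".
    """
    embedding = {}
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # Skip blank or malformed lines (assumed behaviour).
            if len(parts) < vector_dim + 1:
                continue
            # Split from the right: the last vector_dim fields are the vector,
            # everything before them is the token (which may contain spaces).
            token = " ".join(parts[:-vector_dim])
            embedding[token] = [float(x) for x in parts[-vector_dim:]]
    return embedding
```

For example, a line such as `new york 1.0 -1.0 0.5` read with `vector_dim=3` yields the token `"new york"` mapped to `[1.0, -1.0, 0.5]`.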
embedding_to_matrix
def embedding_to_matrix(embedding, token_index, embedding_dim)
Converts an embedding dictionary into a weight matrix used to initialize an embedding layer. It ensures that every token in the token_index dictionary is mapped to a row, including tokens that are not contained in the provided embedding dictionary. Such unknown tokens are initialized with a random vector whose entries lie between -1 and 1.
Args
- embedding: dictionary mapping tokens to embedding vectors
- token_index: dictionary mapping tokens to the indices that are fed into the embedding layer
- embedding_dim: size of the embedding vectors
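A minimal sketch of the conversion described above, using plain Python lists and the standard `random` module. It assumes that indices in `token_index` are 0-based integers and that the matrix needs one row per index up to the largest one; rows that no token maps to (for example, an index 0 reserved for padding) are left as zeros, which is an assumption rather than documented behaviour. A real implementation would more likely build a NumPy array.

```python
import random

def embedding_to_matrix(embedding, token_index, embedding_dim):
    """Sketch: one row per index in token_index, random rows for unknown tokens."""
    # Assumption: 0-based indices; allocate rows up to the largest index.
    num_rows = max(token_index.values()) + 1 if token_index else 0
    # Rows not referenced by any token (e.g. a padding index) stay all-zero.
    matrix = [[0.0] * embedding_dim for _ in range(num_rows)]
    for token, index in token_index.items():
        vector = embedding.get(token)
        if vector is None:
            # Unknown token: random vector with entries between -1 and 1.
            vector = [random.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
        matrix[index] = list(vector)
    return matrix
```

With `token_index = {"cat": 1, "dog": 2, "zebra": 3}` and an embedding that only knows `"cat"` and `"dog"`, row 3 receives a random vector and row 0 stays zero.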