Embeddings

read_embedding

def read_embedding(file_path, vector_dim)

Reads an embedding file in GloVe format into a dictionary mapping tokens to vectors.
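The following is a minimal sketch of how such a reader might work, not the library's actual implementation. It assumes the plain-text GloVe layout, where each line holds a token followed by space-separated vector components. The name read_glove_sketch, the choice to skip short lines, and the use of float32 NumPy arrays are illustrative assumptions.

```python
import numpy as np


def read_glove_sketch(file_path, vector_dim):
    """Hypothetical stand-in for read_embedding (illustration only)."""
    embedding = {}
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # Assumption: lines with too few fields are skipped silently.
            if len(parts) <= vector_dim:
                continue
            # The last vector_dim fields are the vector; everything before
            # them is the token, so tokens containing spaces stay intact.
            token = " ".join(parts[:-vector_dim])
            embedding[token] = np.asarray(parts[-vector_dim:], dtype="float32")
    return embedding
```

Taking the vector from the end of the line is one plausible reason for passing vector_dim explicitly: it lets the reader separate token from vector without relying on the token being a single field.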

embedding_to_matrix

def embedding_to_matrix(embedding, token_index, embedding_dim)

Converts an embedding dictionary into a weight matrix used to initialize an embedding layer. Every token in the token_index dictionary is mapped to a row, including tokens that are missing from the provided embedding dictionary. Such unknown tokens are initialized with a random vector whose entries lie between -1 and 1.

Args
  • embedding: dictionary mapping tokens to embedding vectors

  • token_index: dictionary mapping tokens to indices that are fed into the embedding layer

  • embedding_dim: size of the embedding vectors
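The behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the name embedding_to_matrix_sketch and the seed parameter are hypothetical, indices are assumed to be non-negative integers, the matrix is assumed to have max index + 1 rows, and rows that no token points to (for example a padding index 0) are assumed to stay zero.

```python
import numpy as np


def embedding_to_matrix_sketch(embedding, token_index, embedding_dim, seed=None):
    """Hypothetical stand-in for embedding_to_matrix (illustration only)."""
    rng = np.random.default_rng(seed)  # seed is an added, hypothetical parameter
    # Assumption: one row per index from 0 up to the largest index in use.
    num_rows = max(token_index.values(), default=-1) + 1
    matrix = np.zeros((num_rows, embedding_dim), dtype="float32")
    for token, idx in token_index.items():
        vector = embedding.get(token)
        if vector is None:
            # Unknown token: draw each entry uniformly from [-1, 1).
            vector = rng.uniform(-1.0, 1.0, size=embedding_dim)
        matrix[idx] = vector
    return matrix


# Usage: "zyx" has no pretrained vector, so its row is filled randomly.
pretrained = {"the": np.array([0.1, 0.2], dtype="float32")}
indices = {"the": 1, "zyx": 2}
weights = embedding_to_matrix_sketch(pretrained, indices, embedding_dim=2, seed=0)
```

Here weights has three rows: row 0 is left at zero, row 1 holds the pretrained vector for "the", and row 2 holds the random vector for "zyx".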