BucketGenerator

class BucketGenerator

Performs bucketing of elements in a dataset by length.

__init__

def __init__(element_length_function, batch_size, buffer_size_batches, batches_to_bucket, shuffle, seed)

Initializes the BucketGenerator.

Args
  • element_length_function: Function that takes an element of the dataset and returns its length.

  • batch_size: The size of the batches to bucket the sequences into.

  • buffer_size_batches: Size of the buffer, measured in batches.

  • batches_to_bucket: Number of batches in buffer to use for bucketing. If set to buffer_size_batches, the resulting batches will be deterministic.

  • shuffle: Whether to shuffle elements across batches and the resulting buckets.

  • seed: Seed for shuffling.
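As an illustration of the element_length_function argument, the sketch below shows two callables that could serve this role. Both names, and the "input_ids" key, are hypothetical and chosen only for the example; the only assumption is that the function maps one element to an integer length.

```python
from typing import Dict, List, Sequence


def token_count(element: Sequence[int]) -> int:
    # Hypothetical helper: the element is itself a sequence, so its length is len().
    return len(element)


def input_ids_length(element: Dict[str, List[int]]) -> int:
    # Hypothetical helper: the element is a dict and the sequence to measure
    # lives under the (assumed) key "input_ids".
    return len(element["input_ids"])
```

For plain lists or strings, the built-in len could presumably be passed directly.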

__call__

def __call__(data)

Returns an iterable over data with elements ordered by bucketed sequence length. For example, with batch size = 2 the transformation could look like this: [1], [3, 3, 3], [1], [4, 4, 4, 4] -> [1], [1], [3, 3, 3], [4, 4, 4, 4]
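The reordering described above can be approximated with a minimal sketch. This is not the library's implementation: bucket_by_length_sketch is a hypothetical stand-in that fills a buffer of batch_size * buffer_size_batches elements, sorts the buffer by length, and emits it batch by batch. It omits batches_to_bucket, and shuffling whole batches within a buffer is an assumption about what shuffle might mean.

```python
import random
from typing import Any, Callable, Iterable, Iterator, List, Optional


def bucket_by_length_sketch(
    data: Iterable[Any],
    element_length_function: Callable[[Any], int],
    batch_size: int,
    buffer_size_batches: int,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> Iterator[Any]:
    """Hypothetical sketch of length bucketing; not the BucketGenerator code."""
    rng = random.Random(seed)
    buffer_capacity = batch_size * buffer_size_batches
    buffer: List[Any] = []

    def flush(items: List[Any]) -> Iterator[Any]:
        # Sort buffered elements so neighbours have similar lengths.
        ordered = sorted(items, key=element_length_function)
        # Cut the sorted run into batches of at most batch_size elements.
        batches = [
            ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)
        ]
        if shuffle:
            # Assumption: shuffle the order of whole batches, not their contents.
            rng.shuffle(batches)
        for batch in batches:
            yield from batch

    for element in data:
        buffer.append(element)
        if len(buffer) == buffer_capacity:
            yield from flush(buffer)
            buffer = []
    if buffer:
        # Emit whatever is left when the data ends mid-buffer.
        yield from flush(buffer)


data = [[1], [3, 3, 3], [1], [4, 4, 4, 4]]
result = list(
    bucket_by_length_sketch(data, len, batch_size=2, buffer_size_batches=2)
)
print(result)
```

With shuffle left off, the four example elements fit in one buffer and come out in the order shown in the description, so each consecutive pair forms a batch of similar lengths.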