Class Batch

java.lang.Object
ai.djl.training.dataset.Batch
All Implemented Interfaces:
AutoCloseable

public class Batch extends Object implements AutoCloseable
A Batch is used to hold multiple items (data and label pairs) from a Dataset.

When training and performing inference, it is often more efficient to run multiple items through a network simultaneously rather than one at a time. For this reason, much of the API is oriented around the Batch class.

In a Batch, data and label are each an NDList. The data NDList represents the data for each input in the batch. The number of NDArrays in the NDList is based on the number of different kinds of inputs, not the batch size. Similarly, the label NDList represents the labels for each kind of output.

For example, an Image Question and Answer dataset has two inputs: an image and a question. In this case, the data in the Batch will be an NDList containing an NCHW image NDArray and an NTC question NDArray. The label will be an NDList containing only an NTC answer NDArray.
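
The layout described above can be illustrated with a plain-Java sketch that models each NDArray by its shape only. It does not use DJL; the class name BatchLayoutSketch, its helper methods, and the dimension values (3x224x224 images, 20 time steps, 300 features) are assumptions chosen for illustration.

```java
import java.util.List;

// Plain-Java model of the Batch layout; shape arrays stand in for NDArrays.
// All names and dimension values are hypothetical, not part of the DJL API.
final class BatchLayoutSketch {

    // Models the data NDList: one NCHW image shape and one NTC question shape.
    static List<long[]> dataShapes(long batchSize) {
        long[] image = {batchSize, 3, 224, 224}; // N, C, H, W
        long[] question = {batchSize, 20, 300};  // N, T, C
        return List.of(image, question);
    }

    // Models the label NDList: a single NTC answer shape.
    static List<long[]> labelShapes(long batchSize) {
        long[] answer = {batchSize, 20, 300}; // N, T, C
        return List.of(answer);
    }

    public static void main(String[] args) {
        for (long batchSize : new long[] {1, 32}) {
            List<long[]> data = dataShapes(batchSize);
            // The list length follows the kinds of input, not the batch size;
            // the batch size appears as the leading dimension of every array.
            System.out.println("batchSize=" + batchSize
                    + " dataArrays=" + data.size()
                    + " labelArrays=" + labelShapes(batchSize).size()
                    + " leadingDim=" + data.get(0)[0]);
        }
    }
}
```

Changing the batch size changes only the leading dimension of each shape; the data list keeps two entries and the label list keeps one.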

To distinguish a batch from a single record (both consist of two NDLists), the API provides the Batch and Record classes, respectively.

  • Constructor Details

    • Batch

      public Batch(NDManager manager, NDList data, NDList labels, int size, Batchifier dataBatchifier, Batchifier labelBatchifier, long progress, long progressTotal)
      Creates a new instance of Batch with the given manager, data and labels.
      Parameters:
      manager - the manager for the Batch
      data - the NDList containing the data
      labels - the NDList containing the labels
      size - (batchSize) the number of Records in the batch
      dataBatchifier - the Batchifier that is used to split data
      labelBatchifier - the Batchifier that is used to split labels
      progress - the progress of the batch if it is part of some kind of iteration like a dataset iteration. Use 0 if there is no iteration.
      progressTotal - the total or end value for the progress of the batch if it is part of some kind of iteration like a dataset iteration. Use 0 if there is no iteration.
    • Batch

      public Batch(NDManager manager, NDList data, NDList labels, int size, Batchifier dataBatchifier, Batchifier labelBatchifier, long progress, long progressTotal, List<?> indices)
      Creates a new instance of Batch with the given manager, data and labels.
      Parameters:
      manager - the manager for the Batch
      data - the NDList containing the data
      labels - the NDList containing the labels
      size - (batchSize) the number of Records in the batch
      dataBatchifier - the Batchifier that is used to split data
      labelBatchifier - the Batchifier that is used to split labels
      progress - the progress of the batch if it is part of some kind of iteration like a dataset iteration. Use 0 if there is no iteration.
      progressTotal - the total or end value for the progress of the batch if it is part of some kind of iteration like a dataset iteration. Use 0 if there is no iteration.
      indices - the indices used to extract the data and labels
  • Method Details

    • getManager

      public NDManager getManager()
      Gets the NDManager that is attached to this Batch.
      Returns:
      the NDManager attached to this Batch
    • getData

      public NDList getData()
      Gets the data of this Batch.
      Returns:
      an NDList that contains the data
    • getLabels

      public NDList getLabels()
      Gets the labels corresponding to the data of this Batch.
      Returns:
      an NDList that contains the labels
    • getSize

      public int getSize()
      Returns the batchSize.
      Returns:
      the batchSize or number of Records in the batch
    • getProgress

      public long getProgress()
      Returns the progress of the batch if it is part of some kind of iteration like a dataset iteration.
      Returns:
      the progress of the batch if it is part of some kind of iteration like a dataset iteration. Returns 0 if there is no iteration
    • getProgressTotal

      public long getProgressTotal()
      Returns the total or end value for the progress of the batch if it is part of some kind of iteration like a dataset iteration.
      Returns:
      the total or end value for the progress of the batch if it is part of some kind of iteration like a dataset iteration. Returns 0 if there is no iteration
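
Since both getProgress() and getProgressTotal() return 0 when there is no iteration, code that reports completion should guard against a zero total. The following plain-Java sketch shows one way to do that; ProgressSketch and describe are hypothetical names, and the two long arguments stand in for the values returned by those getters.

```java
// Plain-Java sketch; the arguments stand in for Batch.getProgress() and
// Batch.getProgressTotal(). The class and method names are hypothetical.
final class ProgressSketch {

    // Formats progress as "done/total (percent%)", or "n/a" when the total
    // is 0, which the documentation uses to signal "no iteration".
    static String describe(long progress, long progressTotal) {
        if (progressTotal <= 0) {
            return "n/a";
        }
        long percent = progress * 100 / progressTotal; // integer percentage
        return progress + "/" + progressTotal + " (" + percent + "%)";
    }

    public static void main(String[] args) {
        System.out.println(describe(32, 128)); // prints 32/128 (25%)
        System.out.println(describe(0, 0));    // prints n/a
    }
}
```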
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable
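
Because Batch implements AutoCloseable, it can be managed with a try-with-resources statement. The sketch below demonstrates the pattern with a hypothetical stand-in class (FakeBatch) rather than a real Batch, so that it runs without DJL; only the AutoCloseable contract is taken from the documentation above.

```java
// Plain-Java sketch of the try-with-resources pattern. FakeBatch is a
// hypothetical stand-in for Batch and is not part of the DJL API.
final class CloseSketch {

    static final class FakeBatch implements AutoCloseable {
        boolean closed;

        int getSize() {
            return 32;
        }

        @Override
        public void close() {
            closed = true; // a real resource would release its memory here
        }
    }

    // Uses the batch inside try-with-resources and returns it afterwards
    // so that callers can observe that close() was invoked.
    static FakeBatch useAndClose() {
        FakeBatch created = new FakeBatch();
        try (FakeBatch batch = created) {
            System.out.println("processing " + batch.getSize() + " records");
        } // close() runs here, even if the body throws
        return created;
    }

    public static void main(String[] args) {
        System.out.println("closed=" + useAndClose().closed); // prints closed=true
    }
}
```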
    • split

      public Batch[] split(Device[] devices, boolean evenSplit)
      Splits the data and labels in the Batch across the given devices.

      If evenSplit is false, the last device may have a smaller batch than the rest.

      Parameters:
      devices - an array of Device across which the data must be split
      evenSplit - whether each slice must have the same shape
      Returns:
      an array of Batch, each of which corresponds to a Device
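
To show how slice sizes could work out, the following plain-Java sketch computes per-device record counts with a ceiling-division scheme. This is one possible scheme that matches the description above (the last slice may be smaller), not necessarily what DJL does internally. SplitSizesSketch and sliceSizes are hypothetical names, and throwing an exception when an even split is impossible is an assumption.

```java
import java.util.Arrays;

// Plain-Java sketch of one possible slicing scheme; not DJL's implementation.
final class SplitSizesSketch {

    // Returns the number of records assigned to each slice.
    static int[] sliceSizes(int batchSize, int numDevices, boolean evenSplit) {
        if (batchSize <= 0 || numDevices <= 0) {
            throw new IllegalArgumentException("batchSize and numDevices must be positive");
        }
        // Assumption: an impossible even split is reported as an error.
        if (evenSplit && batchSize % numDevices != 0) {
            throw new IllegalArgumentException("batch cannot be split evenly");
        }
        int step = (batchSize + numDevices - 1) / numDevices; // ceiling division
        int count = (batchSize + step - 1) / step;            // slices actually needed
        int[] sizes = new int[count];
        Arrays.fill(sizes, step);
        sizes[count - 1] = batchSize - step * (count - 1);    // last slice takes the rest
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sliceSizes(10, 4, false))); // prints [3, 3, 3, 1]
        System.out.println(Arrays.toString(sliceSizes(8, 4, true)));   // prints [2, 2, 2, 2]
    }
}
```

Under this scheme, a batch smaller than the number of devices produces fewer slices than devices; a real implementation may handle that case differently.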
    • getIndices

      public List<?> getIndices()
      Returns the indices used to extract the data and labels from the Dataset.
      Returns:
      a list of Long if the Dataset is a RandomAccessDataset, otherwise may return null.
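
Because getIndices() is typed as List<?> and may return null, callers typically need a null check and a cast before using the values. The following plain-Java sketch shows one defensive way to handle that; IndicesSketch and toLongIndices are hypothetical names, and mapping null to an empty list is a design choice of this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java sketch; the argument stands in for the value returned by
// Batch.getIndices(). The class and method names are hypothetical.
final class IndicesSketch {

    // Converts a wildcard-typed index list to List<Long>.
    // A null input (no indices available) becomes an empty list.
    static List<Long> toLongIndices(List<?> indices) {
        if (indices == null) {
            return Collections.emptyList();
        }
        List<Long> result = new ArrayList<>(indices.size());
        for (Object index : indices) {
            if (!(index instanceof Long)) {
                throw new IllegalArgumentException("expected a Long index but found: " + index);
            }
            result.add((Long) index);
        }
        return result;
    }

    public static void main(String[] args) {
        List<?> raw = List.of(3L, 7L, 11L);
        System.out.println(toLongIndices(raw));  // prints [3, 7, 11]
        System.out.println(toLongIndices(null)); // prints []
    }
}
```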