Package ai.djl.training.dataset
Class RandomAccessDataset
java.lang.Object
ai.djl.training.dataset.RandomAccessDataset
- All Implemented Interfaces:
Dataset
- Direct Known Subclasses:
ArrayDataset
RandomAccessDataset represents a dataset that supports random access reads, i.e. it can access
a specific data item given its index.
Almost all datasets in DJL extend, either directly or indirectly, RandomAccessDataset.
-
Nested Class Summary
Nested Classes
Nested classes/interfaces inherited from interface ai.djl.training.dataset.Dataset
Dataset.Usage
-
Field Summary
Fields
protected Batchifier dataBatchifier
protected Device device
protected Batchifier labelBatchifier
protected long limit
protected Pipeline pipeline
protected int prefetchNumber
protected Sampler sampler
protected Pipeline targetPipeline
-
Constructor Summary
Constructors
RandomAccessDataset(RandomAccessDataset.BaseBuilder<?> builder): Creates a new instance of RandomAccessDataset with the given necessary configurations.
-
Method Summary
Methods
protected abstract long availableSize(): Returns the number of records available to be read in this Dataset.
abstract Record get(NDManager manager, long index): Gets the Record for the given index from the dataset.
Iterable<Batch> getData(NDManager manager): Fetches an iterator that can iterate through the Dataset.
Iterable<Batch> getData(NDManager manager, Sampler sampler): Fetches an iterator that can iterate through the Dataset with a custom sampler.
Iterable<Batch> getData(NDManager manager, Sampler sampler, ExecutorService executorService): Fetches an iterator that can iterate through the Dataset with a custom sampler, using multiple threads.
Iterable<Batch> getData(NDManager manager, ExecutorService executorService): Fetches an iterator that can iterate through the Dataset with multiple threads.
protected RandomAccessDataset newSubDataset(int[] indices, int from, int to)
protected RandomAccessDataset newSubDataset(List<Long> subIndices)
RandomAccessDataset[] randomSplit(int... ratio): Splits the dataset into multiple portions.
long size(): Returns the size of this Dataset.
RandomAccessDataset subDataset(int fromIndex, int toIndex): Returns a view of the portion of this data between the specified fromIndex, inclusive, and toIndex, exclusive.
RandomAccessDataset subDataset(List<Long> subIndices): Returns a view of the portion of this data for the specified subIndices.
<K> RandomAccessDataset subDataset(List<K> recordKeys, List<K> subRecordKeys): Returns a view of the portion of this data for the specified record keys.
<K> RandomAccessDataset subDataset(Map<K, Long> indicesOfRecordKeys, List<K> subRecordKeys): Returns a view of the portion of this data for the specified record keys.
Pair<Number[][],Number[][]> toArray(NDManager manager): Returns the dataset contents as a Java array.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ai.djl.training.dataset.Dataset
matchingTranslatorOptions, prepare, prepare
-
Field Details
-
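sampler
protected Sampler sampler
-
dataBatchifier
protected Batchifier dataBatchifier
-
labelBatchifier
protected Batchifier labelBatchifier
-
pipeline
protected Pipeline pipeline
-
targetPipeline
protected Pipeline targetPipeline
-
prefetchNumber
protected int prefetchNumber
-
limit
protected long limit
-
device
protected Device device
-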
-
Constructor Details
-
RandomAccessDataset
Creates a new instance of RandomAccessDataset with the given necessary configurations.
- Parameters:
builder - a builder with the necessary configurations
-
-
Method Details
-
get
Gets the Record for the given index from the dataset.
- Parameters:
manager - the manager used to create the arrays
index - the index of the requested data item
- Returns:
a Record that contains the data and label of the requested data item
- Throws:
IOException - if an I/O error occurs
-
getData
Fetches an iterator that can iterate through the Dataset.
- Specified by:
getData in interface Dataset
- Parameters:
manager - the manager to create the arrays
- Returns:
an Iterable of Batch that contains batches of data from the dataset
- Throws:
IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
-
getData
public Iterable<Batch> getData(NDManager manager, ExecutorService executorService) throws IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with multiple threads.
- Specified by:
getData in interface Dataset
- Parameters:
manager - the manager to create the arrays
executorService - the executorService to use for multi-threading
- Returns:
an Iterable of Batch that contains batches of data from the dataset
- Throws:
IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
-
getData
public Iterable<Batch> getData(NDManager manager, Sampler sampler) throws IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with a custom sampler.
- Parameters:
manager - the manager to create the arrays
sampler - the sampler to use to iterate through the dataset
- Returns:
an Iterable of Batch that contains batches of data from the dataset
- Throws:
IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
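To make the role of a sampler concrete, here is a minimal, library-free sketch of how record indices might be grouped into batches before each one is read by index. The class SequentialBatchSketch and its batches method are hypothetical names used for illustration only; they are not part of DJL and do not reproduce the Sampler interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Conceptual sketch only (not DJL code): groups record indices into sequential batches. */
final class SequentialBatchSketch {

    /** Returns consecutive index batches covering [0, size); the last batch may be smaller. */
    static List<long[]> batches(long size, int batchSize) {
        if (size < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("size must be >= 0 and batchSize must be > 0");
        }
        List<long[]> result = new ArrayList<>();
        for (long start = 0; start < size; start += batchSize) {
            int length = (int) Math.min(batchSize, size - start);
            long[] batch = new long[length];
            for (int i = 0; i < length; i++) {
                batch[i] = start + i; // each entry is an index that a random access read could fetch
            }
            result.add(batch);
        }
        return result;
    }

    public static void main(String[] args) {
        // Five records in batches of two: prints [0, 1], then [2, 3], then [4]
        for (long[] batch : batches(5, 2)) {
            System.out.println(Arrays.toString(batch));
        }
    }
}
```

A shuffling sampler would differ only in the order in which indices are handed out; the dataset still serves each index through random access.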
-
getData
public Iterable<Batch> getData(NDManager manager, Sampler sampler, ExecutorService executorService) throws IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with a custom sampler, using multiple threads.
- Parameters:
manager - the manager to create the arrays
sampler - the sampler to use to iterate through the dataset
executorService - the executorService to use for multi-threading
- Returns:
an Iterable of Batch that contains batches of data from the dataset
- Throws:
IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
-
size
public long size()
Returns the size of this Dataset.
- Returns:
the size of this Dataset
-
availableSize
protected abstract long availableSize()
Returns the number of records available to be read in this Dataset.
- Returns:
the number of records available to be read in this Dataset
-
randomSplit
Splits the dataset into multiple portions.
- Parameters:
ratio - the ratio of each sub dataset
- Returns:
an array of the sub datasets
- Throws:
IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
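As an illustration of how integer ratios could translate into portion sizes, the following library-free sketch divides a record count proportionally. It rests on two assumptions that the description above does not confirm: portion sizes are rounded down, and the last portion receives whatever remains. SplitSizesSketch and splitSizes are hypothetical names, not DJL API.

```java
import java.util.Arrays;

/** Conceptual sketch only (not DJL code): turns integer ratios into portion sizes. */
final class SplitSizesSketch {

    /**
     * Computes how many of {@code size} records each portion receives.
     * Assumption: sizes are rounded down and the last portion takes the remainder.
     */
    static long[] splitSizes(long size, int... ratio) {
        if (size < 0 || ratio.length == 0) {
            throw new IllegalArgumentException("size must be >= 0 and at least one ratio is required");
        }
        long sum = 0;
        for (int r : ratio) {
            if (r <= 0) {
                throw new IllegalArgumentException("ratios must be positive");
            }
            sum += r;
        }
        long[] sizes = new long[ratio.length];
        long assigned = 0;
        for (int i = 0; i < ratio.length - 1; i++) {
            sizes[i] = size * ratio[i] / sum; // integer division rounds down
            assigned += sizes[i];
        }
        sizes[ratio.length - 1] = size - assigned; // remainder goes to the last portion
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitSizes(100, 8, 1, 1))); // prints [80, 10, 10]
        System.out.println(Arrays.toString(splitSizes(10, 1, 1, 1)));  // prints [3, 3, 4]
    }
}
```

Under these assumptions, a call such as randomSplit(8, 2) on a dataset of 10 records would correspond to portions of 8 and 2 records.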
-
subDataset
Returns a view of the portion of this data between the specified fromIndex, inclusive, and toIndex, exclusive.
- Parameters:
fromIndex - low endpoint (inclusive) of the subDataset
toIndex - high endpoint (exclusive) of the subDataset
- Returns:
a view of the specified range within this dataset
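The inclusive/exclusive convention is the same one used by java.util.List.subList. The sketch below shows, in plain Java, what a range view amounts to: no records are copied, and index i of the view is translated to index fromIndex + i of the parent. RangeViewSketch is a hypothetical class for illustration and does not reflect how DJL implements the view internally.

```java
import java.util.List;

/** Conceptual sketch only (not DJL code): a read-only range view over a parent list. */
final class RangeViewSketch<T> {
    private final List<T> parent;
    private final int fromIndex; // inclusive
    private final int toIndex;   // exclusive

    RangeViewSketch(List<T> parent, int fromIndex, int toIndex) {
        if (fromIndex < 0 || toIndex > parent.size() || fromIndex > toIndex) {
            throw new IndexOutOfBoundsException("invalid range: " + fromIndex + ".." + toIndex);
        }
        this.parent = parent;
        this.fromIndex = fromIndex;
        this.toIndex = toIndex;
    }

    /** Number of records visible through the view. */
    long size() {
        return toIndex - fromIndex;
    }

    /** Translates a view index into a parent index and reads the record. */
    T get(long index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return parent.get(fromIndex + (int) index);
    }

    public static void main(String[] args) {
        RangeViewSketch<String> view = new RangeViewSketch<>(List.of("a", "b", "c", "d"), 1, 3);
        System.out.println(view.size() + " " + view.get(0) + " " + view.get(1)); // prints "2 b c"
    }
}
```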
-
subDataset
Returns a view of the portion of this data for the specified subIndices.
- Parameters:
subIndices - sub-set of indices of this dataset
- Returns:
a view of the specified indices within this dataset
-
subDataset
Returns a view of the portion of this data for the specified record keys. Assuming that the records of this dataset are represented by the keys in recordKeys, subRecordKeys defines the view on the corresponding records of the dataset.
- Type Parameters:
K - the record key type.
- Parameters:
recordKeys - unique keys for all records of this dataset.
subRecordKeys - keys to define the view on the dataset. All keys in subRecordKeys must be contained in recordKeys but may occur more than once.
- Returns:
a view of the specified records within this dataset
-
subDataset
Returns a view of the portion of this data for the specified record keys. Assuming that the records of this dataset are represented by the keys in indicesOfRecordKeys, subRecordKeys defines the view on the corresponding records of the dataset.
- Type Parameters:
K - the record key type.
- Parameters:
indicesOfRecordKeys - Map for keys of the records in this dataset to their index position within this dataset. While this map typically maps all records, technically it just needs to map the ones occurring in subRecordKeys.
subRecordKeys - Keys to define the view on the dataset. All keys in subRecordKeys must be contained in indicesOfRecordKeys but may occur more than once.
- Returns:
a view of the records identified by the specified keys of this dataset
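Both key-based overloads can be read as a two-step lookup: keys are first associated with index positions, and the requested keys are then resolved to a list of indices, with repeats allowed. The plain-Java sketch below illustrates that reading. RecordKeyViewSketch, indexKeys and resolve are hypothetical names, and the choice to throw IllegalArgumentException for an unknown or duplicate key is an assumption of this sketch, not documented DJL behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Conceptual sketch only (not DJL code): resolves record keys to index positions. */
final class RecordKeyViewSketch {

    /** Maps each unique record key to its position in the dataset. */
    static <K> Map<K, Long> indexKeys(List<K> recordKeys) {
        Map<K, Long> indices = new HashMap<>();
        for (int i = 0; i < recordKeys.size(); i++) {
            if (indices.put(recordKeys.get(i), (long) i) != null) {
                throw new IllegalArgumentException("duplicate record key: " + recordKeys.get(i));
            }
        }
        return indices;
    }

    /** Resolves the requested keys to indices; a key may occur more than once. */
    static <K> List<Long> resolve(Map<K, Long> indicesOfRecordKeys, List<K> subRecordKeys) {
        List<Long> subIndices = new ArrayList<>(subRecordKeys.size());
        for (K key : subRecordKeys) {
            Long index = indicesOfRecordKeys.get(key);
            if (index == null) {
                throw new IllegalArgumentException("unknown record key: " + key);
            }
            subIndices.add(index);
        }
        return subIndices;
    }

    public static void main(String[] args) {
        Map<String, Long> indices = indexKeys(List.of("a", "b", "c"));
        System.out.println(resolve(indices, List.of("c", "a", "c"))); // prints [2, 0, 2]
    }
}
```

The resulting index list has the same shape as the subIndices argument accepted by subDataset(List<Long> subIndices).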
-
newSubDataset
-
newSubDataset
-
toArray
public ai.djl.util.Pair<Number[][],Number[][]> toArray(NDManager manager) throws IOException, TranslateException
Returns the dataset contents as a Java array.
Each Number[] is a flattened dataset record and the Number[][] is the array of all records.
- Parameters:
manager - the manager to create the arrays
- Returns:
the dataset contents as a Java array
- Throws:
IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
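To picture the returned shape, the following plain-Java sketch flattens multi-dimensional records into one Number[] per record and collects them into a Number[][]. The row-major flattening order and the use of float[][] as the record type are assumptions made for illustration; FlattenSketch and its methods are hypothetical names, not DJL API.

```java
import java.util.Arrays;

/** Conceptual sketch only (not DJL code): flattens records into Number[] rows. */
final class FlattenSketch {

    /** Flattens one two-dimensional record into a single Number[] (row-major order assumed). */
    static Number[] flatten(float[][] record) {
        int total = 0;
        for (float[] row : record) {
            total += row.length;
        }
        Number[] flat = new Number[total];
        int k = 0;
        for (float[] row : record) {
            for (float value : row) {
                flat[k++] = value; // boxed to Float, stored as Number
            }
        }
        return flat;
    }

    /** Builds the array of all records, one flattened Number[] per record. */
    static Number[][] toArray(float[][][] records) {
        Number[][] all = new Number[records.length][];
        for (int i = 0; i < records.length; i++) {
            all[i] = flatten(records[i]);
        }
        return all;
    }

    public static void main(String[] args) {
        float[][][] records = {{{1f, 2f}, {3f, 4f}}, {{5f, 6f}, {7f, 8f}}};
        System.out.println(Arrays.deepToString(toArray(records)));
    }
}
```

In the Pair returned by toArray, one Number[][] would hold the data in this form and the other the labels.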
-