downhill.dataset.Dataset

class downhill.dataset.Dataset(inputs, name=None, batch_size=32, iteration_size=None, axis=0, rng=None)
This class handles batching and shuffling a dataset.
In downhill, losses are optimized using sets of data collected from the problem that generated the loss.
During optimization, data are grouped into "minibatches", that is, chunks that are larger than 1 sample and smaller than the entire set of samples. Typically the size of a minibatch is between 10 and 100, but the specific setting can be varied depending on your model, hardware, dataset, and so forth. These minibatches must be presented to the optimization algorithm in pseudorandom order to match the underlying stochasticity assumptions of many optimization algorithms. This class handles the process of grouping data into minibatches, as well as iterating over and shuffling these minibatches dynamically as the dataset is consumed by the optimization algorithm.
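The grouping and shuffling described above can be sketched with plain numpy. This is only an illustration of the idea, not downhill's implementation: the helper name make_minibatches and its seed argument are hypothetical, and keeping a final short batch is a choice made by this sketch.

```python
import numpy as np


def make_minibatches(arrays, batch_size=32, axis=0, seed=None):
    """Split aligned arrays into minibatches along `axis`, in shuffled order.

    Hypothetical helper for illustration only; it is not part of downhill.
    """
    rng = np.random.RandomState(seed)
    n = arrays[0].shape[axis]
    # Every array must have the same length along the batching axis.
    if any(a.shape[axis] != n for a in arrays):
        raise ValueError("all arrays must have the same length along `axis`")
    # Starting offset of each minibatch, visited in pseudorandom order.
    starts = np.arange(0, n, batch_size)
    rng.shuffle(starts)
    return [
        [a.take(np.arange(s, min(s + batch_size, n)), axis=axis) for a in arrays]
        for s in starts
    ]


# Two aligned arrays: 100 samples of inputs and 100 matching targets.
x = np.arange(200, dtype=float).reshape(100, 2)
y = np.arange(100)
batches = make_minibatches([x, y], batch_size=32, seed=0)
```

Each element of batches holds one chunk per input array, so the samples in the x chunk stay aligned with the samples in the y chunk even though the batch order is shuffled.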
For many tasks, a dataset is obtained as a large block of sample data, which in Python is normally assembled as a numpy ndarray. To use this class on such a dataset, just pass in a list or tuple containing numpy arrays; the number of these arrays must match the number of inputs that your loss computation requires.
There are some cases when a suitable set of training data would be prohibitively expensive to assemble in memory as a single numpy array. To handle these cases, this class can also handle a dataset that is provided via a Python callable. For more information on using callables to provide data to your model, see Using Callables.
Parameters:
inputs : callable or list of ndarray/sparse matrix/DataFrame/theano shared var
One or more sets of data.
If this parameter is callable, then minibatches will be obtained by calling the callable with no arguments; the callable is expected to return a tuple of ndarray-like objects that will be suitable for optimizing the loss at hand.
If this parameter is a list (or a tuple), it must contain array-like objects: numpy.ndarray, scipy.sparse.csc_matrix, scipy.sparse.csr_matrix, pandas.DataFrame or theano.shared. These are assumed to contain data for computing the loss, so the length of this tuple or list should match the number of inputs required by the loss computation. If multiple arrays are provided, their lengths along the axis given by the axis parameter (defaults to 0) must match.
name : str, optional
A string that is used to describe this dataset. Usually something like ‘test’ or ‘train’.
batch_size : int, optional
The size of the minibatches to create from the data sequences. If this is negative or zero, all data in the dataset will be used in one batch. Defaults to 32. This parameter has no effect if inputs is callable.
iteration_size : int, optional
The number of batches to yield for each call to iterate(). Defaults to the length of the data divided by batch_size. If inputs is a callable, the default is len(callable); if the callable has no length, the default is 100.
axis : int, optional
The axis along which to split the data arrays, if the first parameter is given as one or more ndarrays. If not provided, defaults to 0.
rng : numpy.random.RandomState or int, optional
A random number generator, or an integer seed for a random number generator. If not provided, the random number generator will be created with an automatically chosen seed.
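As a rough illustration of the callable form of inputs, the sketch below defines an object that returns a fresh tuple of arrays each time it is called with no arguments, and that reports a length which could serve as the default iteration_size. The class name RandomBatches, its constructor arguments, and the synthetic data it generates are all hypothetical and chosen only for this example.

```python
import numpy as np


class RandomBatches:
    """Hypothetical callable data source: each call builds one minibatch."""

    def __init__(self, batch_size=32, n_features=5, batches_per_iteration=10, seed=0):
        self.batch_size = batch_size
        self.n_features = n_features
        self.batches_per_iteration = batches_per_iteration
        self.rng = np.random.RandomState(seed)

    def __len__(self):
        # Lets len(callable) supply a default number of batches per iteration.
        return self.batches_per_iteration

    def __call__(self):
        # Generate the batch on demand rather than storing the whole dataset.
        x = self.rng.randn(self.batch_size, self.n_features)
        y = x.sum(axis=1)  # synthetic targets, for illustration only
        return x, y


source = RandomBatches()
x, y = source()
```

An object like source would then be passed as the inputs argument in place of a list of arrays. Because the callable decides how large each returned batch is, this is consistent with batch_size having no effect for callable inputs.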

__init__(inputs, name=None, batch_size=32, iteration_size=None, axis=0, rng=None)
Methods
__init__(inputs[, name, batch_size, ...])
iterate([shuffle])  Iterate over batches in the dataset.
shuffle()  Shuffle the batches in the dataset.
iterate(shuffle=True)
Iterate over batches in the dataset.
This method generates iteration_size batches from the dataset and then returns.
Parameters:
shuffle : bool, optional
Shuffle the batches in this dataset if the iteration reaches the end of the batch list. Defaults to True.
Yields:
batches : data batches
A sequence of batches, often from a training, validation, or test dataset.
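The behavior described for iterate() can be approximated with a small generator. The sketch below is a hypothetical stand-in, not downhill's code: the class name BatchCycler is invented here, and it assumes that the position in the batch list carries over between calls and that a reshuffle happens only when the end of the list is reached.

```python
import numpy as np


class BatchCycler:
    """Hypothetical stand-in that mimics the documented iterate() behavior."""

    def __init__(self, batches, iteration_size=None, seed=None):
        self.batches = list(batches)
        self.iteration_size = iteration_size or len(self.batches)
        self.rng = np.random.RandomState(seed)
        self.position = 0  # assumption: position persists across iterate() calls

    def shuffle(self):
        # Reorder the batch list using a random permutation of indices.
        order = self.rng.permutation(len(self.batches))
        self.batches = [self.batches[i] for i in order]

    def iterate(self, shuffle=True):
        # Yield exactly iteration_size batches, then return.
        for _ in range(self.iteration_size):
            if self.position >= len(self.batches):
                # Reached the end of the batch list: optionally reshuffle, then wrap.
                if shuffle:
                    self.shuffle()
                self.position = 0
            yield self.batches[self.position]
            self.position += 1


# Five tiny batches, consumed three at a time without shuffling.
cycler = BatchCycler([np.full(2, i) for i in range(5)], iteration_size=3, seed=0)
first = [int(b[0]) for b in cycler.iterate(shuffle=False)]
second = [int(b[0]) for b in cycler.iterate(shuffle=False)]
```

With iteration_size smaller than the number of batches, one call covers only part of the data, and under the assumptions above the next call continues where the previous one stopped.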

shuffle()
Shuffle the batches in the dataset.
If this dataset was constructed using a callable, this method has no effect.
