downhill.base.Optimizer

class downhill.base.Optimizer(loss, params=None, inputs=None, updates=(), monitors=(), monitor_gradients=False)

An optimizer computes gradient updates to iteratively optimize a loss.

Parameters:

loss : Theano expression

Loss function to minimize. This must be a scalar-valued expression.

params : list of Theano variables, optional

Symbolic variables to adjust to minimize the loss. If not given, these will be computed automatically by walking the computation graph.

inputs : list of Theano variables, optional

Symbolic variables required to compute the loss. If not given, these will be computed automatically by walking the computation graph.

updates : list of update pairs, optional

A list of pairs providing updates for the internals of the loss computation. Normally this is empty, but it can be provided if the loss, for example, requires an update to an internal random number generator.

monitors : sequence of (str, Theano expression) tuples, optional

Additional values to monitor during optimization. These must be provided as a sequence of (name, expression) tuples.

monitor_gradients : bool, optional

If True, add monitors to log the norms of the parameter gradients during optimization. Defaults to False.
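
For illustration, here is a minimal sketch of constructing an optimizer for a small least-squares problem. The synthetic variables x, y, and w and the choice of the 'sgd' algorithm are assumptions made for this example; downhill.build is the usual factory that forwards these constructor arguments to a concrete Optimizer subclass.

    import numpy as np
    import theano
    import theano.tensor as TT
    import downhill

    # Symbolic inputs and a shared parameter to optimize.
    x = TT.matrix('x')
    y = TT.vector('y')
    w = theano.shared(np.zeros(10, 'f'), name='w')

    # Scalar-valued least-squares loss.
    loss = TT.sqr(TT.dot(x, w) - y).mean()

    opt = downhill.build(
        'sgd',
        loss=loss,
        params=[w],         # optional: inferred from the graph if omitted
        inputs=[x, y],      # optional: inferred from the graph if omitted
        monitors=[('w_norm', TT.sqrt(TT.sqr(w).sum()))],
        monitor_gradients=True)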

Attributes

patience : int, optional

Number of validation “failures” that we are willing to tolerate before stopping the optimization process. A validation failure happens whenever the loss on the validation dataset decreases by less than min_improvement (relative) over the previous best validation loss. Defaults to 5.

validate_every : int, optional

Evaluate the loss on the validation dataset after making this many passes over the training data. Defaults to 10.

min_improvement : float, optional

Insist that the validation loss must improve by this relative amount before considering that the optimization has made progress. The optimization process halts when patience validations have failed to make this relative improvement. Defaults to 0; set to a larger value (e.g., 0.01 for 1% improvement) to halt the optimization process sooner.

max_gradient_norm : float, optional

Rescale each parameter’s gradient so that it has at most this L2 norm. Set to 0 (the default) to disable norm rescaling. If max_gradient_elem is also specified, then this has no effect.

max_gradient_elem : float, optional

Perform elementwise clipping on the magnitude of gradient values. Set to 0 (the default) to disable. If elementwise clipping is enabled, norm rescaling (via max_gradient_norm) will have no effect. Deprecated synonyms of this parameter are “max_gradient_clip” and “gradient_clip”.

learning_rate : float, optional

Many SGD-based optimization algorithms require a learning rate hyperparameter that scales the gradient step. Defaults to 1e-4.

momentum : float, optional

Apply momentum to the parameter updates for this optimizer, with the given strength. Typically this value ranges from 0 (no momentum) to 1 − ε (large momentum). Defaults to 0.

nesterov : bool, optional

If True, and momentum is nonzero, apply Nesterov-style momentum to parameter updates for this optimizer. If False, and momentum is nonzero, “regular” momentum is applied. Has no effect if momentum is zero. See NAG for a description of Nesterov momentum.
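
These attributes are not usually assigned directly; they are supplied as keyword arguments to iterate() or minimize(), which set them for that run. The sketch below reuses the hypothetical opt from the construction example above together with some synthetic training arrays.

    # Synthetic training data; a plain sequence of arrays is acceptable here
    # (see iterate() below for the accepted dataset types).
    train_data = [np.random.randn(100, 10).astype('f'),
                  np.random.randn(100).astype('f')]

    opt.minimize(
        train_data,
        learning_rate=1e-3,      # step size for each gradient update
        momentum=0.9,            # momentum strength ...
        nesterov=True,           # ... applied Nesterov-style
        patience=3,              # tolerate 3 validation failures
        min_improvement=0.01,    # require 1% relative improvement
        max_gradient_norm=1.0)   # rescale gradients to at most unit L2 norm
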
__init__(loss, params=None, inputs=None, updates=(), monitors=(), monitor_gradients=False)

Methods

__init__(loss[, params, inputs, updates, ...])
evaluate(dataset) Evaluate the current model parameters on a dataset.
get_updates(**kwargs) Get parameter update expressions for performing optimization.
iterate([train, valid, max_updates]) Optimize a loss iteratively using a training and validation dataset.
minimize(*args, **kwargs) Optimize our loss exhaustively.
set_params([targets]) Set the values of the parameters to the given target values.
evaluate(dataset)

Evaluate the current model parameters on a dataset.

Parameters:

dataset : Dataset

A set of data to use for evaluating the model.

Returns:

monitors : OrderedDict

A dictionary mapping monitor names to values. Monitors are quantities of interest during optimization: for example, the loss, accuracy, or whatever else the optimization task requires.
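
A sketch of evaluating held-out data, reusing the hypothetical opt from the construction example above; the loss itself is reported alongside any user-supplied monitors.

    # A held-out validation set wrapped in a Dataset.
    valid = downhill.Dataset(
        [np.random.randn(30, 10).astype('f'), np.random.randn(30).astype('f')],
        name='valid')

    metrics = opt.evaluate(valid)
    print(metrics['loss'], metrics['w_norm'])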

get_updates(**kwargs)

Get parameter update expressions for performing optimization.

Keyword arguments supplied here set any of the global optimizer attributes listed above.

Yields:

updates : (parameter, expression) tuples

A sequence of parameter updates to be applied during optimization.
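
This is useful for embedding the optimizer's update rules in a hand-compiled Theano function instead of driving optimization through iterate(). A sketch, reusing the hypothetical x, y, loss, and opt from the examples above:

    # Compile a single training step from the yielded update pairs.
    step = theano.function(
        inputs=[x, y],
        outputs=loss,
        updates=list(opt.get_updates(learning_rate=1e-3, momentum=0.9)))

    batch_loss = step(np.random.randn(32, 10).astype('f'),
                      np.random.randn(32).astype('f'))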

iterate(train=None, valid=None, max_updates=None, **kwargs)

Optimize a loss iteratively using a training and validation dataset.

This method yields a series of monitor values to the caller. After every optimization epoch, a pair of monitor dictionaries is generated: one evaluated on the training dataset during the epoch, and another evaluated on the validation dataset at the most recent validation epoch.

The validation monitors might not be updated during every optimization iteration; in this case, the most recent validation monitors will be yielded along with the training monitors.

Additional keyword arguments supplied here will set the global optimizer attributes.

Parameters:

train : sequence or Dataset

A set of training data for computing updates to model parameters.

valid : sequence or Dataset, optional

A set of validation data for computing monitor values and determining when the loss has stopped improving. Defaults to the training data.

max_updates : int, optional

If specified, halt optimization after this many gradient updates have been processed. If not provided, uses early stopping to decide when to halt.

Yields:

train_monitors : dict

A dictionary mapping monitor names to values, evaluated on the training dataset.

valid_monitors : dict

A dictionary containing monitor values evaluated on the validation dataset.
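
A sketch of a manual training loop, reusing the hypothetical opt, train_data, and valid dataset from the examples above; breaking out of the loop simply stops optimization early.

    for train_m, valid_m in opt.iterate(train_data,
                                        valid=valid,
                                        learning_rate=1e-3,
                                        patience=5):
        print('train loss:', train_m['loss'], 'valid loss:', valid_m['loss'])
        if valid_m['loss'] < 0.5:   # apply your own stopping rule if desired
            break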

minimize(*args, **kwargs)

Optimize our loss exhaustively.

This method is a thin wrapper over the iterate() method. It simply exhausts the iterative optimization process and returns the final monitor values.

Returns:

train_monitors : dict

A dictionary mapping monitor names to values, evaluated on the training dataset.

valid_monitors : dict

A dictionary containing monitor values evaluated on the validation dataset.
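
A sketch, again reusing the hypothetical opt, train_data, and valid dataset from the examples above.

    train_m, valid_m = opt.minimize(train_data, valid, learning_rate=1e-3)
    print('final validation loss:', valid_m['loss'])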

set_params(targets=None)

Set the values of the parameters to the given target values.

Parameters:

targets : sequence of ndarray, optional

Arrays for setting the parameters of our model. If this is not provided, the current best parameters for this optimizer will be used.
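
A sketch, reusing the hypothetical opt and shared parameter w from the examples above.

    opt.set_params()                    # roll back to the best parameters seen so far
    best_w = w.get_value()

    opt.set_params([np.ones(10, 'f')])  # or install explicit target arrays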