downhill.first_order.SGD

class downhill.first_order.SGD(loss, params=None, inputs=None, updates=(), monitors=(), monitor_gradients=False)

Basic optimization using stochastic gradient descent.

Parameters:

learning_rate: float, optional (default 1e-4)

Step size to take during optimization.

momentum: float, optional (default 0)

Momentum to apply to the updates, if any. Defaults to 0 (no momentum). Set to a value close to 1 (e.g., 1 - 1e-4) for large amounts of momentum.

nesterov: bool, optional (default False)

Set this to True to enable Nesterov-style momentum updates whenever momentum is nonzero.

Notes

A stochastic gradient trainer with momentum \(\mu\) and learning rate \(\alpha\) updates parameter \(\theta\) at step \(t\) by blending the current “velocity” \(v\) with the current gradient \(\frac{\partial\mathcal{L}}{\partial\theta}\):

\[\begin{split}\begin{eqnarray*} v_{t+1} &=& \mu v_t - \alpha \frac{\partial\mathcal{L}}{\partial\theta} \\ \theta_{t+1} &=& \theta_t + v_{t+1} \end{eqnarray*}\end{split}\]

Without momentum (i.e., when \(\mu = 0\)), these updates reduce to \(\theta_{t+1} = \theta_t - \alpha \frac{\partial\mathcal{L}}{\partial\theta}\), which just takes steps downhill along the local gradient.

Adding the momentum term lets the algorithm blend in information from previous steps as well, which in practice is thought to capture some information about the second-order structure (curvature) of the loss surface.
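
The following sketch (plain NumPy, not downhill's internal code) implements one update step following the equations above; the ``nesterov`` branch shows a common formulation of Nesterov-style momentum, which evaluates the gradient at the look-ahead point \(\theta + \mu v\) (downhill may arrange this update differently)::

    import numpy as np

    def sgd_step(theta, v, grad_fn, alpha=1e-4, mu=0.0, nesterov=False):
        """One stochastic-gradient update with optional (Nesterov) momentum.

        grad_fn(theta) must return dL/dtheta evaluated at theta.
        """
        if nesterov:
            # Assumed Nesterov formulation: gradient at the look-ahead point.
            grad = grad_fn(theta + mu * v)
        else:
            grad = grad_fn(theta)
        v = mu * v - alpha * grad  # v_{t+1} = mu * v_t - alpha * dL/dtheta
        return theta + v, v        # theta_{t+1} = theta_t + v_{t+1}

    # Toy use: minimize 0.5 * ||theta||^2, whose gradient is theta itself.
    theta, v = np.array([1.0, -2.0]), np.zeros(2)
    for _ in range(1000):
        theta, v = sgd_step(theta, v, lambda t: t, alpha=0.1, mu=0.9)
    # theta is now very close to [0, 0]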

References

[Rume86] D. E. Rumelhart, G. E. Hinton, & R. J. Williams. (1986) “Learning representations by back-propagating errors.” Nature 323 (6088):533–536. doi:10.1038/323533a0. http://www.nature.com/nature/journal/v323/n6088/abs/323533a0.html
__init__(loss, params=None, inputs=None, updates=(), monitors=(), monitor_gradients=False)

Methods

__init__(loss[, params, inputs, updates, ...])
evaluate(dataset)
    Evaluate the current model parameters on a dataset.
get_updates(**kwargs)
    Get parameter update expressions for performing optimization.
iterate([train, valid, max_updates])
    Optimize a loss iteratively using a training and validation dataset.
minimize(*args, **kwargs)
    Optimize our loss exhaustively.
set_params([targets])
    Set the values of the parameters to the given target values.
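
Example

A minimal end-to-end sketch of driving this optimizer through downhill's build() helper, assuming a Theano-expressed loss. The hyperparameter keywords passed to minimize() (learning_rate, momentum, nesterov) are the parameters documented above; the regression problem, variable names, and data shapes are purely illustrative::

    import numpy as np
    import theano
    import theano.tensor as TT

    import downhill

    floatX = theano.config.floatX

    # Least-squares regression: find w minimizing mean((x . w - y) ** 2).
    x = TT.matrix('x')
    y = TT.vector('y')
    w = theano.shared(np.zeros(3, dtype=floatX), name='w')
    loss = TT.sqr(TT.dot(x, w) - y).mean()

    opt = downhill.build('sgd', loss=loss, params=[w], inputs=[x, y])

    # Synthetic training data, ordered to match `inputs`.
    X = np.random.randn(1000, 3).astype(floatX)
    Y = X.dot(np.array([1.0, -2.0, 0.5], dtype=floatX))

    opt.minimize([X, Y], learning_rate=1e-2, momentum=0.9, nesterov=True)

    print(w.get_value())  # should approach [1, -2, 0.5]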