By far the most common form of optimization for neural network training is stochastic gradient descent (SGD). SGD has many variations including Adam (adaptive momentum estimation), Adagrad (adaptive ...
Spiral Dynamics Optimization with Python Dr. James McCaffrey of Microsoft Research explains how to implement a geometry-inspired optimization technique called spiral dynamics optimization (SDO), an ...