Algorithmic, or automatic, differentiation (AD) is concerned with the accurate and efficient evaluation of derivatives for functions defined by computer programs. No truncation errors are incurred, and the resulting numerical derivative values can be used in all scientific computations based on linear, quadratic, or higher order approximations to nonlinear scalar or vector functions. In particular, AD has been applied to optimization, parameter identification, equation solving, the numerical integration of differential equations, and combinations thereof. Apart from quantifying sensitivities numerically, AD techniques can also provide structural information, e.g., the sparsity pattern and generic rank of Jacobian matrices.
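As a brief illustration (not taken from the book), the forward mode of AD can be sketched in a few lines of Python using dual numbers; the names Dual, sin, and f below are hypothetical, and the sketch covers only addition, multiplication, and sine:

    import math

    class Dual:
        """Pair (value, derivative) propagated together by the chain rule."""
        def __init__(self, val, dot=0.0):
            self.val, self.dot = val, dot

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.dot + other.dot)
        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)
        __rmul__ = __mul__

    def sin(x):
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

    def f(x):
        return x * x + sin(x)      # f(x) = x^2 + sin(x)

    x = Dual(1.5, 1.0)             # seed the derivative dx/dx = 1
    y = f(x)
    print(y.val, y.dot)            # value and derivative 2x + cos(x), exact to working precision

Unlike divided differences, the derivative component carries no truncation error: it is computed from the same elementary operations as the function value itself.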
This first comprehensive treatment of AD describes all chain-rule-based techniques for evaluating derivatives of composite functions, with particular emphasis on the reverse, or adjoint, mode. The corresponding complexity analysis shows that gradients are always relatively cheap, while the cost of evaluating Jacobian and Hessian matrices is found to depend strongly on problem structure and its efficient exploitation. Attempts to minimize the operation count and/or memory requirement lead to hard combinatorial optimization problems in the case of Jacobians, and to a well-defined trade-off curve between spatial and temporal complexity for gradient evaluations.
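The reverse mode can likewise be sketched with a small tape-based recording of the forward evaluation; this is a minimal illustration in Python under assumed names (tape, new_var, gradient), not the book's own implementation:

    import math

    tape = []      # for each variable: list of (parent index, local partial derivative)
    values = []    # intermediate values recorded during the forward sweep

    def new_var(val, parents=()):
        values.append(val)
        tape.append(parents)
        return len(values) - 1

    def add(i, j):
        return new_var(values[i] + values[j], [(i, 1.0), (j, 1.0)])

    def mul(i, j):
        return new_var(values[i] * values[j], [(i, values[j]), (j, values[i])])

    def sin(i):
        return new_var(math.sin(values[i]), [(i, math.cos(values[i]))])

    def gradient(output):
        # One reverse sweep over the tape yields all partials of the output,
        # at a cost proportional to the forward evaluation.
        adj = [0.0] * len(values)
        adj[output] = 1.0
        for k in range(output, -1, -1):
            for i, d in tape[k]:
                adj[i] += adj[k] * d
        return adj

    # y = x1 * x2 + sin(x1)
    x1, x2 = new_var(0.5), new_var(4.0)
    y = add(mul(x1, x2), sin(x1))
    g = gradient(y)
    print(g[x1], g[x2])   # x2 + cos(x1) and x1

The single backward pass computes the whole gradient regardless of the number of independent variables, which is why gradients are cheap; the memory needed to store the tape is the price paid, and trading it against recomputation is the spatial-temporal trade-off mentioned above.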
The book is divided into three parts: a stand-alone introduction to the fundamentals of AD and its software, a thorough treatment of methods for sparse problems, and final chapters on higher derivatives, nonsmooth problems, and program reversal schedules. Each chapter concludes with examples and exercises suitable for students with a basic understanding of differential calculus, procedural programming, and numerical linear algebra.