

In the last chapter we saw how neural networks can learn their weights and biases using the gradient descent algorithm. There was, however, a gap in our explanation: we didn't discuss how to compute the gradient of the cost function. That's quite a gap! In this chapter I'll explain a fast algorithm for computing such gradients, an algorithm known as backpropagation.

The backpropagation algorithm was originally introduced in the 1970s, but its importance wasn't fully appreciated until a famous 1986 paper by David Rumelhart, Geoffrey Hinton, and Ronald Williams. That paper describes several neural networks in which backpropagation works far faster than earlier approaches to learning, making it possible to use neural nets to solve problems which had previously been insoluble. Today, the backpropagation algorithm is the workhorse of learning in neural networks.

This chapter is more mathematically involved than the rest of the book. If you're not crazy about mathematics you may be tempted to skip the chapter, and to treat backpropagation as a black box whose details you're willing to ignore. Why take the time to study those details? The reason, of course, is understanding. At the heart of backpropagation is an expression for the partial derivative $\partial C / \partial w$ of the cost function $C$ with respect to any weight $w$ (or bias $b$) in the network. The expression tells us how quickly the cost changes when we change the weights and biases. And while the expression is somewhat complex, it also has a beauty to it, with each element having a natural, intuitive interpretation. And so backpropagation isn't just a fast algorithm for learning. It actually gives us detailed insights into how changing the weights and biases changes the overall behaviour of the network.

With that said, if you want to skim the chapter, or jump straight to the next chapter, that's fine. I've written the rest of the book to be accessible even if you treat backpropagation as a black box. There are, of course, points later in the book where I refer back to results from this chapter. But at those points you should still be able to understand the main conclusions, even if you don't follow all the reasoning.

Warm up: a fast matrix-based approach to computing the output from a neural network

Before discussing backpropagation, let's warm up with a fast matrix-based algorithm to compute the output from a neural network. We actually already briefly saw this algorithm near the end of the last chapter, but I described it quickly, so it's worth revisiting in detail. In particular, this is a good way of getting comfortable with the notation used in backpropagation, in a familiar context.

Let's begin with a notation which lets us refer to weights in the network in an unambiguous way. We'll use $w^l_{jk}$ to denote the weight for the connection from the $k^{\rm th}$ neuron in the $(l-1)^{\rm th}$ layer to the $j^{\rm th}$ neuron in the $l^{\rm th}$ layer. We'll also assume that the cost function can be written as an average $C = \frac{1}{n} \sum_x C_x$ over cost functions $C_x$ for individual training examples, $x$.
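To make the matrix-based idea concrete, here is a minimal sketch in Python with NumPy, not code from the book itself: the names feedforward, weights, and biases are my own, and it assumes weights[l] is the matrix whose $(j, k)$ entry is $w^l_{jk}$ and biases[l] is the corresponding bias vector, following the indexing convention above.

import numpy as np

def sigmoid(z):
    """The sigmoid activation, applied elementwise to a vector z."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(weights, biases, a):
    """Return the network's output for input activation column vector `a`.

    `weights` and `biases` are lists with one entry per layer after the
    input layer.  Each layer's activation is computed in one shot as
    a^l = sigmoid(w^l a^{l-1} + b^l), a single matrix-vector multiply
    instead of a neuron-by-neuron loop.
    """
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(w, a) + b)
    return a

# Example: a tiny 3-2-1 network with random weights and biases.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 3)), rng.standard_normal((1, 2))]
biases = [rng.standard_normal((2, 1)), rng.standard_normal((1, 1))]
x = rng.standard_normal((3, 1))
print(feedforward(weights, biases, x))

The point of the sketch is the shape convention: because $w^l_{jk}$ indexes the destination neuron $j$ by row and the source neuron $k$ by column, the whole layer update is a single np.dot, which is exactly the vectorized form the rest of the chapter builds on.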


