Gradient
A gradient is a vector that stores the partial derivatives of a multivariable function, often denoted by $\nabla f$. It helps us calculate the slope at a specific point on the surface of a function with multiple independent variables.
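Consider a function $f(x, y)$ with two variables (x and y):
1) Find the partial derivative with respect to x, $\frac{\partial f}{\partial x}$ (treat y as a constant, as if it were a fixed number such as 12)
2) Find the partial derivative with respect to y, $\frac{\partial f}{\partial y}$ (treat x as a constant)
3) Store the partial derivatives in a gradient: $\nabla f(x, y) = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right]$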
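The three steps above can be sketched in Python. The function `f(x, y) = x**2 + y**3` is a hypothetical example chosen only for illustration, and the helper names `analytic_gradient` and `numerical_gradient` are likewise assumptions. The numerical version nudges one variable at a time while holding the other fixed, which mirrors "treat the other variable as a constant".

```python
def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 + y**3

def analytic_gradient(x, y):
    # Step 1: df/dx = 2x (y is treated as a constant).
    # Step 2: df/dy = 3y^2 (x is treated as a constant).
    # Step 3: store both partial derivatives in one vector.
    return (2 * x, 3 * y**2)

def numerical_gradient(func, x, y, h=1e-6):
    # Central differences approximate each partial derivative by
    # changing one variable slightly while the other stays fixed.
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (df_dx, df_dy)

grad_exact = analytic_gradient(1.0, 2.0)       # (2.0, 12.0)
grad_approx = numerical_gradient(f, 1.0, 2.0)  # close to (2.0, 12.0)
```

Comparing the two results is a quick sanity check that the hand-derived partial derivatives are right.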
There are two additional properties of gradients that are especially useful in deep learning. A gradient:
Always points in the direction of greatest increase of a function
Is zero at a local maximum or local minimum
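Both properties can be checked numerically. The sketch below assumes a hypothetical bowl-shaped function `g(x, y) = x**2 + y**2`, whose only minimum is at the origin. It confirms that the gradient is zero there, and that among 360 equally spaced directions, a small step of fixed length raises `g` the most in the direction closest to the gradient.

```python
import math

def g(x, y):
    # Hypothetical function with a single local minimum at (0, 0).
    return x**2 + y**2

def grad_g(x, y):
    # Partial derivatives of g with respect to x and y.
    return (2 * x, 2 * y)

# Property 2: the gradient is zero at the local minimum.
grad_at_minimum = grad_g(0.0, 0.0)

# Property 1: compare the gain in g for small steps in many directions.
px, py = 1.0, 2.0
step = 1e-3

def gain(angle):
    # Change in g after a step of length `step` at the given angle.
    return g(px + step * math.cos(angle), py + step * math.sin(angle)) - g(px, py)

gx, gy = grad_g(px, py)
gradient_angle = math.atan2(gy, gx)

# Try one direction per degree and keep the one with the largest gain.
angles = [math.radians(k) for k in range(360)]
best_angle = max(angles, key=gain)
```

Here `best_angle` should land within one degree of `gradient_angle`, the resolution of the search grid.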
Note: The directional derivative of a function is a scalar, while the gradient is a vector.
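To compute a directional derivative, we take the dot product of the gradient and the direction vector $\vec{u} = [u_1, u_2]$:
$$D_{\vec{u}} f = \nabla f \cdot \vec{u} = \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right] \cdot [u_1, u_2]$$
We can rewrite the dot product as:
$$D_{\vec{u}} f = u_1 \frac{\partial f}{\partial x} + u_2 \frac{\partial f}{\partial y}$$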
Link: http://wiki.fast.ai/index.php/Calculus_for_Deep_Learning
The directional derivative is the rate at which the function $f$ changes at a point $(x, y)$ in the direction $\vec{u}$.
The directional derivative is computed by taking the dot product of the gradient of $f$ and a unit vector $\vec{u}$: $D_{\vec{u}} f = \nabla f \cdot \vec{u}$.
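Consider a function $f(x, y)$ with two variables (x and y) and a unit vector $\vec{u} = [u_1, u_2]$.
Hence, the directional derivative at coordinates $(x_0, y_0)$ is:
$$D_{\vec{u}} f(x_0, y_0) = u_1 \frac{\partial f}{\partial x}(x_0, y_0) + u_2 \frac{\partial f}{\partial y}(x_0, y_0)$$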
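As a concrete illustration, the sketch below computes a directional derivative in Python. The function `f(x, y) = x**2 + y**3`, the point `(1, 2)`, and the direction `(3, 4)` are hypothetical values chosen for illustration. The direction is normalized to a unit vector first, and a finite-difference estimate along that direction serves as a cross-check.

```python
import math

def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 + y**3

def gradient(x, y):
    # Partial derivatives of f: df/dx = 2x, df/dy = 3y^2.
    return (2 * x, 3 * y**2)

def unit_vector(direction):
    # Scale the direction so that its length is 1.
    length = math.hypot(direction[0], direction[1])
    return (direction[0] / length, direction[1] / length)

def directional_derivative(x, y, direction):
    # Dot product of the gradient with the unit direction vector.
    u1, u2 = unit_vector(direction)
    df_dx, df_dy = gradient(x, y)
    return u1 * df_dx + u2 * df_dy

def numerical_check(x, y, direction, h=1e-6):
    # Central difference of f along the unit direction.
    u1, u2 = unit_vector(direction)
    return (f(x + h * u1, y + h * u2) - f(x - h * u1, y - h * u2)) / (2 * h)

# Unit vector is (0.6, 0.8) and the gradient at (1, 2) is (2, 12),
# so the result is 0.6 * 2 + 0.8 * 12, about 10.8.
slope = directional_derivative(1.0, 2.0, (3.0, 4.0))
approx = numerical_check(1.0, 2.0, (3.0, 4.0))
```

The result is a single number (a scalar), whereas `gradient` returns a vector.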