Gradient Descent
Gradient Descent is an optimization algorithm that minimizes a function by iteratively adjusting its parameters in the direction of the steepest decrease. It is fundamental in machine learning for enabling models to learn efficiently from data.
Understanding Gradients
The gradient of a function gives the direction and rate of steepest increase at a given point. To minimize the function, we move in the opposite direction of the gradient.
For a simple function:
J(θ) = θ²

The derivative (gradient) is:

∇J(θ) = d/dθ (θ²) = 2θ

This means that for any value of θ, the gradient tells us how to adjust θ to descend toward the minimum.
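To make this concrete, here is a minimal Python sketch that defines J(θ) = θ² and its gradient, and checks the analytic gradient against a finite-difference estimate (the names J and grad_J are just illustrative):

```python
# Cost function J(theta) = theta^2 and its analytic gradient 2*theta
def J(theta):
    return theta ** 2

def grad_J(theta):
    return 2 * theta

# Sanity check: compare the analytic gradient with a central finite difference
theta = 3.0
eps = 1e-6
numeric = (J(theta + eps) - J(theta - eps)) / (2 * eps)
print(grad_J(theta), numeric)  # both are approximately 6.0
```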
Gradient Descent Formula
The weight update rule is:
θ ← θ - α∇J(θ)

Where:
- θ - model parameter;
- α - learning rate (step size);
- ∇J(θ) - gradient of the function we're aiming to minimize.
For our function:
θ_new = θ_old - α(2θ_old)

This means we update θ iteratively by subtracting the scaled gradient.
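As a sketch, the update rule can be written as a single Python function that takes the current parameter, the learning rate, and a gradient function (the name gd_step is an illustrative assumption):

```python
# One gradient descent update: theta <- theta - alpha * grad(theta)
def gd_step(theta, alpha, grad):
    return theta - alpha * grad(theta)

# For J(theta) = theta^2 the gradient is 2*theta
print(gd_step(3.0, 0.3, lambda t: 2 * t))  # 1.2 -- the first step of the example below
```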
Stepwise Movement: A Worked Example
Example with starting values θ = 3 and α = 0.3:
- θ₁ = 3 - 0.3(2×3) = 3 - 1.8 = 1.2;
- θ₂ = 1.2 - 0.3(2×1.2) = 1.2 - 0.72 = 0.48;
- θ₃ = 0.48 - 0.3(2×0.48) = 0.48 - 0.288 = 0.192;
- θ₄ = 0.192 - 0.3(2×0.192) = 0.192 - 0.115 = 0.077.
After a few iterations, we move toward θ = 0, the minimum.
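The same walkthrough can be reproduced with a short loop; this sketch hard-codes the gradient 2θ of our example function:

```python
# Gradient descent on J(theta) = theta^2, reproducing the steps above
theta = 3.0   # starting value
alpha = 0.3   # learning rate
for step in range(1, 5):
    grad = 2 * theta              # gradient of theta^2 at the current point
    theta = theta - alpha * grad  # update rule
    print(f"step {step}: theta = {theta:.3f}")
# step 1: theta = 1.200
# step 2: theta = 0.480
# step 3: theta = 0.192
# step 4: theta = 0.077
```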
Learning Rate: Choosing α Wisely
- Too large α - overshoots, never converges;
- Too small α - converges too slowly;
- Optimal α - balances speed and accuracy.
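A quick sketch of these three regimes on J(θ) = θ², starting from θ = 3 (the specific α values are chosen only for illustration):

```python
# Run a fixed number of gradient descent steps on J(theta) = theta^2
def run(alpha, steps=10, theta=3.0):
    for _ in range(steps):
        theta = theta - alpha * (2 * theta)
    return theta

print(run(alpha=1.1))    # too large: the updates overshoot and |theta| grows
print(run(alpha=0.001))  # too small: theta barely moves toward 0
print(run(alpha=0.3))    # reasonable: theta is already very close to 0
```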
When Does Gradient Descent Stop?
Gradient descent stops when:
∇J(θ) ≈ 0

This means that further updates are insignificant and we've found a minimum.
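In practice, this condition is implemented by stopping once the gradient's magnitude falls below a small tolerance; a minimal sketch (the tolerance value here is an arbitrary choice):

```python
# Run gradient descent until the gradient is close to zero
theta, alpha, tol = 3.0, 0.3, 1e-6
steps = 0
while abs(2 * theta) > tol:       # |grad J(theta)| = |2*theta|
    theta = theta - alpha * (2 * theta)
    steps += 1
print(steps, theta)  # theta is now essentially 0
```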