A complete walkthrough of machine learning training — using a model with just two parameters.
Imagine you're blindfolded, feeling your way around a volume knob, trying to find just the right level. You have no idea where the right position is, so you give it a random turn — too loud. You nudge it down a little. Still a bit much. Nudge it again. Slowly, you get closer and closer to that sweet spot.
Machine learning training works exactly like this: make a guess, see how far off you are, nudge things in the right direction, and repeat. Over time, the model's parameters converge toward the correct answer.
Let's walk through every step of this process with a minimal example.
Our model is a single formula with two parameters:

y = a·x + b

a and b are the parameters we want to "learn." We give them arbitrary initial values before training; the goal is to make them progressively more accurate. The full experiment setup: one training example with input x = 1 and true answer y_true = 4, initial parameters a = 3 and b = 4, and a learning rate of 0.1.
The learning rate controls how much we adjust the knob each step. Too large and we overshoot; too small and we need many more steps. 0.1 is a common starting point.
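Concretely, the setup might look like this in Python (a minimal sketch; the variable names are mine, not from any framework):

```python
# The model: a straight line with two learnable parameters.
def predict(a, b, x):
    return a * x + b

# Experiment setup: one training example and arbitrary initial parameters.
x, y_true = 1.0, 4.0   # input and the answer we want the model to learn
a, b = 3.0, 4.0        # arbitrary initial guesses for the parameters
lr = 0.1               # learning rate: how big each nudge is

y_pred = predict(a, b, x)  # 3 * 1 + 4 = 7
error = y_pred - y_true    # off by 3
```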
With the initial parameters: y_pred = 3 × 1 + 4 = 7, but the true answer is 4. We're off by 3. Training starts here.
Every training round, the model does four things:

1. Forward pass: plug x into the formula to get a prediction.
2. Measure the error: compare the prediction with the true answer, and square it to get the Loss.
3. Compute the gradient: work out how the Loss changes as each parameter changes.
4. Update: nudge each parameter against its gradient, scaled by the learning rate.
Plugging in numbers: given L = (ax + b − y_true)², let e = ax + b − y_true be the error. The chain rule gives

∂L/∂a = 2e·x = 2 × 3 × 1 = 6
∂L/∂b = 2e = 2 × 3 = 6

Each parameter then moves against its gradient: a ← 3 − 0.1 × 6 = 2.4, and b ← 4 − 0.1 × 6 = 3.4.
The gradient is not a magic formula: it is just the result of differentiating the Loss definition, step by step, with the chain rule.
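One way to convince yourself of that: approximate the gradient numerically by nudging a and watching the Loss, then check it against the closed form 2e·x = 6. A quick sketch:

```python
def loss(a, b, x, y_true):
    return (a * x + b - y_true) ** 2

a, b, x, y_true = 3.0, 4.0, 1.0, 4.0
h = 1e-6

# Central-difference approximation of dL/da: nudge a, watch the loss move.
num_grad_a = (loss(a + h, b, x, y_true) - loss(a - h, b, x, y_true)) / (2 * h)

# Closed form from differentiating (ax + b - y_true)^2 with the chain rule.
e = a * x + b - y_true
analytic_grad_a = 2 * e * x

print(num_grad_a, analytic_grad_a)  # both ~6.0
```

The two numbers agree to many decimal places, which is all "gradient" means here: how much the Loss changes per tiny change in the parameter.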
Update done. Start the next round. Repeat all four steps.
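The whole loop fits in a few lines. A minimal Python sketch (my own code, not from any framework):

```python
x, y_true = 1.0, 4.0   # one training example
a, b = 3.0, 4.0        # initial parameters
lr = 0.1               # learning rate

for step in range(5):
    y_pred = a * x + b                 # 1. forward pass
    e = y_pred - y_true                # 2. measure the error
    grad_a, grad_b = 2 * e * x, 2 * e  # 3. gradients of the squared loss
    a -= lr * grad_a                   # 4. nudge each parameter downhill
    b -= lr * grad_b

print(a, b)  # roughly 1.6166 and 2.6166 after five rounds
```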
The table below records parameters, predictions, errors, and loss at each step. Focus on the error and Loss columns — watch them shrink step by step.
| Step | a | b | y_pred | error | ∂L/∂a | ∂L/∂b | Loss |
|---|---|---|---|---|---|---|---|
| 0 | 3.0000 | 4.0000 | 7.0000 | 3.0000 | 6.0000 | 6.0000 | 9.0000 |
| 1 | 2.4000 | 3.4000 | 5.8000 | 1.8000 | 3.6000 | 3.6000 | 3.2400 |
| 2 | 2.0400 | 3.0400 | 5.0800 | 1.0800 | 2.1600 | 2.1600 | 1.1664 |
| 3 | 1.8240 | 2.8240 | 4.6480 | 0.6480 | 1.2960 | 1.2960 | 0.4199 |
| 4 | 1.6944 | 2.6944 | 4.3888 | 0.3888 | 0.7776 | 0.7776 | 0.1512 |
| 5 | 1.6166 | 2.6166 | 4.2332 | 0.2332 | 0.4664 | 0.4664 | 0.0544 |
Plot the Loss column and you get a curve that plunges at first, then flattens as it glides toward zero: each step removes 64% of the remaining loss.
What do you notice?
The gradient shrinks as the error shrinks, so each update automatically becomes more cautious. The closer you get to the target, the lighter the touch.
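With these particular settings you can make that precise. Substitute the update rule back into the error (using e = ax + b − y_true, learning rate 0.1):

e_new = (a − 0.1·2e·x)·x + (b − 0.1·2e) − y_true = e − 0.2·e·(x² + 1) = 0.6·e  (with x = 1)

So the error shrinks by a fixed factor of 0.6 every round, and the Loss, its square, by 0.36, which is exactly the geometric decay visible in the table.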
In this example x=1, so both parameters receive identical gradients and move together. In real models, each parameter gets its own distinct gradient.
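To see the gradients come apart, just change the input. A quick sketch with a hypothetical example at x = 2 (same formulas as above, my own numbers):

```python
x, y_true = 2.0, 4.0   # a different (hypothetical) training example
a, b = 3.0, 4.0

e = a * x + b - y_true  # error = 6
grad_a = 2 * e * x      # 24: scaled by the input x
grad_b = 2 * e          # 12: independent of x

print(grad_a, grad_b)   # the two parameters now get different nudges
```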
The model never "calculates" the right answer — it approaches it incrementally. This is the core idea behind how deep learning handles complex problems.
Both a=1, b=3 and a=0, b=4 satisfy a+b=4. Real training requires massive, diverse datasets to push the parameters toward a solution that actually generalises.
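You can see the ambiguity directly: both parameter pairs agree on the training input x = 1 but disagree everywhere else, so a single example cannot tell them apart (a sketch, not part of the experiment above):

```python
def predict(a, b, x):
    return a * x + b

# Two different "solutions" that both fit the single training point x = 1.
p1 = (1.0, 3.0)   # a=1, b=3
p2 = (0.0, 4.0)   # a=0, b=4

print(predict(*p1, 1.0), predict(*p2, 1.0))  # 4.0 4.0 -> indistinguishable
print(predict(*p1, 2.0), predict(*p2, 2.0))  # 5.0 4.0 -> a second example separates them
```

One more training point at any other x would immediately push the parameters toward a line that fits both.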
Keep this skeleton in mind. Every neural network, no matter how large, loops through the same four steps: predict, measure the error, compute the gradients, update the parameters.
Training is not about computing the right answer directly. It's about reading the current error and pushing the parameters a little closer to better — one step at a time.
Two parameters or a hundred billion — the mechanism is identical.