Neural networks, intuitively.
Almost everything labelled "AI" today runs on a neural network, yet the phrase sounds either mystical (a digital brain) or impenetrable (linear algebra). Neither is necessary. This entry builds the working intuition: a neural network is a big stack of adjustable knobs that turns numbers into numbers, and "learning" is the boring-but-powerful process of nudging those knobs until the output is right. No biology and no calculus required.
One neuron: a weighted vote.
Forget brains for a moment. A single artificial neuron is a tiny arithmetic recipe. It takes some input numbers, multiplies each by its own weight, adds the results together, adds one more number called the bias, and passes the total through a simple "activation" function that decides how strongly to fire.
The intuition: a neuron casts a weighted vote. The weights say how much each input matters and in which direction. Imagine a neuron deciding "is this email spam?" Its inputs might be the count of the words free, urgent, and meeting. If it learns a large positive weight on free, a large positive weight on urgent, and a negative weight on meeting, then emails full of "free" and "urgent" push the total up (vote: spam) while "meeting" pulls it down (vote: not spam). The activation function then squashes that total into a decision.
output = activation( w1*x1 + w2*x2 + w3*x3 + bias )
That single line is the entire mechanism. Everything else is repetition and scale.
Layers: many votes, then votes on the votes.
One neuron can only draw a straight dividing line through its inputs — too simple for real problems. So we use many neurons in parallel (a layer), and then feed their combined outputs into another layer, and another. Each layer's outputs become the next layer's inputs.
The power comes from this stacking. Early layers learn to detect simple, local patterns; later layers combine those into more abstract ones. In a network that recognises images, the first layer might respond to edges and color blobs, the next to corners and textures, the next to eyes and wheels, and the final layer to "cat" or "car." Nobody designed this hierarchy — it emerges because that is the most efficient way for the optimiser to use the available knobs.
The crucial ingredient is the activation function between layers. Without it, stacking layers would collapse into one big straight line no matter how many you add. The activation introduces a gentle bend, and bends stacked on bends can approximate essentially any pattern. This is why a sufficiently large network is sometimes called a "universal function approximator": with enough neurons it can mimic almost any input-to-output relationship.
The "deep" in deep learning just means many layers. More layers means the network can build a longer chain of "patterns of patterns of patterns," which is what lets it handle very complex inputs like language or images.
What "learning" actually is.
A fresh network has random weights, so it outputs nonsense. Learning is the procedure that fixes this, and it is conceptually simple — a feedback loop repeated millions of times:
- Predict. Run a training example through the network and see what it outputs.
- Measure the error. Compare the output to the correct answer using a number called the loss — large when the network is very wrong, near zero when it is right.
- Assign blame. Work out, for every single weight, whether nudging it up or down would have made the loss a little smaller. This blame-assignment step is called backpropagation; it is just the chain rule from calculus applied mechanically, and you do not need to do the math to trust that it works.
- Nudge. Move every weight a tiny step in the direction that reduces the loss. This is gradient descent — like a hiker in fog always stepping slightly downhill.
Repeat over the whole dataset, many times. Slowly, the random knobs settle into a configuration that produces good answers. There is no eureka moment and no understanding inside the network — just billions of tiny error-driven adjustments accumulating into competence. The famous analogy: water finding its way downhill. Each step is dumb; the destination is remarkable.
Why this gives both the strengths and the failures.
This single picture explains the behaviour you will see in every neural system. The network only ever minimised loss on its training data, so it is brilliant inside the region the data covered and unreliable outside it — it never learned a rule, only a fitted surface that happens to pass through the examples. Ask it about something far from anything it saw and it still produces a confident number, because nothing in the mechanism knows the difference between "interpolating" and "guessing."
It also explains why neural networks are hard to interpret. The "knowledge" is spread across millions or billions of weights with no human-readable meaning attached to any one of them. There is no line of code saying "this is a cat," only a vast numerical landscape sculpted by gradient descent. Tools exist to probe what a network learned, but fundamentally its competence is statistical, not symbolic.
Keep this model in your head and large language models stop being magic. A chatbot is exactly this — an enormous stack of weighted votes, its knobs tuned by gradient descent over a vast amount of text, producing one number per possible next word. Bigger and cleverly arranged, but the same machine: numbers in, numbers out, knobs nudged toward less error. Everything else in this section builds on this one picture.