Activation Functions as Mathematical Operators | Neural Networks as Linear-Algebraic Objects
Mathematical Foundations of Neural Networks

Activation Functions as Mathematical Operators

After a neural network computes a linear transformation of its input (for example, multiplying by a weight matrix and adding a bias), it applies an activation function to each component of the result. Activation functions are applied pointwise: each output of the linear map is transformed independently by the same rule. This step introduces nonlinearity into the network, which is essential for modeling complex, real-world relationships that linear functions alone cannot capture.
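
As a concrete illustration, here is a minimal NumPy sketch of this pipeline (the weights, bias, and input are random and purely illustrative): a linear map followed by a pointwise ReLU.

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(3, 4))   # weight matrix: 4 inputs -> 3 outputs
b = rng.normal(size=3)        # bias vector
x = rng.normal(size=4)        # input vector

z = W @ x + b                 # linear transformation of the input
a = np.maximum(0.0, z)        # ReLU applied pointwise to each component

print(z)  # may contain negative entries
print(a)  # negative entries clipped to 0, positive entries unchanged
```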

Note
Definition

An activation function is a mathematical function applied to each element of a vector (or matrix) output by a linear transformation in a neural network. Common examples include:

  • ReLU (Rectified Linear Unit): $f(x) = \max(0, x)$;
  • Sigmoid: $f(x) = 1 / (1 + \exp(-x))$;
  • Tanh: $f(x) = (\exp(x) - \exp(-x)) / (\exp(x) + \exp(-x))$.

Each of these functions transforms its input in a specific way, introducing nonlinearity and controlling the range of possible outputs.
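
A short sketch of all three functions in NumPy (the function names are our own; each is vectorized, so it applies pointwise), showing how sigmoid and tanh also bound the output range:

```python
import numpy as np

def relu(x):
    """max(0, x), computed elementwise; outputs lie in [0, inf)."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """1 / (1 + exp(-x)); outputs lie in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """(exp(x) - exp(-x)) / (exp(x) + exp(-x)); outputs lie in (-1, 1)."""
    return np.tanh(x)

v = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(v))     # negatives become 0, positives pass through
print(sigmoid(v))  # values squashed into (0, 1)
print(tanh(v))     # values squashed into (-1, 1)
```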

Why is nonlinearity needed?

If you only stack linear transformations (matrix multiplications and additions), the result is always another linear transformation. No matter how many layers you add, the network can only model linear relationships, which are far too simple for most real-world tasks. Nonlinearity allows the network to "bend" and "reshape" data in ways that capture complex patterns.
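
To see this collapse concretely, here is a small NumPy check (weights are random and illustrative): two stacked linear layers are identical to one linear layer with combined weights, while inserting a ReLU between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)
x = rng.normal(size=4)

# Two stacked linear layers...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...equal one linear layer with weights W2 @ W1 and bias W2 @ b1 + b2.
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(two_layers, one_layer))  # True: the extra depth added nothing

# With a ReLU in between, no single linear layer reproduces the map in general.
nonlinear = W2 @ np.maximum(0.0, W1 @ x + b1) + b2
print(np.allclose(nonlinear, one_layer))   # False (almost surely)
```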

What is the formal mathematical effect?

Activation functions break the strict linearity of the network's computations. By applying a nonlinear function after each linear map, a network with enough hidden units can approximate any continuous function on a compact domain, a property known as universal approximation. This is possible precisely because the activation function breaks the affine relationship between input and output, making the network fundamentally more expressive.
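
As a small taste of this added expressiveness (an illustration, not a proof of the universal approximation theorem): the absolute-value function is not linear, yet a one-hidden-layer ReLU network represents it exactly, since $|x| = \max(0, x) + \max(0, -x)$.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-3.0, 3.0, 7)

# Hidden layer: two units with weights +1 and -1 (no bias);
# output layer simply sums the two hidden activations.
hidden = np.stack([relu(1.0 * x), relu(-1.0 * x)])  # shape (2, 7)
y = hidden.sum(axis=0)

print(np.allclose(y, np.abs(x)))  # True: a nonlinear function, built exactly
```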

