Limitations of Shallow Networks
To understand the limitations of shallow neural networks, you first need to know what a shallow network is and what the Universal Approximation Theorem says about its expressive power. A shallow neural network is a feedforward network with a single hidden layer between its input and output layers. The Universal Approximation Theorem states that such a network, given enough hidden units and a suitable activation function, can approximate any continuous function on a compact subset of ℝⁿ as closely as desired.
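To make the setup concrete, here is a minimal NumPy sketch of such a single-hidden-layer network. It uses random tanh hidden units and fits only the output layer by least squares; the target sin(3x), the widths tried, and the helper name `shallow_fit_predict` are illustrative choices, not anything prescribed by the theorem. The point is simply that approximation quality typically improves as the hidden layer widens.

```python
# Minimal sketch of a shallow (single-hidden-layer) network in NumPy.
# Hidden weights are drawn at random and only the output layer is fitted
# by least squares -- enough to illustrate how width affects accuracy.
import numpy as np

rng = np.random.default_rng(0)

def shallow_fit_predict(x_train, y_train, x_test, n_hidden):
    """Fit y = w2 @ [tanh(w1*x + b1), 1] for a 1-d input x."""
    # Random hidden layer (weight and bias scales are illustrative).
    w1 = rng.normal(scale=3.0, size=(n_hidden, 1))
    b1 = rng.uniform(-3.0, 3.0, size=n_hidden)

    def hidden(x):
        # (n_samples, n_hidden) hidden activations plus a bias column.
        h = np.tanh(x[:, None] * w1.T + b1)
        return np.hstack([h, np.ones((len(x), 1))])

    # Solve for the output weights by linear least squares.
    w2, *_ = np.linalg.lstsq(hidden(x_train), y_train, rcond=None)
    return hidden(x_test) @ w2

x = np.linspace(-np.pi, np.pi, 200)
y = np.sin(3 * x)                      # illustrative target function
for width in (5, 20, 100):
    y_hat = shallow_fit_predict(x, y, x, width)
    print(f"width={width:4d}  max error={np.max(np.abs(y_hat - y)):.3f}")
```

Running this shows the approximation error shrinking as the hidden layer grows, which is exactly what the theorem promises, while leaving open how large the layer has to be.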
However, the theorem does not guarantee that this approximation is practical or efficient. The result is existential: it says that a sufficiently wide shallow network can approximate such a function, but not how many hidden units or how much computation this requires. The distinction is crucial in real-world applications, where computational resources and training data are limited. In practice, several limitations follow:
- Shallow networks may require an extremely large number of hidden units to represent complex functions (see the parameter-count sketch after this list);
- As the complexity of the target function increases, the number of parameters and connections can grow rapidly, leading to inefficiency and a risk of overfitting;
- Training very wide shallow networks can be computationally expensive and may suffer from optimization challenges, such as vanishing gradients or poor local minima;
- For certain functions, the number of hidden units required to achieve a given approximation accuracy grows exponentially with the input dimension or the complexity of the function;
- Some function classes cannot be efficiently represented by shallow networks, meaning you would need an impractically large network width to match the expressivity of deeper architectures;
- There are mathematical results showing that specific compositional or hierarchical functions can be represented with far fewer parameters in deep networks than in shallow ones.
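As a rough illustration of the width and parameter blow-up described above, the sketch below counts the weights and biases of fully connected networks for two hypothetical layouts: one very wide hidden layer versus several narrow ones. The specific sizes (64 inputs, a 16 384-unit hidden layer versus three hidden layers of 128) and the helper name `dense_param_count` are arbitrary assumptions chosen only to show the scale of the gap.

```python
# Compare parameter counts of a wide-shallow layout vs. a deep-narrow one.
def dense_param_count(layer_sizes):
    """Total weights + biases of a fully connected net with these layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

shallow = [64, 16384, 1]          # one very wide hidden layer (illustrative)
deep    = [64, 128, 128, 128, 1]  # three modest hidden layers (illustrative)

print("shallow params:", dense_param_count(shallow))  # 1,081,345
print("deep params:   ", dense_param_count(deep))     # 41,473
```

With these made-up sizes the shallow layout carries roughly 26 times as many parameters as the deep one, even though both map 64 inputs to a single output; whether a shallow network actually needs that much width depends on the target function, which is exactly the question the theorem leaves open.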
These limitations help explain why deeper neural networks are so influential in modern machine learning. While shallow networks are theoretically powerful, their practical use is often hampered by the enormous width, and hence parameter count, needed to approximate complex functions. Deeper architectures, by stacking multiple layers, can capture hierarchical patterns and compositional structure more efficiently, as the sketch below illustrates. This insight complements the Universal Approximation Theorem: depth, not just width, is crucial for the practical expressivity and scalability of neural networks.
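The following sketch is a well-known construction from the depth-separation literature, included here only as an illustration (the lesson above does not prescribe it). Composing the piecewise-linear "tent map" k times yields a function with 2^k linear pieces. A ReLU network that stacks k layers of just two hidden units computes it exactly, so its parameter count grows linearly with depth, whereas a single-hidden-layer ReLU network on a scalar input can produce at most m + 1 linear pieces with m hidden units and would therefore need on the order of 2^k units to match it.

```python
# Deep ReLU network computing the k-fold composition of the tent map
# h(x) = 2*min(x, 1 - x) on [0, 1], checked against a direct evaluation.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def deep_tent(x, depth):
    """ReLU net with `depth` hidden layers of width 2 (plus linear outputs)."""
    for _ in range(depth):
        # One tent layer: h(x) = 2*relu(x) - 4*relu(x - 0.5) on [0, 1].
        x = 2.0 * relu(x) - 4.0 * relu(x - 0.5)
    return x

def tent_reference(x, depth):
    # Direct (non-network) evaluation of the same composition, for checking.
    for _ in range(depth):
        x = np.where(x <= 0.5, 2.0 * x, 2.0 - 2.0 * x)
    return x

x = np.linspace(0.0, 1.0, 2001)
for k in (2, 4, 8):
    assert np.allclose(deep_tent(x, k), tent_reference(x, k))
    # The composed function has 2**k linear pieces; a one-hidden-layer
    # ReLU net with m units can produce at most m + 1 pieces, so matching
    # it would take roughly 2**k - 1 hidden units in a shallow network.
    print(f"depth {k}: exact with 2 units per layer; "
          f"a shallow net would need about {2**k - 1} hidden units")
```

This is the pattern behind the compositional results mentioned earlier: each extra layer of composition doubles the structure the deep network can express, while the width a shallow network would need to keep up doubles along with it.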