
Residual Networks (ResNets)

Residual Networks (ResNets) are a neural network architecture designed to make deep networks easier to train. The key innovation in ResNets is the introduction of “residual blocks” built around “skip connections.” I’ll explain the principle and implementation with an example.

Principle of Residual Models

  1. Problem in Deep Networks: As networks get deeper, they become harder to train due to issues like vanishing gradients, where the gradients become smaller and smaller as they are propagated back through many layers, making it difficult for the earlier layers to learn.
  2. Solution – Skip Connections: Residual models introduce skip connections (also known as shortcut connections) that bypass one or more layers. These connections perform identity mapping, and their outputs are added to the outputs of stacked layers.
  3. Ease of Training: The main idea is that these skip connections allow a block to fall back to an identity function, ensuring that a deeper model can perform at least as well as a shallower one. This helps mitigate the vanishing gradient problem by providing a direct path for gradients to flow through (see the short sketch after this list).
  4. Improved Performance: These networks can be trained to be much deeper than traditional networks, leading to improved performance.
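
To see concretely why the skip connection keeps gradients flowing, here is a minimal PyTorch sketch (the nn.Linear layer is just a hypothetical stand-in for the stacked layers F): because the output is F(x) + x, the gradient with respect to x always contains an identity term, however small the gradient through F becomes.

import torch
import torch.nn as nn

x = torch.randn(4, requires_grad=True)
F = nn.Linear(4, 4)   # stand-in for the stacked layers F(x)
y = F(x) + x          # residual connection: output = F(x) + x
y.sum().backward()
print(x.grad)         # W^T·1 + 1; the "+ 1" term comes directly from the skip path

Even if the weights inside F shrink toward zero, x.grad stays close to a vector of ones, which is exactly the direct gradient path described in point 3.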

Implementation Example

Consider a simple residual block in a ResNet model:

  1. Input: Assume the input to the residual block is x.
  2. Layers:
    • The block has two convolutional layers. Let’s call their operations Conv1 and Conv2.
    • Each convolution is followed by batch normalization (BN). A ReLU activation comes after the first BN; the second ReLU is applied only after the skip connection has been added (see step 5).
  3. Skip Connection:
    • Alongside these layers, there is a skip connection that directly connects the block’s input x to its output.
  4. Output:
    • The output of the block is F(x) + x, where F(x) is the output of the stacked layers (i.e., BN2(Conv2(ReLU(BN1(Conv1(x)))))).
    • This addition is element-wise and might require dimensionality adjustment (e.g., through a 1×1 convolution) if the dimensions of F(x) and x don’t match.
  5. Further Activation:
    • After adding F(x) and x, the sum passes through a final ReLU activation (as in the forward method below).

Here’s a simplified PyTorch implementation of a residual block:

import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        # First convolution: may change the channel count and downsample if stride > 1.
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Second convolution: keeps the spatial size and channel count.
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Skip path: identity when shapes match, otherwise a 1x1 convolution to adjust them.
        if stride > 1 or in_channels != out_channels:
            self.skip = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)
        else:
            self.skip = nn.Identity()

    def forward(self, x):
        identity = self.skip(x)   # x, projected if needed so it can be added to F(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)       # F(x): output of the stacked layers
        out += identity           # residual addition: F(x) + x
        out = self.relu(out)      # final activation applied after the addition
        return out
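
A quick usage check (a minimal sketch; the shapes are just illustrative): with stride=2 the block halves the spatial size and here doubles the channel count, and the 1×1 skip convolution keeps the addition shape-compatible.

block = ResidualBlock(in_channels=64, out_channels=128, stride=2)
x = torch.randn(1, 64, 56, 56)   # one 64-channel 56x56 feature map
y = block(x)
print(y.shape)                   # torch.Size([1, 128, 28, 28])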

Summary

The residual model’s principle is to facilitate training deeper networks by adding shortcut connections that skip one or more layers. The implementation typically involves adding the input (x) to the output of the convolutional layers, potentially after some dimension matching, allowing the network to learn residual functions with reference to the layer inputs.
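
As a closing illustration (a minimal sketch, not the full architecture from the original ResNet paper, which also includes an initial stem and a classifier head), such blocks can simply be stacked to build a deeper network:

small_resnet = nn.Sequential(
    ResidualBlock(3, 16),             # channels change, so the skip uses a 1x1 convolution
    ResidualBlock(16, 32, stride=2),  # downsampling block; its skip is a strided 1x1 convolution
    ResidualBlock(32, 32),            # shapes match, so the skip is a plain identity
)
out = small_resnet(torch.randn(1, 3, 32, 32))
print(out.shape)                      # torch.Size([1, 32, 16, 16])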
