A Comprehensive Guide to Activation Functions in Neural Networks

An in-depth exploration of various activation functions used in neural networks, complete with detailed explanations and Python code examples.

Dennis Kibet Rono
4 min read

Activation functions are a fundamental component of neural networks, introducing non-linearities that enable the modeling of complex patterns. This guide provides an extensive overview of various activation functions, their characteristics, and practical Python code examples to illustrate their implementation.

Table of Contents

  1. Introduction to Activation Functions
  2. Common Activation Functions
  3. Choosing the Right Activation Function
  4. Conclusion
  5. References

Introduction to Activation Functions

In neural networks, activation functions determine the output of a neuron given an input or set of inputs. They introduce non-linearities into the model, enabling the network to capture complex patterns and relationships within the data. Without activation functions, a neural network would essentially perform linear transformations, limiting its capacity to model intricate data structures.
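
To make the last point concrete, here is a minimal NumPy sketch (an illustration added here, not code from the original article; the layer sizes are arbitrary) showing that two stacked linear layers with no activation in between collapse into a single linear transformation:

import numpy as np

# Two weight matrices standing in for two linear layers (arbitrary sizes)
W1 = np.random.randn(5, 10)
W2 = np.random.randn(3, 5)
x = np.random.randn(10)

# Applying the layers one after another...
y_stacked = W2 @ (W1 @ x)

# ...is the same as applying one combined linear layer
y_combined = (W2 @ W1) @ x

print(np.allclose(y_stacked, y_combined))  # True: still just a linear map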

Common Activation Functions

1. Linear Activation Function

Definition: The linear activation function is defined as:

f(x) = x

Characteristics:

  • Range: (-\infty, \infty)
  • Derivative: Constant (does not depend on the input x)
  • Use Cases: Rarely used in hidden layers due to the lack of non-linearity; sometimes used in output layers for regression tasks.

Python Implementation:

def linear(x):
    return x

Example Usage in PyTorch:

import torch.nn as nn
 
# Linear activation is equivalent to no activation
model = nn.Sequential(
    nn.Linear(in_features=10, out_features=5)
    # No activation function applied
)

2. Sigmoid Activation Function

Definition: The sigmoid function is defined as:

f(x) = \frac{1}{1 + e^{-x}}

Characteristics:

  • Range: (0, 1)
  • Derivative: f'(x) = f(x) \cdot (1 - f(x))
  • Use Cases: Commonly used in binary classification problems; however, it can suffer from vanishing gradient issues in deep networks (a small gradient sketch follows the PyTorch example below).

Python Implementation:

import numpy as np
 
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

Example Usage in PyTorch:

import torch.nn as nn
 
model = nn.Sequential(
    nn.Linear(in_features=10, out_features=5),
    nn.Sigmoid()
)
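
The derivative above is what drives the vanishing-gradient behaviour. As a rough illustration (the sigmoid_derivative helper is an addition here, not from the original post), the sketch below shows how quickly the gradient shrinks as inputs move away from zero:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # f'(x) = f(x) * (1 - f(x))
    s = sigmoid(x)
    return s * (1 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid_derivative(x))
# The gradient peaks at 0.25 for x = 0 and falls toward 0 as |x| grows,
# which is what starves early layers of gradient signal in deep networks.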

3. Hyperbolic Tangent (Tanh) Activation Function

Definition: The tanh function is defined as:

f(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}

Characteristics:

  • Range: (-1, 1)
  • Derivative: f'(x) = 1 - f(x)^2
  • Use Cases: Often preferred over the sigmoid function as it centers the data around zero, leading to faster convergence in training (see the sketch after the PyTorch example below).

Python Implementation:

import numpy as np
 
def tanh(x):
    return np.tanh(x)

Example Usage in PyTorch:

import torch.nn as nn
 
model = nn.Sequential(
    nn.Linear(in_features=10, out_features=5),
    nn.Tanh()
)
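
The two bullet points above can be checked numerically. The short sketch below (an illustration added here, not code from the original article) evaluates the derivative 1 - tanh(x)^2 and confirms that tanh keeps zero-mean inputs roughly zero-centered, unlike sigmoid:

import numpy as np

def tanh_derivative(x):
    # f'(x) = 1 - tanh(x)^2
    return 1 - np.tanh(x) ** 2

x = np.random.randn(10_000)           # zero-mean inputs
print(np.tanh(x).mean())              # close to 0: outputs stay zero-centered
print((1 / (1 + np.exp(-x))).mean())  # close to 0.5: sigmoid shifts everything positive
print(tanh_derivative(0.0))           # 1.0: the slope is steepest at the origin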

4. Rectified Linear Unit (ReLU) Activation Function

Definition: The ReLU function is defined as:

f(x) = \max(0, x)

Characteristics:

  • Range: [0, \infty)
  • Derivative: f'(x) = 1 for x > 0; f'(x) = 0 for x \leq 0
  • Use Cases: Widely used in hidden layers of deep neural networks due to its computational efficiency and reduced likelihood of vanishing gradients (the gradient behaviour is sketched after the PyTorch example below).

Python Implementation:

import numpy as np

def relu(x):
    return np.maximum(0, x)

Example Usage in PyTorch:

import torch.nn as nn
 
model = nn.Sequential(
    nn.Linear(in_features=10, out_features=5),
    nn.ReLU()
)
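
Because the derivative is exactly zero for x \leq 0, units that only ever see negative inputs stop learning. The sketch below (an illustrative helper added here, not from the original post) makes that visible:

import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    # f'(x) = 1 for x > 0, 0 for x <= 0
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))             # [0.  0.  0.  0.5 3. ]
print(relu_derivative(x))  # [0. 0. 0. 1. 1.]  -- no gradient flows for x <= 0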

5. Leaky ReLU Activation Function

Definition: The Leaky ReLU function introduces a small slope for negative inputs:

f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \leq 0 \end{cases}

where \alpha is a small constant (e.g., 0.01).

Characteristics:

  • Range: (-\infty, \infty)
  • Derivative: f'(x) = 1 for x > 0; f'(x) = \alpha for x \leq 0
  • Use Cases: Addresses the "dying ReLU" problem by allowing a small, non-zero gradient when the unit is not active (a comparison with plain ReLU follows the PyTorch example below).

Python Implementation:

import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

Example Usage in PyTorch:

import torch.nn as nn
 
model = nn.Sequential(
    nn.Linear(in_features=10, out_features=5),
    nn.LeakyReLU(negative_slope=0.01)
)
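
To see how the small slope keeps gradients alive, the comparison below (an illustrative addition, with alpha fixed at 0.01 as in the definition above) applies both activations to a mix of negative and positive inputs:

import numpy as np

def leaky_relu_derivative(x, alpha=0.01):
    # f'(x) = 1 for x > 0, alpha for x <= 0
    return np.where(x > 0, 1.0, alpha)

x = np.array([-5.0, -1.0, 0.5, 2.0])
print(np.maximum(0, x))              # ReLU:       negative inputs are zeroed out
print(np.where(x > 0, x, 0.01 * x))  # Leaky ReLU: small negative values survive
print(leaky_relu_derivative(x))      # [0.01 0.01 1.   1.  ] -- gradient never reaches zero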
