Overview of Dense Neural Networks

Embark on a journey into the world of deep learning, where technology mirrors the intricacies of the human brain. In recent years, the surge in data availability and computing power has propelled deep learning to the forefront of artificial intelligence. Its rise stems from a fundamental shift – moving beyond explicit programming to allow machines to learn from vast datasets.

This transformative paradigm has birthed powerful models like Fully Connected Neural Networks (FCNN). But before we delve into FCNN’s wonders, let’s unravel the essence of deep learning, understanding how it emerged as the catalyst for groundbreaking advancements in fields like computer vision.

Deep learning, a subset of machine learning, zeroes in on neural networks, aiming to construct networks capable of extracting features or patterns directly from data, eliminating the need for time-consuming, brittle, and non-scalable hand-engineered features. Instead of relying on human efforts to uncover core patterns in data, machines analyze vast datasets, extracting and unveiling essential patterns to inform decisions on new data.


A Fully Connected Neural Network (FCNN), also known as a Dense Neural Network or Multi-Layer Perceptron (MLP), represents a classic type of neural network architecture. In an FCNN, each neuron in one layer is connected to every neuron in the next layer, creating a fully connected structure. This design allows the network to learn intricate patterns and relationships in the data.


The perceptron, a neural network’s fundamental unit, is a single neuron that performs a short sequence of mathematical operations. Each input feature (Xi) is multiplied by its weight (Wi), the products are summed together with a bias term (w0), and the resulting weighted sum is passed through a non-linear activation function.
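
The computation described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a sigmoid activation; the function names (`sigmoid`, `perceptron`) and the input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def sigmoid(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(x, w, w0):
    """Weighted sum of the inputs plus the bias, passed through a sigmoid."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(z)

# Illustrative values only: two input features with hand-picked weights.
output = perceptron(x=[1.0, 2.0], w=[0.5, -0.25], w0=0.0)
print(output)  # weighted sum is 0.5 - 0.5 + 0.0 = 0, and sigmoid(0) = 0.5
```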

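Sigmoid squashes its input into the range 0 to 1, which makes it a natural fit for outputs interpreted as probabilities. ReLU outputs zero for negative inputs and the input itself otherwise; its derivative is cheap to compute, which helps when training deep networks. Non-linear activations like these are what make neural networks powerful tools for interpreting complex patterns.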
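
To make the two activations concrete, here is a small sketch of sigmoid and ReLU together with their derivatives. The function names are illustrative, and the value of the ReLU derivative at exactly zero is set to 0 by convention, which is an assumption rather than something the math dictates.

```python
import math

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    """Derivative of sigmoid, expressed through the sigmoid itself."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    """Pass positive inputs through unchanged and clamp the rest to zero."""
    return max(0.0, z)

def relu_derivative(z):
    """1 for positive inputs, 0 otherwise (0 at z == 0 by convention)."""
    return 1.0 if z > 0 else 0.0

print(sigmoid(0.0), sigmoid_derivative(0.0))  # 0.5 0.25
print(relu(-3.0), relu(2.0))                  # 0.0 2.0
```

The ReLU derivative needs only a comparison, while the sigmoid derivative needs an exponential, which is one reason ReLU is cheap to train with.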

In its simplest form, a perceptron constitutes a single layer of a neural network. The input neurons of this layer connect to one central processing unit, the perceptron. Features pass through these input neurons and converge at the perceptron, where the weighted sum is computed. The activation function, often a step function, decides whether the perceptron fires in response to the input.
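
As a small worked example of a perceptron that "fires", the sketch below uses a step activation and hand-picked weights so that the unit behaves like a logical AND. The weights, the bias, and the names `step` and `fires` are assumptions made for illustration; they are not learned from data.

```python
def step(z):
    """Step activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

def fires(x, w, w0):
    """Compute the weighted sum plus bias and apply the step function."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return step(z)

# Hand-picked parameters (an assumption): both inputs must be 1 to exceed 1.5.
and_weights, and_bias = [1.0, 1.0], -1.5
results = [fires([a, b], and_weights, and_bias) for a in (0, 1) for b in (0, 1)]
print(results)  # [0, 0, 0, 1] for inputs (0,0), (0,1), (1,0), (1,1)
```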

Essentially, the perceptron transcends its singular existence, becoming the foundation of artificial neural networks. As interconnected perceptrons collectively process information, the network decodes complex patterns and generates informed predictions, ushering in a new era of artificial intelligence grounded in the timeless principles of the perceptron.

Reference: Logistic Regression


1. Input layer:

The input layer is the first layer of the network. It represents the features or variables of the input data. The number of neurons in this layer is determined by the dimensionality of the input data.

2. Hidden layers:

Hidden layers are layers between the input and output layers where the network learns to represent patterns in the data. The number of hidden layers and the number of neurons in each layer are important architectural decisions. Deep neural networks have multiple hidden layers, and the term “deep learning” is often used when referring to models with many layers.

3. Weights and Biases:

Each connection between neurons is associated with a weight, representing the strength of that connection. Additionally, each neuron has an associated bias, contributing to the overall flexibility of the model. During training, the weights and biases are adjusted to minimize the error in the network’s predictions.
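
Because every connection carries a weight and every neuron carries a bias, the number of trainable parameters follows directly from the layer sizes. The helper below is a rough sketch; `count_parameters` and the example sizes are hypothetical, and it assumes the input layer itself has no biases.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists neuron counts from input to output, e.g. [4, 8, 3].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the layers
        total += n_out         # one bias per neuron in the receiving layer
    return total

print(count_parameters([4, 8, 3]))  # (4*8 + 8) + (8*3 + 3) = 67
```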

4. Optimizer:

An optimizer is a crucial component in training neural networks. Its primary purpose is to minimize the error or loss function by adjusting the weights of the neural network during the training process. The optimization algorithm defines how the weights are updated based on the computed gradients of the loss with respect to the weights.

The learning rate, a hyperparameter associated with optimizers, is another crucial factor that influences the training process, and tuning it appropriately is essential for achieving good performance. Commonly used optimizers include Stochastic Gradient Descent (SGD), Adam (Adaptive Moment Estimation), RMSprop (Root Mean Square Propagation), and Adagrad (Adaptive Gradient Algorithm).
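
The core update rule shared by these optimizers can be sketched as "move each parameter a small step against its gradient". The example below is a simplified illustration using plain gradient descent on a toy one-parameter loss, L(w) = (w - 3)^2, which is an assumption made for clarity; real SGD estimates the gradient from mini-batches of data, and `sgd_step` is a hypothetical helper name.

```python
def sgd_step(params, grads, learning_rate):
    """Return new parameters after one gradient descent update."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Toy loss (an assumption): L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = [0.0]
for _ in range(100):
    grad = [2.0 * (w[0] - 3.0)]
    w = sgd_step(w, grad, learning_rate=0.1)

print(w[0])  # moves toward the minimum at w = 3
```

With a learning rate of 0.1 each step shrinks the distance to the minimum by a constant factor. For this particular toy loss, a learning rate above 1.0 would make the updates overshoot and diverge, which is why tuning matters.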

5. Output layer:

The output layer of a neural network is the final layer in the architecture and produces the network’s predictions or outputs based on the learned patterns and features from the input data. The structure and characteristics of the output layer depend on the nature of the task the neural network is designed to solve.

Number of Neurons:

The number of neurons in the output layer is determined by the type of task the neural network is addressing.

  • For binary classification problems, a single neuron with a sigmoid activation function is typically used. The output value represents the probability of belonging to one of the two classes.
  • For multiclass classification problems, the number of neurons corresponds to the number of classes, and a softmax activation function is often applied. The output values represent class probabilities, and the class with the highest probability is selected as the predicted class.
  • For regression tasks, where the goal is to predict a continuous value, the output layer may have a single neuron with a linear activation function.
  • The choice of activation function in the output layer depends on the task and the desired properties of the predictions.
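
For the multiclass case above, softmax can be sketched as follows. The raw scores (often called logits) are made-up values, and subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # shift by the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative raw outputs for a three-class problem.
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```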

Loss Function:

The choice of loss function in the output layer is closely related to the task. It quantifies the difference between the predicted values and the true target values during training.

In binary classification, practitioners commonly opt for the binary cross-entropy loss, while the categorical cross-entropy loss is often favored in multiclass classification scenarios. For regression tasks, mean squared error (MSE) or other specialized loss functions may be considered. Each loss function is selected based on its appropriateness to the task at hand, reflecting the nuanced nature of the modeling objectives in different contexts.
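
The three losses named above can be written out directly. This is a minimal sketch averaged over a batch; the `eps` clipping is an assumption added to avoid taking the logarithm of zero, and the function names are illustrative.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average binary cross-entropy; y_pred holds probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average cross-entropy; y_true rows are one-hot, y_pred rows are probabilities."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # (0 + 4) / 2 = 2.0
```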

In deep neural networks, forward propagation extends beyond individual neurons to entire layers. Each layer is fully connected to the next, with the outputs of one layer becoming the inputs for the subsequent layer. Stacking these interconnected layers creates a hierarchical structure that allows the network to learn complex features and representations as information flows deeper. The final output is computed by traversing this progression of interconnected layers, enabling the network to capture intricate patterns in the input data.
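
Forward propagation through stacked layers can be sketched as a loop in which each layer's output becomes the next layer's input. The layer sizes, weights, and helper names (`dense`, `forward`, `identity`) below are hypothetical and hand-picked so the result is easy to check by hand.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense(inputs, weights, biases, activation):
    """One fully connected layer; weights[j] holds the weights of neuron j."""
    return [activation(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Feed x through each layer in turn."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x

# Illustrative network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 linear output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 2.0]], [0.1], identity),
]
y = forward([2.0, 1.0], layers)
print(y)  # hidden = [1.0, 1.5], output = 1*1.0 + 2*1.5 + 0.1, about 4.1
```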


Backpropagation is a vital training algorithm for Fully Connected Neural Networks (FCNNs). During a forward pass, input data is processed through the network and the loss quantifies the prediction error. Backpropagation then calculates the gradient of that loss with respect to the network’s parameters. The chain rule is applied to efficiently compute partial derivatives for each weight and bias, guiding an optimization algorithm to iteratively update parameters.

Reference: Back Propagation

This process minimizes the difference between predicted and actual values, enabling the FCNN to learn and generalize from diverse datasets. Backpropagation’s adaptability and efficiency make it a cornerstone for training neural networks, facilitating accurate predictions and capturing intricate data patterns.


In our single-neuron neural network, computing the gradient of the loss with respect to the second weight (W2) involves a step-by-step backward journey from the loss to the weights.

Starting at the output layer, we decompose this derivative into components. The first part is the derivative of the loss with respect to the output. It indicates how much a change in the output affects the loss. The second part involves the derivative of the output with respect to the weighted sum. This captures the impact of a small change in the weighted sum on the output. Lastly, the third part deals with the derivative of the weighted sum with respect to W2. This reveals the influence of a change in W2 on the weighted sum.

In backpropagation, the derivative with respect to W2 is computed first, and this information is then utilized to calculate the impact on W1, applying the chain rule iteratively for a comprehensive understanding of how adjustments in weights influence the neural network’s output.
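
The chain-rule decomposition can be sketched on a tiny network with one neuron per layer, so that there are exactly two weights, W1 and W2. Everything specific here is an assumption made for illustration: the sigmoid activations, the squared-error loss 0.5 * (y - target)^2, the absence of biases, and the function names.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    """x -> hidden activation a1 -> output y, one neuron per layer."""
    a1 = sigmoid(w1 * x)
    y = sigmoid(w2 * a1)
    return a1, y

def loss(y, target):
    return 0.5 * (y - target) ** 2

def gradients(x, target, w1, w2):
    """Return (dL/dW1, dL/dW2) using the chain rule."""
    a1, y = forward(x, w1, w2)
    dL_dy = y - target         # loss with respect to the output
    dy_dz2 = y * (1.0 - y)     # output with respect to its weighted sum
    dz2_dw2 = a1               # weighted sum with respect to W2
    dL_dw2 = dL_dy * dy_dz2 * dz2_dw2

    # Reuse the shared factor and keep going backward to reach W1.
    dL_dz2 = dL_dy * dy_dz2
    dz2_da1 = w2
    da1_dz1 = a1 * (1.0 - a1)
    dz1_dw1 = x
    dL_dw1 = dL_dz2 * dz2_da1 * da1_dz1 * dz1_dw1
    return dL_dw1, dL_dw2

print(gradients(x=1.5, target=1.0, w1=0.4, w2=-0.3))
```

Note how the product `dL_dy * dy_dz2` is computed once and reused for both weights. That reuse is what makes backpropagation efficient as networks grow deeper.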

This systematic decomposition allows us to propagate these gradients through the hidden layers and unravels the impact of each weight on the loss. Ultimately, it provides valuable insights into how adjusting weights influences the overall performance of our neural network. This iterative process guides the network’s learning, enabling it to adapt and improve its predictions.


Addressing overfitting in Fully Connected Neural Networks (FCNNs) involves a multi-faceted approach. Regularization techniques like L1 or L2 regularization penalize large weights, promoting a simpler model that generalizes better. Dropout layers introduce randomness during training, preventing over-reliance on specific neurons.
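
Both ideas fit in a few lines. The sketch below shows an L2 penalty term that would be added to the loss, and "inverted" dropout, a common variant that rescales the surviving activations so their expected value stays the same. The names `l2_penalty` and `dropout`, the strength `lam`, and the drop `rate` are illustrative assumptions.

```python
import random

def l2_penalty(weights, lam):
    """Extra loss term that grows with the squared size of the weights."""
    return lam * sum(w * w for w in weights)

def dropout(activations, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`
    and scale the survivors by 1 / (1 - rate). Apply during training only."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

print(l2_penalty([1.0, -2.0], lam=0.1))  # 0.1 * (1 + 4), about 0.5

rng = random.Random(0)  # seeded so the run is repeatable
print(dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng))
```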

To diversify the training dataset, employ data augmentation methods such as rotation or scaling. Early stopping monitors the model’s performance on a validation set, halting training when overfitting becomes evident. Cross-validation provides a robust evaluation, while ensemble learning combines multiple models for improved generalization.
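
Early stopping, for instance, can be sketched as watching the validation loss and giving up after it fails to improve for a set number of epochs. The `patience` parameter, the function name, and the loss history below are made-up values for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the epoch index at which training would stop."""
    best = float("inf")
    waited = 0
    for epoch, current in enumerate(val_losses):
        if current < best:
            best = current  # new best validation loss, reset the counter
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered, ran to the end

# Illustrative validation losses: improvement stalls after epoch 2.
history = [0.90, 0.70, 0.60, 0.62, 0.65, 0.61, 0.70]
stop = early_stopping_epoch(history, patience=2)
print(stop)  # 4: two epochs in a row without beating 0.60
```

In practice one would also keep a copy of the weights from the best epoch (epoch 2 here) and restore them after stopping.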

Hyperparameter tuning adjusts settings such as learning rate and batch size. Together, these strategies address overfitting and help the FCNN generalize beyond the training data, so that the model performs well on new data.


Reference: Overfitting
