
Abha Barge.

Defense Against PGD Adversarial Attacks



Tech stack:

  • Python (TensorFlow, Keras, scikit-learn, PyTorch, NumPy, Matplotlib, tqdm, etc.)

Overview

This repository contains the research work conducted on Projected Gradient Descent (PGD) adversarial attacks and existing defense mechanisms in deep learning models. The focus of this work is on understanding adversarial robustness, analyzing attack behavior at the input and feature levels, and experimentally evaluating baseline defenses on image classification tasks.

The repository is intended to document research progress, experimentation, and analysis, while intentionally excluding any proprietary or novel defense methodology.


Research Objectives

  • Study adversarial attacks with a focus on PGD-based attacks
  • Understand how perturbations affect image representations at pixel and vector levels
  • Evaluate model robustness under adversarial settings
  • Analyze strengths and limitations of existing defense strategies
  • Establish experimental baselines for future defense research

Background Study

The following areas were studied in detail:

Adversarial Attacks

  • Fast Gradient Sign Method (FGSM)
  • Iterative FGSM
  • Projected Gradient Descent (PGD)
  • Relationship between PGD and constrained optimization
  • L∞ and L2 norm-bounded perturbations
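As a concrete reference point, a minimal FGSM sketch in PyTorch is shown below: a single step of size ε in the sign of the input gradient, followed by clipping to the valid pixel range. The function name and signature are illustrative, not taken from the repository.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """Single-step FGSM: x_adv = x + eps * sign(grad_x loss), clipped to [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the sign of the input gradient, then keep pixels in the valid range
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```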

PGD-Specific Analysis

  • Attack formulation as iterative gradient ascent on the loss
  • Projection onto ε-bounded norm balls
  • Step size, number of iterations, and random initialization
  • Differences between single-step and multi-step attacks
  • Behavior of PGD on grayscale vs RGB images
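The elements listed above (random initialization, step size, iteration count, and projection onto the ε-ball) can be sketched as a minimal L∞ PGD loop in PyTorch. This is a generic textbook formulation, not the repository's implementation, and the parameter names are assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """L_inf PGD: random start, iterative ascent on the loss, projection each step."""
    # Random initialization inside the eps-ball, clipped to valid pixel range
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()              # ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto eps-ball
            x_adv = x_adv.clamp(0, 1)                              # valid pixel range
    return x_adv.detach()
```

Setting `steps=1` and removing the random start recovers FGSM, which is one way to see the single-step vs multi-step distinction.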

Dataset and Input Representation

  • Experiments conducted on image-based datasets (e.g., MNIST / CIFAR-style data)
  • Analysis of:
    • Image tensor representation (C × H × W)
    • Flattened vector representations used during optimization
    • Effect of perturbations at pixel, channel, and vector levels
  • Conversion between RGB and grayscale representations and its impact on attacks
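The representations above can be illustrated in a few lines: a (C × H × W) tensor, its flattened vector view, and a grayscale conversion. The BT.601 luminance weights are an assumption here; the repository may use a different conversion.

```python
import torch

x = torch.rand(3, 32, 32)            # RGB image tensor in (C, H, W) layout
flat = x.flatten()                   # flattened vector view used during optimization

# Grayscale conversion with (assumed) ITU-R BT.601 luminance weights
weights = torch.tensor([0.299, 0.587, 0.114]).view(3, 1, 1)
gray = (x * weights).sum(dim=0, keepdim=True)   # shape (1, H, W)

# A vector-level perturbation maps back to specific pixels after reshaping
delta = torch.zeros_like(flat)
delta[0] = 0.05                      # perturb one coordinate of the vector
perturbed = (flat + delta).view(3, 32, 32)
```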

Model Architecture

  • Convolutional Neural Network (CNN) implemented using PyTorch
  • Configurable parameters:
    • Kernel sizes
    • Pooling functions
    • Number of convolutional filters
  • Separate analysis of clean vs adversarial performance
  • Training and evaluation pipelines established for reproducibility
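A configurable CNN of this kind might be sketched as follows; the factory function, default filter counts, and pooling choice are illustrative assumptions, not the repository's exact architecture.

```python
import torch
import torch.nn as nn

def make_cnn(in_channels=1, filters=(32, 64), kernel_size=3,
             pool=nn.MaxPool2d, num_classes=10):
    """Small configurable CNN: conv/ReLU/pool blocks, then a linear classifier."""
    layers, ch = [], in_channels
    for f in filters:
        layers += [nn.Conv2d(ch, f, kernel_size, padding=kernel_size // 2),
                   nn.ReLU(),
                   pool(2)]
        ch = f
    # Global pooling makes the head independent of input resolution
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ch, num_classes)]
    return nn.Sequential(*layers)
```

Swapping `pool=nn.AvgPool2d` or changing `kernel_size` exercises the configurable parameters listed above.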

Experimental Setup

  • Implementation of PGD attack during evaluation
  • Controlled experiments with:
    • Fixed ε bounds
    • Varying step sizes and iterations
  • Comparison of:
    • Clean accuracy
    • Adversarial accuracy
  • Logging and visualization of results for analysis
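The clean-vs-adversarial comparison can be expressed as one evaluation loop that optionally applies an attack callable; this is a generic sketch of the setup described above, with assumed names.

```python
import torch

def evaluate(model, loader, attack=None):
    """Accuracy over `loader`; if `attack(model, x, y) -> x_adv` is given,
    accuracy is measured on the perturbed inputs instead."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        x_eval = attack(model, x, y) if attack is not None else x
        with torch.no_grad():
            pred = model(x_eval).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```

Calling `evaluate(model, loader)` and `evaluate(model, loader, attack=pgd)` with a fixed ε then gives the clean and adversarial accuracies side by side.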

Evaluation Metrics

  • Classification accuracy under clean inputs
  • Classification accuracy under adversarial inputs
  • Robustness degradation analysis
  • Qualitative observation of perturbed images
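One simple way to quantify robustness degradation, assuming it is defined as the relative accuracy drop under attack (other definitions exist), is:

```python
def robustness_degradation(clean_acc, adv_acc):
    """Relative accuracy drop under attack (one common definition, assumed here)."""
    return (clean_acc - adv_acc) / clean_acc
```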

Explainability Analysis

  • Exploratory study of Explainable AI (XAI) techniques in adversarial contexts
  • Motivation for linking robustness and interpretability
  • Preliminary analysis of model behavior under adversarial perturbations
  • Investigation into how explanations change before and after attacks

Note: This repository documents only the exploratory and analytical aspects of XAI integration.


Key Findings (So Far)

  • PGD is significantly stronger than single-step attacks
  • Iterative perturbations exploit model gradients more effectively
  • Robustness is highly sensitive to ε and step size
  • Standard CNNs show sharp accuracy drops under PGD attacks
  • Input representation (RGB vs grayscale) influences attack behavior

What Is Intentionally Not Included

  • Novel defense mechanisms or algorithms
  • Proprietary architectural changes
  • Detailed implementation of proposed robustness improvements
  • Any unpublished or confidential methodologies

Technologies Used

  • Python
  • PyTorch
  • NumPy
  • Matplotlib
  • tqdm (for experiment tracking)

Project Status

  • Research phase: Ongoing
  • Baseline experiments: Completed
  • Defense methodology: Intentionally excluded
  • Paper drafting: In progress

Scope Limitations and Omitted Details

To preserve the integrity of ongoing research, certain aspects of this work are intentionally excluded from this repository.

These include:

  • Advanced defense strategies that go beyond standard baselines and are still under active investigation
  • Model-level adaptations and architectural refinements explored as part of robustness studies
  • Implementation-level details related to improving adversarial resistance
  • Research components that are unpublished, proprietary, or under review

The repository therefore focuses exclusively on foundational analysis, experimental evaluation, and contextual understanding, serving as a transparent record of completed research without disclosing sensitive or novel contributions.