
Abha Barge.

Defense Against PGD Adversarial Attacks



Tech stack:

  • Python (TensorFlow, Keras, scikit-learn, PyTorch, NumPy, Matplotlib, tqdm, etc.)

Overview

This repository contains the research work conducted on Projected Gradient Descent (PGD) adversarial attacks and existing defense mechanisms in deep learning models. The focus of this work is on understanding adversarial robustness, analyzing attack behavior at the input and feature levels, and experimentally evaluating baseline defenses on image classification tasks.

The repository is intended to document research progress, experimentation, and analysis, while intentionally excluding any proprietary or novel defense methodology.


Research Objectives

  • Study adversarial attacks with a focus on PGD-based attacks
  • Understand how perturbations affect image representations at pixel and vector levels
  • Evaluate model robustness under adversarial settings
  • Analyze strengths and limitations of existing defense strategies
  • Establish experimental baselines for future defense research

Background Study

The following areas were studied in detail:

Adversarial Attacks

  • Fast Gradient Sign Method (FGSM)
  • Iterative FGSM
  • Projected Gradient Descent (PGD)
  • Relationship between PGD and constrained optimization
  • L∞ and L2 norm-bounded perturbations
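As a concrete reference point, a minimal FGSM sketch in PyTorch is shown below: a single step of size ε in the sign of the input gradient, followed by clipping to the valid pixel range. The function name and signature are illustrative, not taken from the repository.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """Single-step FGSM: x_adv = x + eps * sign(grad_x loss), clipped to [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the sign of the input gradient, then keep pixels in the valid range
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```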

PGD-Specific Analysis

  • Attack formulation as iterative gradient ascent on the loss
  • Projection onto ε-bounded norm balls
  • Step size, number of iterations, and random initialization
  • Differences between single-step and multi-step attacks
  • Behavior of PGD on grayscale vs RGB images
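The elements listed above (random initialization, step size, iteration count, and projection onto the ε-ball) can be sketched as a minimal L∞ PGD loop in PyTorch. This is a generic textbook formulation, not the repository's implementation, and the parameter names are assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """L_inf PGD: random start, iterative ascent on the loss, projection each step."""
    # Random initialization inside the eps-ball, clipped to valid pixel range
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()              # ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto eps-ball
            x_adv = x_adv.clamp(0, 1)                              # valid pixel range
    return x_adv.detach()
```

Setting `steps=1` and removing the random start recovers FGSM, which is one way to see the single-step vs multi-step distinction.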

Dataset and Input Representation

  • Experiments conducted on image-based datasets (e.g., MNIST / CIFAR-style data)
  • Analysis of:
    • Image tensor representation (C × H × W)
    • Flattened vector representations used during optimization
    • Effect of perturbations at pixel, channel, and vector levels
  • Conversion between RGB and grayscale representations and its impact on attacks
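The representations above can be illustrated in a few lines: a (C × H × W) tensor, its flattened vector view, and a grayscale conversion. The BT.601 luminance weights are an assumption here; the repository may use a different conversion.

```python
import torch

x = torch.rand(3, 32, 32)            # RGB image tensor in (C, H, W) layout
flat = x.flatten()                   # flattened vector view used during optimization

# Grayscale conversion with (assumed) ITU-R BT.601 luminance weights
weights = torch.tensor([0.299, 0.587, 0.114]).view(3, 1, 1)
gray = (x * weights).sum(dim=0, keepdim=True)   # shape (1, H, W)

# A vector-level perturbation maps back to specific pixels after reshaping
delta = torch.zeros_like(flat)
delta[0] = 0.05                      # perturb one coordinate of the vector
perturbed = (flat + delta).view(3, 32, 32)
```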

Model Architecture

  • Convolutional Neural Network (CNN) implemented using PyTorch
  • Configurable parameters:
    • Kernel sizes
    • Pooling functions
    • Number of convolutional filters
  • Separate analysis of clean vs adversarial performance
  • Training and evaluation pipelines established for reproducibility
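A configurable CNN of this kind might be sketched as follows; the factory function, default filter counts, and pooling choice are illustrative assumptions, not the repository's exact architecture.

```python
import torch
import torch.nn as nn

def make_cnn(in_channels=1, filters=(32, 64), kernel_size=3,
             pool=nn.MaxPool2d, num_classes=10):
    """Small configurable CNN: conv/ReLU/pool blocks, then a linear classifier."""
    layers, ch = [], in_channels
    for f in filters:
        layers += [nn.Conv2d(ch, f, kernel_size, padding=kernel_size // 2),
                   nn.ReLU(),
                   pool(2)]
        ch = f
    # Global pooling makes the head independent of input resolution
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ch, num_classes)]
    return nn.Sequential(*layers)
```

Swapping `pool=nn.AvgPool2d` or changing `kernel_size` exercises the configurable parameters listed above.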

Experimental Setup

  • Implementation of PGD attack during evaluation
  • Controlled experiments with:
    • Fixed ε bounds
    • Varying step sizes and iterations
  • Comparison of:
    • Clean accuracy
    • Adversarial accuracy
  • Logging and visualization of results for analysis
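The clean-vs-adversarial comparison can be expressed as one evaluation loop that optionally applies an attack callable; this is a generic sketch of the setup described above, with assumed names.

```python
import torch

def evaluate(model, loader, attack=None):
    """Accuracy over `loader`; if `attack(model, x, y) -> x_adv` is given,
    accuracy is measured on the perturbed inputs instead."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        x_eval = attack(model, x, y) if attack is not None else x
        with torch.no_grad():
            pred = model(x_eval).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```

Calling `evaluate(model, loader)` and `evaluate(model, loader, attack=pgd)` with a fixed ε then gives the clean and adversarial accuracies side by side.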

Evaluation Metrics

  • Classification accuracy under clean inputs
  • Classification accuracy under adversarial inputs
  • Robustness degradation analysis
  • Qualitative observation of perturbed images
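One simple way to quantify robustness degradation, assuming it is defined as the relative accuracy drop under attack (other definitions exist), is:

```python
def robustness_degradation(clean_acc, adv_acc):
    """Relative accuracy drop under attack (one common definition, assumed here)."""
    return (clean_acc - adv_acc) / clean_acc
```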

Explainability Analysis

  • Exploratory study of Explainable AI (XAI) techniques in adversarial contexts
  • Motivation for linking robustness and interpretability
  • Preliminary analysis of model behavior under adversarial perturbations
  • Investigation into how explanations change before and after attacks

Note: This repository documents only the exploratory and analytical aspects of XAI integration.


Key Findings (So Far)

  • PGD is significantly stronger than single-step attacks
  • Iterative perturbations exploit model gradients more effectively
  • Robustness is highly sensitive to ε and step size
  • Standard CNNs show sharp accuracy drops under PGD attacks
  • Input representation (RGB vs grayscale) influences attack behavior

What Is Intentionally Not Included

  • Novel defense mechanisms or algorithms
  • Proprietary architectural changes
  • Detailed implementation of proposed robustness improvements
  • Any unpublished or confidential methodologies

Technologies Used

  • Python
  • PyTorch
  • NumPy
  • Matplotlib
  • tqdm (for experiment tracking)

Project Status

  • Research phase: Ongoing
  • Baseline experiments: Completed
  • Defense methodology: Intentionally excluded
  • Paper drafting: In progress

Scope Limitations and Omitted Details

To preserve the integrity of ongoing research, certain aspects of this work are intentionally excluded from this repository.

These include:

  • Advanced defense strategies that go beyond standard baselines and are still under active investigation
  • Model-level adaptations and architectural refinements explored as part of robustness studies
  • Implementation-level details related to improving adversarial resistance
  • Research components that are unpublished, proprietary, or under review

The repository therefore focuses exclusively on foundational analysis, experimental evaluation, and contextual understanding, serving as a transparent record of completed research without disclosing sensitive or novel contributions.