Defense Against PGD Adversarial Attacks

Tech stack:
- Python
- TensorFlow
- Keras
- scikit-learn
- PyTorch
- NumPy
- Matplotlib
- tqdm (progress reporting during training and attacks)
Overview
This repository contains the research work conducted on Projected Gradient Descent (PGD) adversarial attacks and existing defense mechanisms in deep learning models. The focus of this work is on understanding adversarial robustness, analyzing attack behavior at the input and feature levels, and experimentally evaluating baseline defenses on image classification tasks.
The repository is intended to document research progress, experimentation, and analysis, while intentionally excluding any proprietary or novel defense methodology.
Research Objectives
- Study adversarial attacks with a focus on PGD-based attacks
- Understand how perturbations affect image representations at pixel and vector levels
- Evaluate model robustness under adversarial settings
- Analyze strengths and limitations of existing defense strategies
- Establish experimental baselines for future defense research
Background Study
The following areas were studied in detail:
Adversarial Attacks
- Fast Gradient Sign Method (FGSM)
- Iterative FGSM (Basic Iterative Method, BIM)
- Projected Gradient Descent (PGD)
- Relationship between PGD and constrained optimization
- $L_\infty$ and $L_2$ norm-bounded perturbations
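The norm-bounded constraint sets can be made concrete with a short NumPy sketch: projecting a perturbation onto an $L_\infty$ ball is element-wise clipping, while projecting onto an $L_2$ ball rescales the whole vector. The epsilon values and array shape below are illustrative only.

```python
import numpy as np

def project_linf(delta, eps):
    """Project a perturbation onto the L-infinity ball of radius eps
    (element-wise clipping)."""
    return np.clip(delta, -eps, eps)

def project_l2(delta, eps):
    """Project a perturbation onto the L2 ball of radius eps
    (rescale only when the norm exceeds eps)."""
    norm = np.linalg.norm(delta)
    if norm <= eps:
        return delta
    return delta * (eps / norm)

rng = np.random.default_rng(0)
delta = rng.normal(size=(3, 8, 8))  # a CHW-shaped perturbation

d_inf = project_linf(delta, eps=0.03)
d_l2 = project_l2(delta, eps=1.0)
```

After projection, `d_inf` satisfies the $L_\infty$ budget coordinate-wise, while `d_l2` satisfies the $L_2$ budget globally; this is exactly the step PGD applies after each gradient update.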
PGD-Specific Analysis
- Attack formulation as iterative gradient ascent on the loss
- Projection onto ε-bounded norm balls
- Step size, number of iterations, and random initialization
- Differences between single-step and multi-step attacks
- Behavior of PGD on grayscale vs RGB images
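The ingredients listed above (random initialization, a signed gradient step, projection onto the ε-ball, and clipping to the valid pixel range) can be sketched without an autograd framework by attacking a toy logistic-regression "model", whose input gradient is known in closed form. The weights, step size, ε, and iteration count below are illustrative, not the repository's actual settings.

```python
import numpy as np

def pgd_linf(x, y, w, b, eps=0.1, alpha=0.02, steps=20, seed=0):
    """L-infinity PGD against a toy score sigmoid(w @ x + b).

    For cross-entropy loss, the input gradient is (p - y) * w, so no
    autograd framework is needed in this self-contained sketch."""
    rng = np.random.default_rng(seed)
    # Random start inside the eps-ball (standard PGD initialization).
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))
        grad = (p - y) * w                          # dLoss/dx
        x_adv = x_adv + alpha * np.sign(grad)       # gradient-ascent step
        x_adv = x + np.clip(x_adv - x, -eps, eps)   # project onto eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)            # stay in pixel range
    return x_adv

w = np.array([2.0, -1.5, 0.5])
b = 0.1
x = np.array([0.6, 0.4, 0.5])
x_adv = pgd_linf(x, y=1.0, w=w, b=b)
```

Setting `steps=1` with `alpha=eps` and no random start recovers FGSM, which is the single-step/multi-step distinction noted above.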
Dataset and Input Representation
- Experiments conducted on image-based datasets (e.g., MNIST / CIFAR-style data)
- Analysis of:
  - Image tensor representation ($C \times H \times W$)
  - Flattened vector representations used during optimization
  - Effect of perturbations at pixel, channel, and vector levels
- Conversion between RGB and grayscale representations and its impact on attacks
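The representation changes above reduce to a few lines of NumPy: a CHW tensor can be flattened for vector-level analysis, and an RGB image collapsed to grayscale with luminance weights (BT.601 weights are used here; the $32 \times 32$ size is arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.uniform(0.0, 1.0, size=(3, 32, 32))  # C x H x W, RGB in [0, 1]

# Flattened view, as used when treating the attack as vector optimization.
vec = img.reshape(-1)  # shape (3 * 32 * 32,) = (3072,)

# RGB -> grayscale with BT.601 luminance weights; result is 1 x H x W.
weights = np.array([0.299, 0.587, 0.114])
gray = np.tensordot(weights, img, axes=([0], [0]))[np.newaxis]
```

Because the weights sum to 1, grayscale values stay in $[0, 1]$; note that a perturbation budget defined on the flattened vector is the same $L_\infty$ budget as on the tensor, but an $L_2$ budget spreads differently across channels.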
Model Architecture
- Convolutional Neural Network (CNN) implemented using PyTorch
- Configurable parameters:
  - Kernel sizes
  - Pooling functions
  - Number of convolutional filters
- Separate analysis of clean vs adversarial performance
- Training and evaluation pipelines established for reproducibility
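One piece of bookkeeping behind such a configurable CNN is the spatial-size arithmetic: each conv or pooling layer maps an $n \times n$ input to $\lfloor (n + 2p - k)/s \rfloor + 1$. The helper below makes this explicit; the layer stack is a hypothetical example, not the repository's actual architecture.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1

# Hypothetical stack: two [conv 3x3 pad 1, maxpool 2x2 stride 2] stages
# applied to a 32 x 32 input (CIFAR-style).
size = 32
for kernel, stride, padding in [(3, 1, 1), (2, 2, 0), (3, 1, 1), (2, 2, 0)]:
    size = conv_out(size, kernel, stride, padding)
# size is now 8, so the flattened feature dimension is filters * 8 * 8.
```

Checking this arithmetic up front avoids shape mismatches at the first fully connected layer when kernel sizes or pooling functions are swapped during experiments.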
Experimental Setup
- Implementation of PGD attack during evaluation
- Controlled experiments with:
  - Fixed ε bounds
  - Varying step sizes and iterations
- Comparison of:
  - Clean accuracy
  - Adversarial accuracy
- Logging and visualization of results for analysis
Evaluation Metrics
- Classification accuracy under clean inputs
- Classification accuracy under adversarial inputs
- Robustness degradation analysis
- Qualitative observation of perturbed images
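The first three metrics reduce to simple array comparisons once per-example predictions are collected. A minimal NumPy sketch, with made-up labels and predictions purely for illustration:

```python
import numpy as np

def accuracy(preds, labels):
    """Fraction of predictions matching the ground-truth labels."""
    return float(np.mean(preds == labels))

labels     = np.array([0, 1, 2, 1, 0, 2, 1, 0])
clean_pred = np.array([0, 1, 2, 1, 0, 2, 0, 0])  # model on clean inputs
adv_pred   = np.array([2, 1, 0, 1, 2, 2, 0, 1])  # same model under PGD

clean_acc = accuracy(clean_pred, labels)
adv_acc = accuracy(adv_pred, labels)
degradation = clean_acc - adv_acc  # absolute robustness drop
```

Reporting both the absolute drop and the adversarial accuracy itself keeps results comparable across models whose clean accuracies differ.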
Explainability Analysis
- Exploratory study of Explainable AI (XAI) techniques in adversarial contexts
- Motivation for linking robustness and interpretability
- Preliminary analysis of model behavior under adversarial perturbations
- Investigation into how explanations change before and after attacks
Note: This repository documents only the exploratory and analytical aspects of XAI integration.
Key Findings (So Far)
- PGD is significantly stronger than single-step attacks
- Iterative perturbations exploit model gradients more effectively
- Robustness is highly sensitive to ε and step size
- Standard CNNs show sharp accuracy drops under PGD attacks
- Input representation (RGB vs grayscale) influences attack behavior
What Is Intentionally Not Included
- Novel defense mechanisms or algorithms
- Proprietary architectural changes
- Detailed implementation of proposed robustness improvements
- Any unpublished or confidential methodologies
Technologies Used
- Python
- PyTorch
- NumPy
- Matplotlib
- tqdm (progress reporting during training and attacks)
Project Status
- Research phase: Ongoing
- Baseline experiments: Completed
- Defense methodology: Intentionally excluded
- Paper drafting: In progress
Scope Limitations and Omitted Details
To preserve the integrity of ongoing research, certain aspects of this work are intentionally excluded from this repository.
These include:
- Advanced defense strategies that go beyond standard baselines and are still under active investigation
- Model-level adaptations and architectural refinements explored as part of robustness studies
- Implementation-level details related to improving adversarial resistance
- Research components that are unpublished, proprietary, or under review
The repository therefore focuses exclusively on foundational analysis, experimental evaluation, and contextual understanding, serving as a transparent record of completed research without disclosing sensitive or novel contributions.