AR · Concept · Research
Machine Learning & Research

Adversarial Robustness of ResNet-34 on ImageNet

White-box adversarial attacks with cross-architecture transferability evaluation, covering FGSM, three PGD variants, and a patch attack on 500 ImageNet images across 100 classes

Designed and implemented five adversarial attacks against an ImageNet-pretrained ResNet-34: FGSM, three PGD variants (targeted, random-start, momentum), and a patch-based attack. The attacks cut top-1 accuracy from a 70.40% clean baseline to as low as 0.60% under the strongest PGD variant. The study then measured how much of that degradation transferred cold to DenseNet-121.

Context: Research Project
Role: Researcher
Team: Solo
Date: May 2025

Implemented all five tasks solo: baseline evaluation, FGSM with a post-generation L∞ distance check, three PGD variants with comparative selection, the patch attack, and the full cross-architecture transferability analysis against DenseNet-121.

70.40% / 87.00% clean baseline, top-1 / top-5 (ResNet-34)
3 adversarial test sets, 500 images each
ResNet-34 top-1: 43.20% (FGSM) · 0.60% (best PGD) · 23.40% (patch)
PyTorch · torchvision · Adversarial Machine Learning · FGSM · PGD · Patch Attacks

Overview

I designed and implemented a structured adversarial robustness evaluation against an ImageNet-pretrained ResNet-34, using 500 test images across 100 ImageNet classes. The study covered five attack implementations: FGSM at ε=0.02; three PGD variants under the same ε=0.02 budget (targeted, untargeted with random start, and momentum); and a patch-based targeted attack confined to a randomly placed 32×32 region, where the per-pixel budget was much larger (ε=0.5). A clean baseline established 70.40% top-1 and 87.00% top-5 accuracy on ResNet-34. Top-1 accuracy fell to 43.20% under FGSM, to as low as 0.60% under the strongest PGD variant, and to 23.40% under the patch attack. I then evaluated the clean images and the three saved adversarial test sets on DenseNet-121, a model that played no part in generating the attacks, to measure how much of that degradation transferred across architectures.

Problem

Small, tightly bounded pixel perturbations can silently break deployed image classifiers. I built a structured evaluation to measure how much accuracy degrades under increasingly sophisticated attacks, and whether that degradation transfers across architectures, which is a stronger test of model fragility than single-model accuracy alone.

Intended User

Written for ML researchers and engineers who need to understand adversarial fragility in vision models before deployment.

Architecture

The evaluation ran as five sequential tasks in a single Colab-compatible notebook:

  • Task 1 established the clean baseline on ResNet-34 using ImageNet normalization over 500 images from 100 classes: 70.40% top-1 and 87.00% top-5 accuracy.
  • Task 2 generated AdversarialTestSet1 via FGSM at ε=0.02, cutting ResNet-34 top-1 accuracy to 43.20%, followed by a post-generation L∞ distance check.
  • Task 3 implemented three PGD variants under the same ε=0.02 budget: targeted PGD (40 steps, α=0.0025, a fixed incorrect target class), untargeted PGD with random start (20 steps, α=0.003), and momentum PGD (20 steps, α=0.0025, μ=1.0). They reached ResNet-34 top-1 accuracy of 0.60%, 39.60%, and 44.00% respectively. Targeted PGD was the strongest and was saved as the 500-image AdversarialTestSet2.
  • Task 4 implemented a targeted PGD attack confined to a randomly placed 32×32 patch (ε=0.5, α=0.05, 40 steps), an L0-style threat model in which few pixels change but each may change by a large amount. It cut ResNet-34 top-1 accuracy to 23.40% and produced AdversarialTestSet3.
  • Task 5 evaluated the clean set and the three adversarial test sets on DenseNet-121, using the same ImageNet normalization but an architecture that was never involved in crafting the attacks.
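The baseline measurement in Task 1 can be sketched as below. This is a minimal illustration rather than the notebook's code: the helper name `topk_accuracy` is hypothetical, the model is assumed to return raw logits, and the labels are assumed to be already mapped to the model's class indices.

```python
import torch


@torch.no_grad()
def topk_accuracy(model, images, labels, ks=(1, 5)):
    """Return {k: fraction of samples whose true label is among the top-k logits}."""
    model.eval()
    logits = model(images)                        # (N, num_classes)
    topk = logits.topk(max(ks), dim=1).indices    # (N, max_k), best class first
    hits = topk.eq(labels.unsqueeze(1))           # (N, max_k) boolean matches
    return {k: hits[:, :k].any(dim=1).float().mean().item() for k in ks}
```

The same function can score any saved adversarial set by passing the perturbed images in place of the clean ones.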

My Contribution

I personally implemented every task in the notebook: the clean baseline evaluation, the FGSM attack with a post-generation L∞ distance check, all three PGD variants with the comparison and selection step, the 32×32 patch attack, and the full cross-architecture transferability evaluation against DenseNet-121.

Implementation

  • Implemented FGSM with ε=0.02, cutting ResNet-34 top-1 accuracy from 70.40% to 43.20%. Ran a post-generation L∞ distance check and saved all 500 adversarial images as AdversarialTestSet1.
  • Designed and compared three distinct PGD variants sharing the same ε=0.02 budget: targeted PGD (40 steps, a fixed incorrect target class, cutting top-1 to 0.60%), untargeted PGD with random start (20 steps, cutting top-1 to 39.60%), and momentum PGD (20 steps, cutting top-1 to 44.00%). Selected targeted PGD as the strongest by top-1 degradation and saved results as AdversarialTestSet2.
  • Implemented a targeted PGD patch attack confined to a randomly placed 32×32 region (ε=0.5, α=0.05, 40 steps). This is an L0-style threat model: few pixels may change, but each by a large amount. It cut ResNet-34 top-1 accuracy to 23.40% and produced AdversarialTestSet3.
  • Evaluated the clean set and the three adversarial test sets on DenseNet-121 without any DenseNet-specific optimization, isolating the cross-architecture transferability signal.
  • Visualized adversarial examples alongside clean originals and captured correct and incorrect prediction examples at each stage of the evaluation.
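The attacks listed above share one iterative core, which the following sketch illustrates. It is not the notebook's implementation. The function name `pgd_attack` and its parameters are hypothetical, images are assumed to be (N, C, H, W) tensors in [0, 1] with ε expressed in that pixel scale, any ImageNet normalization is assumed to live inside `model`, and the momentum branch follows the common MI-FGSM-style gradient normalization as an assumption.

```python
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=0.02, alpha=0.0025, steps=40,
               targeted=False, random_start=False, mu=0.0, mask=None):
    """L-inf PGD sketch. y holds true labels, or target labels when targeted=True.

    mask (optional, broadcastable to x) confines the perturbation to a region.
    steps=1 with alpha=eps and no random start reduces to FGSM.
    """
    model.eval()
    x = x.detach()
    mask = torch.ones_like(x) if mask is None else mask
    delta = torch.zeros_like(x)
    if random_start:
        delta.uniform_(-eps, eps)
    delta = (x + delta * mask).clamp(0, 1) - x    # keep the start a valid image
    velocity = torch.zeros_like(x)
    direction = -1.0 if targeted else 1.0         # descend toward target, else ascend
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        if mu > 0:                                # momentum on normalized gradients
            scale = grad.abs().mean(dim=(1, 2, 3), keepdim=True).clamp_min(1e-12)
            velocity = mu * velocity + grad / scale
            grad = velocity
        delta = delta.detach() + direction * alpha * grad.sign() * mask
        delta = delta.clamp(-eps, eps)            # project onto the L-inf ball
        delta = (x + delta).clamp(0, 1) - x       # stay inside the pixel range
    return (x + delta).clamp(0, 1).detach()
```

Under these assumptions, the settings reported above map to calls such as `pgd_attack(model, x, target, targeted=True)` for targeted PGD, `pgd_attack(model, x, y, steps=20, alpha=0.003, random_start=True)` for the random-start variant, and `pgd_attack(model, x, y, steps=20, mu=1.0)` for the momentum variant.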

Key Decisions

Three attack families: FGSM, PGD (three variants), and a patch attack

Why — Comparing a one-step method (FGSM), three iterative variants (PGD), and a spatially constrained method (patch) measured robustness across increasing attack sophistication and across different perturbation norms (L∞ vs. L0).

Trade-off — Running each attack regime separately increased evaluation time but produced interpretable, per-attack results rather than a single aggregate accuracy number.
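To make the L∞ vs. L0 distinction concrete, the sketch below builds the kind of mask a patch attack could use to limit which pixels may change. The helper name `random_patch_mask` is hypothetical, and the 224×224 input size in the usage note is an assumption (the standard ImageNet crop), not a detail stated in this write-up.

```python
import torch


def random_patch_mask(batch, height, width, patch=32, generator=None):
    """Return a (batch, 1, height, width) 0/1 mask with one random square patch per image."""
    mask = torch.zeros(batch, 1, height, width)
    tops = torch.randint(0, height - patch + 1, (batch,), generator=generator)
    lefts = torch.randint(0, width - patch + 1, (batch,), generator=generator)
    for i, (top, left) in enumerate(zip(tops.tolist(), lefts.tolist())):
        mask[i, :, top:top + patch, left:left + patch] = 1.0   # pixels allowed to change
    return mask
```

With 224×224 inputs, a 32×32 patch covers 1,024 of 50,176 pixel positions, about 2% of the image, which is why a much larger per-pixel budget than the ε=0.02 used for the full-image attacks is plausible for this threat model.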

Select the best PGD variant by top-1 accuracy degradation on the same test set

Why — Rather than committing to a single PGD flavor, I compared all three on identical inputs. Targeted PGD cut top-1 accuracy to 0.60%, against 39.60% for random-start PGD and 44.00% for momentum PGD, so I selected it as the strongest and AdversarialTestSet2 reflects the most effective iterative attack.
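The selection rule amounts to taking the variant with the lowest accuracy under attack. A small sketch using the figures reported here (the dictionary keys are labels chosen for illustration):

```python
# ResNet-34 top-1 accuracy (%) per PGD variant, as reported in this write-up.
pgd_top1 = {
    "targeted": 0.60,
    "random_start": 39.60,
    "momentum": 44.00,
}

# Lower accuracy under attack means a stronger attack.
strongest = min(pgd_top1, key=pgd_top1.get)
print(strongest)  # targeted
```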

Cross-architecture transferability using DenseNet-121

Why — Evaluating attacks crafted on ResNet-34 against DenseNet-121, with no DenseNet-specific tuning, tested whether the adversarial signal generalized across model families, a stronger robustness question than single-architecture evaluation.
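A transfer evaluation of this kind can be sketched as follows: craft the perturbation with gradients from one model only, then score the identical images on both. The function names are hypothetical, FGSM stands in for whichever attack is used, and inputs are assumed to be tensors in [0, 1] with normalization handled inside each model.

```python
import torch
import torch.nn.functional as F


def fgsm(model, x, y, eps=0.02):
    """One-step L-inf attack that uses gradients from `model` only."""
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()


@torch.no_grad()
def top1(model, x, y):
    """Fraction of samples whose arg-max prediction matches the label."""
    return (model(x).argmax(dim=1) == y).float().mean().item()


def transfer_report(source, target, x, y, eps=0.02):
    """Craft on `source`, then score the same images on both models."""
    source.eval()
    target.eval()
    x_adv = fgsm(source, x, y, eps)   # the target model is never queried here
    return {
        "source_clean": top1(source, x, y),
        "source_adv": top1(source, x_adv, y),
        "target_clean": top1(target, x, y),
        "target_adv": top1(target, x_adv, y),
    }
```

In this study's setting, `source` would correspond to the pretrained ResNet-34 and `target` to DenseNet-121.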

A post-generation distance check after FGSM

Why — Computing the distance between original and adversarial images after generation gave a concrete sanity check on the perturbation rather than assuming the implementation matched the intended ε=0.02 budget.
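A check of this kind can be as small as the sketch below, which assumes the clean and adversarial images are (N, C, H, W) tensors in the same [0, 1] scale that ε is defined in. The function names and the tolerance are illustrative choices, not the notebook's.

```python
import torch


def linf_distance(clean, adv):
    """Per-image maximum absolute pixel difference."""
    return (adv - clean).abs().flatten(start_dim=1).max(dim=1).values


def within_budget(clean, adv, eps=0.02, tol=1e-6):
    """True if every image stays inside the L-inf budget, up to float tolerance."""
    return bool((linf_distance(clean, adv) <= eps + tol).all())
```

If the adversarial images are saved as 8-bit files before the check, rounding alone can move each pixel by up to 1/510 in this scale, so a slightly larger tolerance may be appropriate in that case.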

Testing & Validation

I validated each attack through clean and adversarial top-1 and top-5 accuracy comparisons on ResNet-34, then evaluated the generated adversarial sets on DenseNet-121 to measure cross-architecture transfer. I also recorded a post-generation L∞ distance diagnostic for the FGSM set.

Results

On ResNet-34, the model the attacks were crafted on, top-1 accuracy fell from a 70.40% clean baseline (87.00% top-5) to 43.20% under FGSM (ε=0.02), 39.60% and 44.00% under the two weaker PGD variants, 0.60% under the strongest PGD variant (the same ε=0.02 budget run for more steps), and 23.40% under the patch attack (ε=0.5 within a 32×32 region). Transferred cold to DenseNet-121, which never saw any attack during generation, top-1 accuracy dropped from a 68.20% clean baseline to 29.20% on the FGSM set, 15.60% on the strongest PGD set, and 42.40% on the patch set, confirming substantial cross-architecture transfer without any model-specific tuning.
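For a side-by-side view, the reported top-1 figures can be turned into percentage-point drops from each model's own clean baseline. The dictionary keys below are labels chosen for this sketch; the numbers are the ones stated above.

```python
# Top-1 accuracy (%) as reported above.
resnet34 = {"clean": 70.40, "fgsm": 43.20, "pgd": 0.60, "patch": 23.40}
densenet121 = {"clean": 68.20, "fgsm": 29.20, "pgd": 15.60, "patch": 42.40}


def drops(acc):
    """Percentage-point drop from the clean baseline for each attack set."""
    return {k: round(acc["clean"] - v, 2) for k, v in acc.items() if k != "clean"}


for name, acc in [("ResNet-34", resnet34), ("DenseNet-121", densenet121)]:
    print(name, drops(acc))
# ResNet-34 {'fgsm': 27.2, 'pgd': 69.8, 'patch': 47.0}
# DenseNet-121 {'fgsm': 39.0, 'pgd': 52.6, 'patch': 25.8}
```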

Reliability & Failure Handling

This is a controlled research evaluation rather than a running service. The findings are scoped to the pretrained models, 500-image ImageNet sample, perturbation budgets, and attack configurations tested, and they should not be interpreted as certified robustness guarantees.

Deployment & Runtime

Notebook-based research (Colab-compatible). The three adversarial test sets and the evaluation notebook are committed to the repository for reproducibility.

Lessons Learned

  • The strongest white-box attack also transferred most effectively: targeted PGD cut ResNet-34 top-1 accuracy to 0.60% and produced the lowest DenseNet-121 transfer accuracy of the three saved attack sets, at 15.60%. The patch attack told a different story. It degraded ResNet-34 more than FGSM did (23.40% vs. 43.20% top-1) but transferred worse than FGSM to DenseNet-121 (42.40% vs. 29.20%), suggesting that the localized perturbation exploited structure specific to ResNet-34 rather than features that any convolutional classifier would share.

Evidence & Technical Proof

Technologies

PyTorch · torchvision · Adversarial Machine Learning · FGSM · PGD · Patch Attacks · L∞ Perturbation · L0 Perturbation · White-box Attacks · Robustness Evaluation · Cross-architecture Transfer · ImageNet · ResNet-34 · DenseNet-121 · Python