SANATSU.BLOG
June 15, 2026 · 2 min read
ai, security

Securing AI Models in Production

From adversarial attacks to model theft, here's what you need to know to protect your ML models once they leave the lab.

Key takeaways

  • AI models face unique threats: adversarial inputs, model inversion, and extraction attacks
  • Input sanitization and rate limiting are table stakes — not sufficient on their own
  • Differential privacy and on-device inference are emerging as best practices

The New Attack Surface

AI models expose a fundamentally different attack surface than traditional software. Instead of SQL injection or buffer overflows, you're defending against attackers who manipulate probabilistic systems — often with surprising effectiveness.

Adversarial Attacks

Small, imperceptible perturbations to input data can cause models to misclassify with high confidence:

python
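import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """Fast Gradient Sign Method: a one-step, untargeted attack."""
    # Work on a detached leaf copy so the caller's tensor is left untouched
    image = image.clone().detach().requires_grad_(True)
    output = model(image)
    # nll_loss expects log-probabilities (e.g. a log_softmax output layer);
    # use F.cross_entropy instead if the model returns raw logits
    loss = F.nll_loss(output, label)
    model.zero_grad()
    loss.backward()
    # Step in the direction of the gradient's sign to increase the loss
    perturbed = image + epsilon * image.grad.sign()
    # Keep pixel values in the valid [0, 1] range
    return torch.clamp(perturbed, 0, 1).detach()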

A stop sign perturbed with epsilon=0.03 can look unchanged to a human observer yet be classified as a speed limit sign.
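For reference, the single step that fgsm_attack takes can be written as one formula. The symbols here are notation added for illustration, not from the original post: θ stands for the model parameters, J for the loss (the negative log-likelihood in the code), and clip keeps pixels in [0, 1].

```latex
x_{\mathrm{adv}} = \operatorname{clip}_{[0,1]}\!\left( x + \epsilon \cdot \operatorname{sign}\!\left( \nabla_x J(\theta, x, y) \right) \right)
```

Because the sign of each gradient entry is at most 1 in magnitude, no pixel moves by more than epsilon: $\lVert x_{\mathrm{adv}} - x \rVert_\infty \le \epsilon$. With epsilon=0.03 and pixels scaled to [0, 1], that is at most 3% of the full intensity range per pixel, which is why the change is so hard to see.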

Model Extraction

Attackers can approximate your model by querying it repeatedly and training a copy on the responses. A 2024 study reportedly extracted a GPT-class model with fewer than 10 million queries:

Technique         Queries Needed   Accuracy vs Original
Random sampling   50M+             ~60%
Active learning   8M               ~85%
Knockoff nets     3M               ~88%

Defense: Rate-limit API calls, add noise to logits, monitor query patterns.
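To make the "add noise to logits" idea concrete, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: the helper name noisy_probabilities, the Gaussian noise, and the sigma and max_tries defaults. It blurs the exact scores an attacker would train a copy on while keeping the top-1 label the same for legitimate callers.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def noisy_probabilities(logits, sigma=0.5, rng=None, max_tries=10):
    """Softmax over noise-perturbed logits, preserving the top-1 label.

    Hypothetical helper: sigma and max_tries are illustrative defaults,
    not tuned values.
    """
    rng = rng or random.Random()
    top = argmax(logits)
    for _ in range(max_tries):
        noisy = [z + rng.gauss(0.0, sigma) for z in logits]
        if argmax(noisy) == top:
            return softmax(noisy)
    # Noise kept flipping the label: fall back to the unperturbed scores
    return softmax(logits)

probs = noisy_probabilities([2.0, 0.5, -1.0], sigma=0.5, rng=random.Random(0))
```

The trade-off is accuracy of the reported confidence: larger sigma values hide more from an extractor but also make the probabilities less useful to honest clients that rely on them.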

Model Inversion

Given access to a model and a target label, attackers can reconstruct approximations of its training data, including private information such as faces or medical records.

python
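import torch

# Simplified inversion attack: gradient ascent on the input, not the weights
def reconstruct(model, target_label, steps=1000):
    model.eval()  # Fix dropout/batch-norm behavior while optimizing the input
    # Start from random noise shaped like one 224x224 RGB image
    x = torch.randn(1, 3, 224, 224, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=0.01)
    for _ in range(steps):
        optimizer.zero_grad()
        output = model(x)
        loss = -output[0, target_label]  # Maximize the target class score
        loss.backward()
        optimizer.step()
    return x.detach()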

Defense in Depth

At the API Layer

  • Rate limiting and anomaly detection (spike in predict calls)
  • Input validation — reject out-of-distribution samples
  • Authentication and usage quotas per API key
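The rate limiting and per-key quota bullets above can be sketched as a token bucket. This is a minimal in-memory version under stated assumptions: the TokenBucket class, its parameters, and the "key-a" API key are all hypothetical, and a real deployment would usually keep the counters in shared storage rather than a Python dict.

```python
import time

class TokenBucket:
    """Per-key token bucket: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock      # injectable so the limiter can be tested
        self.buckets = {}       # api_key -> (tokens, last_seen_timestamp)

    def allow(self, api_key):
        """Return True and spend one token if the key is under its limit."""
        now = self.clock()
        tokens, last = self.buckets.get(api_key, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self.buckets[api_key] = (tokens, now)
        return allowed

# Usage with a fake clock: 2 requests/second, bursts of up to 3
fake_now = [0.0]
limiter = TokenBucket(rate=2, capacity=3, clock=lambda: fake_now[0])
burst = [limiter.allow("key-a") for _ in range(4)]       # fourth call is rejected
fake_now[0] += 1.0                                       # one second later, 2 tokens are back
after_wait = [limiter.allow("key-a") for _ in range(3)]  # third call is rejected
```

A bucket tolerates short bursts from legitimate clients while capping the sustained query rate, which is the number that matters for extraction attacks needing millions of calls.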

At the Model Layer

  • Differential privacy during training (DP-SGD)
  • Output sanitization — clamp logits, round probabilities
  • Ensemble diversity — use different models for different risk levels
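To show what DP-SGD changes compared with ordinary SGD, here is a rough sketch of a single update on plain Python lists. The function name dp_sgd_step and its defaults are assumptions for illustration, and the sketch leaves out privacy accounting entirely (tracking the epsilon and delta budget), which is what a dedicated library such as Opacus handles for PyTorch models.

```python
import math
import random

def dp_sgd_step(weights, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=None):
    """One DP-SGD update on a flat list of weights.

    per_example_grads holds one gradient vector per training example.
    """
    rng = rng or random.Random()
    summed = [0.0] * len(weights)
    for grad in per_example_grads:
        # 1. Clip each example's gradient to an L2 norm of at most clip_norm
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            summed[i] += g * scale
    # 2. Add Gaussian noise scaled to the clipping bound
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    # 3. Average over the batch and take an ordinary gradient step
    batch = len(per_example_grads)
    return [w - lr * n / batch for w, n in zip(weights, noisy)]

# With noise switched off, only the clipping is visible
grads = [[3.0, 4.0], [0.3, 0.4]]   # L2 norms 5.0 and 0.5
new_weights = dp_sgd_step([0.0, 0.0], grads, noise_multiplier=0.0)
```

Clipping bounds how much any single training example can move the weights, and the noise hides what is left of that influence. Together they limit what an inversion attack can recover about one individual record.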

At the Infrastructure Layer

  • Secure enclaves (AWS Nitro Enclaves, Azure Confidential Computing) for model weights
  • On-device inference — keep sensitive data off the wire
  • Model watermarking to detect stolen copies
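One common way to check for a stolen copy is a trigger-set watermark: during training you teach the model to give deliberately unusual labels to a handful of secret inputs, then later test whether a suspect model reproduces them. The sketch below assumes that scheme. The function names, the 0.9 threshold, and the toy dictionary "models" are all hypothetical stand-ins.

```python
def watermark_match_rate(predict, trigger_inputs, secret_labels):
    """Fraction of secret trigger inputs the suspect model labels as planted."""
    hits = sum(1 for x, y in zip(trigger_inputs, secret_labels) if predict(x) == y)
    return hits / len(trigger_inputs)

def looks_stolen(predict, trigger_inputs, secret_labels, threshold=0.9):
    """Flag a model whose agreement with the trigger set reaches the threshold."""
    return watermark_match_rate(predict, trigger_inputs, secret_labels) >= threshold

# Toy stand-ins for real models: each maps an input ID to a class label
triggers = ["t1", "t2", "t3", "t4"]
secret = [7, 2, 9, 4]                                   # deliberately unusual labels
stolen_copy = dict(zip(triggers, secret)).get           # reproduces every planted label
independent_model = {"t1": 7, "t2": 0, "t3": 1, "t4": 3}.get

flag_stolen = looks_stolen(stolen_copy, triggers, secret)             # True
flag_independent = looks_stolen(independent_model, triggers, secret)  # False
```

An independently trained model should agree with the secret labels only by chance, so a high match rate is evidence of copying. A real trigger set would need far more than four inputs to make that argument statistically convincing.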

The Verdict

AI security isn't a checkbox; it's a continuous process. Just as we learned to sanitize SQL inputs in the 2000s, we need to learn to secure model endpoints in the 2020s. Start with rate limiting and input validation, then layer in differential privacy and confidential computing as your threat model demands.
