LLM security: the threats nobody warned you about

Security thinking for LLM apps got stuck on prompt injection because that’s the dramatic one. The dramatic ones are usually not the dangerous ones. The threats that actually compromise production LLM apps are quieter and don’t come with a clever name: data exfiltration through tool calls, secret leakage in trace logs, model-as-confused-deputy, and the slow-burn of fine-tuned models trained on poisoned data.

The threats worth a threat model

If your agent has tools that can read user data and tools that can send messages externally, an adversarial input can chain them — read the secret, exfil to an attacker-controlled endpoint — without ever needing to “jailbreak” the model. If your traces include full prompts, your trace store has copies of every credential anyone has ever pasted into a chat. If your application accepts user input, embeds it, and uses the embedding for retrieval, an attacker can poison the embedding space to bias retrievals long after the input is gone.

Defenses that actually fit production

Tool-level allowlists for outbound network access, regardless of what the model says. PII scrubbing at the trace ingestion boundary, not at display time — once it’s in the trace store, it’s been written somewhere you can’t fully control. For RAG corpora ingested from user input, treat the corpus as untrusted and the retrieval results as user-influenced data, not as ground truth.

The interesting LLM security threats are not in the prompts. They are in the architecture around the prompts. The earlier you accept that, the less work you’ll redo.

LLM security: the threats nobody warned you about

The threats worth a threat model

Defenses that actually fit production

Tags :

Share :

Related Posts

Tracing LLM apps: what to log when nothing crashes

Retry, backoff, and the ghosts in your latency graph