detflow

the failure mode in AI detection tooling

One big LLM call and a prayer.

The junior move is to make the whole pipeline a single model call — draft, validate, dedupe and review all in one prompt. Then the model is slow, rate-limited, or simply absent in CI, and the entire stage falls over with it.

CI · a rule lands in a merge request

lint it, check it against the catalog, review it for FP risk — gate the merge on the result

monolithic "AI reviewer"

Calling the model to lint + dedupe + review in one shot… 503 model unavailable ✗

⚠ The schema check and the catalog dedup never needed a model at all — but they were welded to the LLM call, so a transient outage just failed a merge that was perfectly valid. The AI took the boring, testable parts down with it.

The fix isn't a more reliable model. It's structural: the deterministic parts run without a model, the AI is an optional enhancement, and every verb degrades instead of breaking.
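The structure described above can be sketched in a few lines. This is a minimal illustration, not detflow's implementation: `ReviewResult`, `check_schema`, `review_rule` and `unavailable_model` are hypothetical names, and the two required fields are only examples. The deterministic check decides the gate by itself, and a failed model call is recorded rather than raised.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ReviewResult:
    findings: List[str] = field(default_factory=list)  # deterministic findings
    verdict: str = "pass"                               # "pass" or "fail"
    model_notes: Optional[str] = None                   # stays None if the model was skipped
    degraded: bool = False                              # True when the model call failed


def check_schema(rule: dict) -> List[str]:
    """Deterministic check: no model, no network. Field names are illustrative."""
    return [f"missing required field: {k}" for k in ("title", "detection") if k not in rule]


def review_rule(rule: dict, ask_model: Optional[Callable[[dict], str]] = None) -> ReviewResult:
    # 1. The deterministic floor always runs and alone decides the gate.
    result = ReviewResult(findings=check_schema(rule))
    result.verdict = "fail" if result.findings else "pass"

    # 2. The model is an optional enhancement; a failure degrades instead of raising.
    if ask_model is not None:
        try:
            result.model_notes = ask_model(rule)
        except Exception:
            result.degraded = True
    return result


def unavailable_model(rule: dict) -> str:
    # Simulates the outage from the scenario above.
    raise ConnectionError("503 model unavailable")


valid_rule = {"title": "Encoded PowerShell", "detection": {"condition": "selection"}}
outcome = review_rule(valid_rule, ask_model=unavailable_model)
print(outcome.verdict, outcome.degraded)
```

With the model down, the valid rule still passes the gate and the result is flagged as degraded, so CI can surface the outage without blocking the merge.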

it goes both ways

Intel into rules. Rules into a merge.

A new advisory drops and someone has to translate it into detection content; or an analyst describes a behavior and someone has to turn it into a reviewed rule. detflow does both halves of the detection-as-code loop — the only vendor-specific parts (where intel comes from, where rules deploy) are exactly what it leaves to you.

intel → detection · analyze()

From a threat report to a package

threat report → ATT&CK + confidence → Sigma · YARA · Suricata → STIX · Navigator · brief

Paste a CVE advisory or a CTI writeup; get back tactic-ordered techniques with evidence, generated rules with the Sigma linted in place, and one-call exports to STIX 2.1, an ATT&CK Navigator layer, or an audience-aware Markdown brief.

detection → merge · draft / lint / review

From plain English to a reviewed rule

plain English → Sigma / XQL → lint + dedupe → quality · FP risk · verdict

Describe the behavior; get a first-draft rule, linted offline, deduped against the catalog you already run, and reviewed like a senior engineer — quality score, false-positive risk, ATT&CK mapping and the gaps worth flagging.

deterministic primitives first, the model as an enhancement

Each verb is independently useful.

Every piece runs on its own, and the boring ones run with no model, no network and no keys — so they drop straight into CI.

analyze

report → detection package

needs a model

Map a raw report to tactic-ordered ATT&CK techniques with confidence, generate Sigma / YARA / Suricata, and treat the report as hostile input — the prompt is hardened against injection and never invents technique IDs.
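One way to keep invented technique IDs out of the result is to validate whatever the model returns against a known set before using it. The sketch below is an assumption about how such a guard could look, not detflow's code: `keep_known_techniques` and the tiny `KNOWN` allowlist are hypothetical, and a real allowlist would be loaded from an ATT&CK data file.

```python
import re
from typing import Iterable, List, Set

# Shape of an ATT&CK technique or sub-technique ID, e.g. T1059 or T1059.001.
TECHNIQUE_ID = re.compile(r"T\d{4}(?:\.\d{3})?")


def keep_known_techniques(model_output: Iterable[str], known_ids: Set[str]) -> List[str]:
    """Drop anything that is malformed, unknown, or already seen; keep input order."""
    kept: List[str] = []
    for raw in model_output:
        candidate = raw.strip().upper()
        if TECHNIQUE_ID.fullmatch(candidate) and candidate in known_ids and candidate not in kept:
            kept.append(candidate)
    return kept


# Hypothetical allowlist for illustration only.
KNOWN = {"T1059", "T1059.001", "T1566.001"}
returned = ["T1059.001", "t1566.001", "T9999", "ignore previous instructions", "T1059.001"]
print(keep_known_techniques(returned, KNOWN))
```

Here `T9999` is well-formed but not in the allowlist, and the injected sentence is not an ID at all, so both are dropped along with the duplicate.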

draft

English → Sigma / XQL

needs a model

Turn a sentence into a first-draft rule in either language. The XQL prompt knows XQL — no startswith, no SQL where — so you don't get SQL-shaped hallucinations back.

lint

schema + best practice

no model · pure

Validate a rule offline — stdlib plus PyYAML — and get pass / warn / fail with per-finding messages. Runs in CI with zero secrets.
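To make the pass / warn / fail idea concrete, here is a small sketch of an offline linter. It is not detflow's `lint`: the function name `lint_rule`, the finding format and the specific checks are assumptions. It takes an already-parsed rule (for example the dict that `yaml.safe_load` would return), so the sketch itself needs only the standard library.

```python
from typing import Dict, List, Tuple

Finding = Tuple[str, str]  # (severity, message); severity is "warn" or "fail"

REQUIRED = ("title", "logsource", "detection")  # required by the Sigma rule format
ALLOWED_LEVELS = {"informational", "low", "medium", "high", "critical"}


def lint_rule(rule: Dict) -> Tuple[str, List[Finding]]:
    """Return an overall status plus per-finding messages. Checks are illustrative."""
    findings: List[Finding] = []
    for key in REQUIRED:
        if key not in rule:
            findings.append(("fail", f"missing required field '{key}'"))
    detection = rule.get("detection")
    if isinstance(detection, dict) and "condition" not in detection:
        findings.append(("fail", "detection has no 'condition'"))
    if "level" not in rule:
        findings.append(("warn", "no 'level' set"))
    elif rule["level"] not in ALLOWED_LEVELS:
        findings.append(("fail", f"unknown level '{rule['level']}'"))
    if not rule.get("tags"):
        findings.append(("warn", "no ATT&CK tags"))

    # The worst severity seen determines the overall status.
    severities = {severity for severity, _ in findings}
    status = "fail" if "fail" in severities else "warn" if "warn" in severities else "pass"
    return status, findings


example = {
    "title": "Encoded PowerShell",
    "logsource": {"category": "process_creation", "product": "windows"},
    "detection": {"selection": {"CommandLine|contains": "-enc"}, "condition": "selection"},
    "level": "high",
}
status, findings = lint_rule(example)
print(status, findings)
```

Because the function is pure and needs no secrets, a CI job can fail the merge on `fail` and merely annotate the merge request on `warn`.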

find_overlaps

catalog dedup

no model · pure

Check a rule against the inventory you already run, so you don't ship coverage you already have. Deterministic, no network.
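A deterministic overlap check can be as simple as comparing the field/value pairs two rules match on. The sketch below uses Jaccard similarity over those pairs; this is a stand-in technique chosen for illustration, not a description of how `find_overlaps` works, and the names `selectors` and `overlaps` and the 0.5 threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def selectors(rule: Dict) -> Set[Tuple[str, str]]:
    """Flatten a rule's detection block into (field, value) pairs, skipping 'condition'."""
    pairs: Set[Tuple[str, str]] = set()
    for name, block in rule.get("detection", {}).items():
        if name == "condition" or not isinstance(block, dict):
            continue
        for field, value in block.items():
            values = value if isinstance(value, list) else [value]
            pairs.update((field, str(v).lower()) for v in values)
    return pairs


def overlaps(rule: Dict, catalog: List[Dict], threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (title, Jaccard similarity) for catalog rules at or above the threshold."""
    mine = selectors(rule)
    hits = []
    for other in catalog:
        theirs = selectors(other)
        union = mine | theirs
        score = len(mine & theirs) / len(union) if union else 0.0
        if score >= threshold:
            hits.append((other.get("title", "<untitled>"), round(score, 2)))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


new_rule = {
    "title": "Encoded PowerShell from Office",
    "detection": {
        "selection": {"Image|endswith": "\\powershell.exe", "CommandLine|contains": "-enc"},
        "condition": "selection",
    },
}
catalog = [
    {
        "title": "PowerShell encoded command",
        "detection": {
            "selection": {
                "Image|endswith": "\\PowerShell.exe",
                "CommandLine|contains": ["-enc", "-EncodedCommand"],
            },
            "condition": "selection",
        },
    },
    {
        "title": "Certutil download",
        "detection": {"selection": {"Image|endswith": "\\certutil.exe"}, "condition": "selection"},
    },
]
print(overlaps(new_rule, catalog))
```

The new rule shares two of three distinct pairs with the first catalog entry and none with the second, so only the first is reported.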

review

senior-engineer pass

model optional · floor

Quality score, false-positive risk, ATT&CK mapping and verdict when a model is present — and a deterministic floor (lint + overlaps + parsed techniques) when it isn't. Never raises.

exports

STIX · Navigator · brief

no model · pure

Each is one pure function of the analysis with deterministic IDs — a STIX 2.1 bundle for the TIP, an ATT&CK Navigator layer for the coverage map, and a Markdown brief framed for the reader you pass.
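Deterministic IDs are commonly produced with name-based UUIDs (UUIDv5), so the same input yields the same identifier on every run. The following is a sketch of that general technique under stated assumptions, not detflow's export code: `stix_id` and the `NAMESPACE` value are hypothetical.

```python
import uuid

# Hypothetical project namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "detflow.example")


def stix_id(object_type: str, natural_key: str) -> str:
    """Derive a stable STIX-style identifier of the form '<type>--<uuid>'."""
    name = f"{object_type}:{natural_key}"
    return f"{object_type}--{uuid.uuid5(NAMESPACE, name)}"


first = stix_id("attack-pattern", "T1059.001")
second = stix_id("attack-pattern", "T1059.001")
print(first == second)  # same input, same ID on every run
```

Stable IDs mean that re-exporting the same analysis updates existing objects in the receiving platform instead of creating duplicates.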

The brief is the one piece that knows its audience: pass audience= and the same analysis is framed for dr, soc, leadership, purple_team, red_team or general — one analysis, six lenses. The mappings and rules don't change; only the framing does.

model-agnostic on purpose

Bring any model. It's one method.

detflow imports no SDK and hard-codes no provider. A "model" is anything with a single complete() method — so an OpenAI-compatible endpoint, a local vLLM/Ollama server, or a LangChain failover chain all drop straight in.

quickstart.py

import detflow

# both directions, one import ----------------------------------
sigma = detflow.draft("encoded PowerShell spawned from a Word macro")
a     = detflow.analyze(advisory_text, audience="dr")   # report → package

# the deterministic verbs need no model, no network, no keys
report   = detflow.lint(sigma.rule)               # pass / warn / fail
overlaps = detflow.find_overlaps(sigma.rule, catalog)
detflow.to_stix_bundle(a)                          # pure fn → your TIP

# review uses a model when present, a floor when it isn't
result = detflow.review(sigma.rule, catalog=catalog)
print(result.quality_score, result.false_positive_risk, result.verdict)

# bring any model — here, a langchain-failover chain
from langchain_failover import FailoverChatModel
from detflow.llm import LangChainModel
model = LangChainModel(FailoverChatModel(models=[primary, local_fallback]))
detflow.analyze(advisory_text, model=model)        # rides the failover chain

pip install detflow | complete(system, user, *, json) → str | OpenAI-compatible via DETFLOW_LLM_* | langchain-failover chain | Python 3.9+ · ships py.typed | core is stdlib + PyYAML
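The one-method contract can be expressed as a structural type, which is why no SDK import is needed. The sketch below mirrors the advertised shape `complete(system, user, *, json) → str`; the `Model` protocol name, the `bool` type and default for `json`, and the `CannedModel` stand-in are assumptions made for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Model(Protocol):
    # Mirrors complete(system, user, *, json) -> str.
    # The bool type and the False default for `json` are assumptions.
    def complete(self, system: str, user: str, *, json: bool = False) -> str: ...


class CannedModel:
    """Offline stand-in that returns a fixed reply, handy for tests."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete(self, system: str, user: str, *, json: bool = False) -> str:
        return self.reply


def ask(model: Model, question: str) -> str:
    # Any object with a matching complete() method is accepted; no base class required.
    return model.complete("You are a detection engineer.", question)


stub = CannedModel("title: Encoded PowerShell")
print(isinstance(stub, Model))
print(ask(stub, "draft a rule"))
```

Because the check is structural, a thin wrapper around an HTTP client, a local server or a failover chain satisfies the same contract without inheriting from anything.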

where it came from

Carved out of a real detection-as-code workbench.

detflow is the detection-side sibling of iocflow — iocflow handles the indicator lifecycle; detflow handles the rule lifecycle. Same design DNA.

I kept re-implementing the same generic stages inside every pipeline — lint a rule, draft one from English, dedupe against the catalog, review it, and translate an advisory into detection content. The only vendor-specific parts are compiling to your query language and dry-running against your tenant — so I left exactly those out, scrubbed the generic middle, and shipped it. It even eats its own dog food: the failover model is langchain-failover, another package extracted from the same work. Deterministic primitives plus optional AI — the boring parts are boring and tested, the model adds judgment where judgment helps, and nothing falls over when it's slow or absent.
