Using LogicPearl as an AI Guardrail

A deep dive into PII Shield, a local AI guardrail that uses LogicPearl to block or redact sensitive prompts before they leave your machine.

Ken Erwin, Founder, LogicPearl

AI tools need guardrails, but the guardrail itself has to be something you can trust.

If a prompt contains a patient SSN, a credit card number, or a medical record number, you do not want that text leaving the machine. If the same pattern appears inside test code, a fixture, or a harmless example, you do not want to block the developer every five minutes. The hard part is not finding patterns. The hard part is making the decision around those patterns consistent, inspectable, and easy to improve.

That is the pattern behind PII Shield: a local Claude Code hook that uses LogicPearl as the policy engine for prompt safety.

The shape of the guardrail

PII Shield runs before a prompt reaches the model API. It has two stages:

prompt
  -> observer
  -> pearl
  -> ignore | redact | block

The observer is a small Python program. It does pattern matching, checks nearby context words, and emits typed features. It can say things like:

{
  "pattern_ssn_formatted": true,
  "pattern_credit_card": false,
  "context_code": false,
  "context_medical": false,
  "pii_pattern_count": 1,
  "pattern_density": 0.4,
  "code_context_ratio": 0
}

The observer does not decide whether to block. That distinction matters. Pattern extraction is allowed to be simple and mechanical. Policy lives in the pearl.
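
As a rough illustration of that split, here is a minimal observer sketch. The feature names come from the example above; the regular expressions, the keyword list, and the `observe` function name are assumptions made for this sketch, not PII Shield's actual code.

```python
import json
import re

# Hypothetical patterns; the real observer's rules are not shown in this article.
SSN_FORMATTED = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CODE_KEYWORDS = re.compile(r"\b(?:const|def|assert|fixture)\b", re.IGNORECASE)


def observe(prompt: str) -> dict:
    """Return typed facts about the prompt. No allow/deny decision is made here."""
    ssn_hits = SSN_FORMATTED.findall(prompt)
    return {
        "pattern_ssn_formatted": bool(ssn_hits),
        "context_code": bool(CODE_KEYWORDS.search(prompt)),
        "pii_pattern_count": len(ssn_hits),
    }


if __name__ == "__main__":
    print(json.dumps(observe("Check patient SSN 456-78-9012"), indent=2))
    print(json.dumps(observe('const TEST_SSN = "123-45-6789"'), indent=2))
```

Both prompts contain a formatted SSN. The only difference the observer reports is the code context flag, and it is up to the pearl to decide what that difference means.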

The pearl looks at the combination of features and chooses one action:

  • ignore: let the prompt through
  • redact: stop the prompt and show a redacted version the user can resubmit
  • block: stop the prompt because it looks like a sensitive data dump

In the current PII Shield example, the pearl has 9 learned rules built from 112 labeled action traces. The hook runs locally, logs the evaluation, and never needs to send the prompt somewhere else just to decide whether the prompt is safe.

Why regex alone is not enough

The naive guardrail is a single check:

if ssn_pattern.search(text):  # search() scans the whole prompt; match() only tests the start
    block()

That works until the real cases show up.

"Check patient SSN 456-78-9012" should be blocked or redacted.

A line like const TEST_SSN = "123-45-6789" should usually pass. That is probably a fixture, not a secret from a production system.

1234567890 by itself is just a number.

"Patient MRN 1234567890" is probably medical data.

Five different sensitive-looking patterns in one prompt should be treated differently from one ambiguous pattern.

You can encode all of that with nested if statements, but now the guardrail is a hand-built decision tree. Every exception becomes another branch. Every branch interacts with the other branches. Nobody can review the whole policy without reading code and mentally executing edge cases.

LogicPearl changes the maintenance model. You label examples:

pattern_ssn_formatted,context_code,pii_pattern_count,decision
true,false,1,redact
true,true,1,ignore

Then LogicPearl compiles the examples into a policy artifact. The result is still deterministic. It is still inspectable. But the policy is learned from cases instead of hand-authored as branching code.
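
Labeled examples are just rows, so collecting them can be very plain. The sketch below appends one labeled trace to a CSV using the column layout from the example above. The `append_trace` helper and the file path in the usage note are hypothetical, and the real trace files carry more feature columns than these three.

```python
import csv
from pathlib import Path

# Column order copied from the example header above.
FIELDS = ["pattern_ssn_formatted", "context_code", "pii_pattern_count", "decision"]


def append_trace(path: Path, features: dict, decision: str) -> None:
    """Append one labeled example, writing the header if the file is new or empty."""
    is_new = not path.exists() or path.stat().st_size == 0
    row = {}
    for name in FIELDS[:-1]:
        value = features[name]
        # Booleans are lowercased to match the true/false style of the example rows.
        row[name] = str(value).lower() if isinstance(value, bool) else value
    row["decision"] = decision
    with path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

For example, `append_trace(Path("traces/feedback.csv"), features, "ignore")` would record a false positive as a new training case.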

What the PII Shield pearl learned

The generated policy is intentionally small. Its default action is ignore, and higher-priority rules escalate to redact or block.

The first rule is the blunt instrument:

block when total pattern matches >= 5

That catches prompts that look like data dumps.

The rest of the policy handles more precise cases:

redact when credit card pattern is present and the prompt is not code-like
redact when IBAN pattern is present and the prompt is not code-like
redact when formatted SSN is present with no code context
redact when a long number appears near financial keywords
redact when identity keywords appear near sparse sensitive patterns
redact when a bare 9-digit number appears near SSN keywords
redact when an EIN appears near tax keywords
redact when medical keywords are nearby

The important part is not that these nine rules are perfect forever. The important part is that they are visible. A reviewer can inspect the pearl, see the features, see the thresholds, see the action priority, and ask whether the learned behavior matches the intended policy.

That is the difference between “we have some regexes in a hook” and “we have a guardrail policy artifact.”
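
To make the priority ordering concrete, here is a hand-written sketch of how a default action plus prioritized rules could be evaluated. It is only an illustration of the semantics described in this section: the three predicates are simplified stand-ins for rules in the list above, and the pearl's real internal format and evaluation code are not shown here.

```python
from typing import Callable

Rule = tuple[str, Callable[[dict], bool]]

# Simplified stand-ins for three of the rules listed above (assumed feature names).
RULES: list[Rule] = [
    ("block", lambda f: f.get("pii_pattern_count", 0) >= 5),
    ("redact", lambda f: f.get("pattern_credit_card", False) and not f.get("context_code", False)),
    ("redact", lambda f: f.get("pattern_ssn_formatted", False) and not f.get("context_code", False)),
]
ACTION_PRIORITY = ["block", "redact"]  # highest priority first
DEFAULT_ACTION = "ignore"


def decide(features: dict) -> str:
    """Return the highest-priority action among the rules that fire, else the default."""
    fired = {action for action, predicate in RULES if predicate(features)}
    for action in ACTION_PRIORITY:
        if action in fired:
            return action
    return DEFAULT_ACTION
```

With this shape, a prompt that trips both a redact rule and the block rule resolves to block, and a prompt that trips nothing falls through to ignore.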

The observer stays boring

PII Shield’s observer extracts 20 features. Some are direct pattern flags:

  • formatted SSN
  • bare 9-digit number
  • Luhn-valid credit card number
  • US phone number
  • IBAN
  • EIN
  • date
  • ICD-10 or CPT-like medical code
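
The Luhn check is a good example of a purely factual feature. A standard implementation looks roughly like this; the 13 to 19 digit length bound and the function name are assumptions of this sketch, not details taken from PII Shield.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    # Assumed bound: typical payment card numbers are 13 to 19 digits long.
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

Requiring a valid checksum means an arbitrary 16-digit number is much less likely to be flagged as a card number.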

Some are nearby context flags:

  • SSN keywords
  • medical keywords
  • financial keywords
  • identity keywords
  • tax keywords
  • programming keywords
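
A nearby-context flag can be sketched as a keyword search in a fixed window around each match. The window size, the keyword tuple, and the 10-digit pattern below are all hypothetical choices for illustration.

```python
import re

# Hypothetical pattern, keyword set, and window size.
LONG_NUMBER = re.compile(r"\b\d{10}\b")
MEDICAL_KEYWORDS = ("patient", "diagnosis", "mrn")
WINDOW = 40  # characters inspected on each side of a match


def keyword_near(text: str, match: re.Match, keywords: tuple, window: int = WINDOW) -> bool:
    """Return True if any keyword appears within `window` characters of the match."""
    start = max(0, match.start() - window)
    end = min(len(text), match.end() + window)
    nearby = text[start:end].lower()
    return any(word in nearby for word in keywords)


def context_medical(text: str) -> bool:
    """True when a long number sits near a medical keyword."""
    return any(keyword_near(text, m, MEDICAL_KEYWORDS) for m in LONG_NUMBER.finditer(text))
```

Under these assumptions, "Patient MRN 1234567890" sets the flag and a bare "1234567890" does not, which is exactly the distinction the pearl needs to see.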

And a few are aggregate signals:

  • total pattern count
  • patterns per 100 characters
  • fraction of patterns near code keywords
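
The aggregate signals are simple arithmetic over the matches. A minimal sketch follows, assuming the three bullets above correspond to the `pii_pattern_count`, `pattern_density`, and `code_context_ratio` features from the earlier example; the function name and the rounding are choices made here.

```python
def aggregate_signals(prompt_length: int, match_count: int, matches_near_code: int) -> dict:
    """Compute count, density per 100 characters, and the code-context fraction."""
    density = 100.0 * match_count / prompt_length if prompt_length else 0.0
    ratio = matches_near_code / match_count if match_count else 0.0
    return {
        "pii_pattern_count": match_count,
        "pattern_density": round(density, 2),
        "code_context_ratio": round(ratio, 2),
    }
```

For instance, one match in a 250-character prompt gives a density of 0.4 patterns per 100 characters.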

This division is the key design choice. The observer should answer factual questions about the prompt. The pearl should answer the policy question.

That keeps the system debuggable. If the guardrail misses something, you can ask two separate questions:

  1. Did the observer expose the right signal?
  2. Did the pearl make the right decision from those signals?

Each of those questions is much easier to answer, and then fix, than a bug buried in a tangled block of prompt-scanning code.

What happens at runtime

The Claude Code hook reads the submitted prompt, runs the observer, and fast-paths clean prompts. If no patterns match, it returns {} and the prompt continues.

If the observer finds candidate sensitive data, the hook removes the redaction candidates from the feature map and runs:

logicpearl run gate/pearl.ir.json - --json
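
From Python, that call might be wrapped as below. This sketch assumes the `-` argument means the feature map is read as JSON from standard input and that `--json` produces a JSON document on standard output; it deliberately makes no assumption about which fields that document contains. The `evaluate` and `run_cli` names are hypothetical.

```python
import json
import subprocess
from typing import Callable, Sequence

# Command copied from the article; the stdin/stdout behavior is an assumption.
PEARL_COMMAND = ["logicpearl", "run", "gate/pearl.ir.json", "-", "--json"]


def run_cli(command: Sequence[str], stdin_text: str) -> str:
    """Run the command locally, feeding stdin_text, and return its stdout."""
    completed = subprocess.run(
        command, input=stdin_text, capture_output=True, text=True, check=True
    )
    return completed.stdout


def evaluate(features: dict, runner: Callable[[Sequence[str], str], str] = run_cli) -> dict:
    """Send the feature map to the pearl and return its parsed JSON result."""
    return json.loads(runner(PEARL_COMMAND, json.dumps(features)))
```

Passing the runner as a parameter keeps the wrapper testable without the CLI installed.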

If the pearl returns ignore, the prompt continues.

If it returns redact, the hook blocks the original prompt and gives the user a redacted version. The hook replaces sensitive spans with tags like:

[PII:SSN]
[PII:CREDIT_CARD]
[FIN:IBAN]
[PHI:DOB]
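
Span replacement of this kind can be sketched with plain regex substitution. The tag names are taken from the examples above; the two patterns and the `redact` function are simplified assumptions, not the hook's actual matching logic.

```python
import re

# Hypothetical patterns paired with tags from the examples above.
TAGGED_PATTERNS = [
    ("[PII:SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[FIN:IBAN]", re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")),
]


def redact(prompt: str) -> str:
    """Replace every matched span with its tag, leaving the rest of the prompt intact."""
    for tag, pattern in TAGGED_PATTERNS:
        prompt = pattern.sub(tag, prompt)
    return prompt
```

The surrounding text is preserved, so the user can resubmit the redacted prompt and still get a useful answer.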

If it returns block, the hook stops the prompt outright and tells the user there are too many sensitive patterns to safely resubmit as-is.
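
The three outcomes can be mapped to hook output with a small function. The empty object for a passing prompt comes from the description above. The `decision` and `reason` fields are an assumption about the Claude Code hook output format and should be checked against its documentation, and the message wording here is invented for the sketch.

```python
def hook_response(action: str, redacted_prompt: str = "") -> dict:
    """Translate the pearl's action into the JSON object the hook prints."""
    if action == "ignore":
        return {}  # empty object: the prompt continues
    if action == "redact":
        return {
            "decision": "block",  # assumed field names
            "reason": "Sensitive data detected. Redacted version to resubmit:\n" + redacted_prompt,
        }
    if action == "block":
        return {
            "decision": "block",
            "reason": "Too many sensitive patterns to safely resubmit as-is.",
        }
    # Unknown actions are treated as errors rather than silently allowed.
    raise ValueError(f"unexpected action: {action!r}")
```

Note that both redact and block stop the original prompt; the difference is whether the user is handed something safe to resubmit.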

The hook also appends a local JSONL evaluation log with the action, matched rules, timestamp, and pattern count. That gives you a feedback trail without storing the original sensitive prompt text.
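
A log like that takes only a few lines. The sketch below records the four fields named above and nothing from the prompt itself; the key names, the UTC ISO timestamp, and the `log_evaluation` helper are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_evaluation(path: Path, action: str, matched_rules: list, pattern_count: int) -> None:
    """Append one evaluation record as a single JSON line. No prompt text is stored."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "matched_rules": matched_rules,
        "pattern_count": pattern_count,
    }
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
```

Because each line is an independent JSON object, the log can be tailed, filtered, or turned into feedback traces without parsing the whole file.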

Improving the guardrail

The most useful part of this pattern is the update loop.

When the guardrail makes a mistake, you do not start by editing branching logic. You add an example.

The PII Shield workflow is:

cat traces/synthetic.csv > traces/combined.csv
tail -n +2 traces/realistic.csv >> traces/combined.csv
tail -n +2 traces/feedback.csv >> traces/combined.csv

logicpearl build traces/combined.csv \
  --action-column decision \
  --default-action ignore \
  --feature-dictionary gate/feature_dictionary.json \
  --feature-governance governance/feature_governance.json \
  --action-priority block,redact \
  --output-dir gate/

Then inspect what changed:

logicpearl inspect gate/pearl.ir.json

The pearl might add a rule, tighten a threshold, or restructure the existing policy. The point is that the review surface is the artifact diff, not a guessing game over which if statement should move.

For AI safety work, that matters. Guardrails get political fast because they encode risk tolerance. You need a policy that can be reviewed by security, compliance, product, and engineering without asking everyone to reverse-engineer a pile of conditionals.

Why this works well for AI guardrails

AI guardrails are usually built as prompts, regex filters, classifiers, or hand-coded policy engines. Each has a failure mode.

Prompts are flexible, but they are not deterministic.

Regexes are deterministic, but they do not express policy well once context matters.

Classifiers can generalize, but they can be hard to audit and reproduce.

Hand-coded policy engines are deterministic, but they often become the same unreviewable branching logic they were meant to replace.

LogicPearl sits in a narrower lane: deterministic policy learned from labeled examples.

That is a good fit when the guardrail decision should be:

  • local
  • fast
  • reproducible
  • auditable
  • improved from examples
  • deployable as an artifact

PII Shield is one example, but the same shape works beyond PII. Tool-call approval, data access checks, prompt-injection boundaries, agent action routing, and regulated workflow gates all have the same core problem: the input is messy, but the final decision has to be explainable.

The pattern to copy

The reusable architecture is simple:

observer extracts typed signals
examples label the intended action
LogicPearl compiles the action policy
runtime runs the pearl before the model call
logs capture what the pearl saw
feedback examples improve the next version

That gives you a guardrail that behaves more like infrastructure than advice. It can be tested. It can be diffed. It can be inspected before deployment. It can run locally before sensitive data leaves the machine.

That is what we want AI guardrails to become: not a best-effort warning in a prompt, but a deterministic artifact sitting directly in the path of the decision.