piighost

Anonymize PII before it reaches the LLM

piighost is a Python library for building PII anonymization pipelines. It swaps personal data for stable placeholders the model can reason about, then restores the real values for your tools and your users. Your agent code does not change.

Hi, this is Patrick Dupont from Acme Corp. My order #ACME-9123 should be delivered to 12 rue de la Paix, Paris. You can reach me by email at patrick.dupont@acme.com or by phone at +33 6 12 34 56 78.

The problem

You should not have to choose between good models and data privacy

Hosted clouds leak raw data

OpenAI, Anthropic, and Google ship the best models on the market. But every byte of context you send them, including raw user PII, leaves your infrastructure, and often your jurisdiction, the moment the request hits the wire. A single prompt becomes a data export.

Local models trade quality

Self-hosting keeps the data inside your network, but you give up part of the state of the art and you take on the GPU bill and the patching. The privacy gain comes with a permanent operational cost, and the model you can run is rarely the model you wish you were running.

Compliance does not wait

GDPR, HIPAA, and data-residency rules apply whether or not your stack was built with them in mind. Sending raw PII to a third party is a liability you cannot undo once a request has left, and it forces every later product decision through a legal review.

Bans throw away the upside

Some teams respond by banning hosted LLMs outright. That protects the data, but it also forfeits the productivity gains everyone else is capturing, and people route around the ban anyway by pasting work into personal accounts the company cannot see.

How it works

A layer between your agent and the model

User message

Hi, this is Patrick Dupont. Could you forward this to Marie Lambert and Jean Moreau? My email is patrick.dupont@acme.com, and you can also cc marie.lambert@acme.com. The case ID is #ACME-9123.

piighost runs your detectors over the message and reports every PII span it finds: names, emails, identifiers, anything the model does not need to see. Overlapping detections from multiple detectors are arbitrated by confidence before anything is replaced.
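To make the arbitration step concrete, here is a minimal sketch of confidence-based overlap resolution. It assumes a greedy strategy (highest confidence wins, ties go to the longer span); the Span class, span_of helper, and resolve_overlaps function are hypothetical names for illustration and are not piighost's API, and the confidence values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A detected PII span (hypothetical structure, not piighost's own type)."""
    start: int
    end: int  # exclusive
    label: str
    confidence: float


def span_of(text: str, needle: str, label: str, confidence: float) -> Span:
    """Build a span for the first occurrence of needle in text."""
    start = text.index(needle)
    return Span(start, start + len(needle), label, confidence)


def resolve_overlaps(spans: list[Span]) -> list[Span]:
    """Keep the most confident span wherever detections overlap."""
    kept: list[Span] = []
    # Visit the most confident spans first; ties go to the longer span.
    for span in sorted(spans, key=lambda s: (-s.confidence, -(s.end - s.start))):
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


text = "Hi, this is Patrick Dupont. My email is patrick.dupont@acme.com."
detections = [
    span_of(text, "Patrick Dupont", "PERSON", 0.95),          # e.g. from a NER model
    span_of(text, "Patrick", "PERSON", 0.60),                 # e.g. from an exact-match list
    span_of(text, "patrick.dupont@acme.com", "EMAIL", 0.99),  # e.g. from a regex
    span_of(text, "patrick.dupont", "PERSON", 0.40),          # local part misread as a name
]
resolved = resolve_overlaps(detections)
for span in resolved:
    print(span.label, repr(text[span.start:span.end]))
```

Only the full name and the full email survive; the two weaker detections that overlap them are discarded before any replacement happens.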

Use case driven

Each use case calls for its own pipeline

There is no universal detector for PII. piighost gives you composable building blocks (detection, linking, output guardrails) so you can build a pipeline tuned to your data, your latency budget, and your compliance rules.

Conversational

Customer support, in-app chat, voice transcripts. Fast NER for names and locations, regex for emails and phone numbers, thread-scoped memory so the same person keeps the same placeholder across the whole conversation.
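The thread-scoped memory idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not piighost's implementation: ThreadMemory and anonymize_emails are hypothetical names, the email regex is deliberately simple, and values are compared case-insensitively so the same address always maps to the same placeholder.

```python
import re
from collections import defaultdict

# Deliberately simple email pattern, good enough for a sketch.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class ThreadMemory:
    """Hypothetical per-thread store that maps values to stable placeholders."""

    def __init__(self) -> None:
        self._by_value: dict[tuple[str, str], str] = {}
        self._originals: dict[str, str] = {}
        self._counters: defaultdict[str, int] = defaultdict(int)

    def placeholder_for(self, value: str, label: str) -> str:
        key = (label, value.lower())
        if key not in self._by_value:
            self._counters[label] += 1
            placeholder = f"<<{label}:{self._counters[label]}>>"
            self._by_value[key] = placeholder
            self._originals[placeholder] = value  # first-seen spelling
        return self._by_value[key]

    def restore(self, text: str) -> str:
        """Put the original values back, e.g. before calling a tool."""
        for placeholder, original in self._originals.items():
            text = text.replace(placeholder, original)
        return text


def anonymize_emails(message: str, memory: ThreadMemory) -> str:
    return EMAIL_RE.sub(lambda m: memory.placeholder_for(m.group(), "EMAIL"), message)


memory = ThreadMemory()  # one instance per conversation thread
first = anonymize_emails("Reach me at patrick.dupont@acme.com.", memory)
second = anonymize_emails(
    "Please cc marie.lambert@acme.com and Patrick.Dupont@acme.com.", memory
)
print(first)
print(second)
```

Because the memory lives for the whole thread, the address seen in the first message keeps its placeholder when it reappears in the second one, and a new address gets the next number.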

Document processing

Long PDFs, contracts, support tickets. Latency budget is wider, accuracy matters more. An LLM as a detector on the tricky paragraphs, regex on the structured fields, and re-anchoring so findings line up with the source document.
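One plausible reading of re-anchoring, sketched below: an LLM detector returns the strings it found rather than character offsets, so each finding is searched for in the source text to recover exact spans. The reanchor function and the sample contract text are hypothetical; how piighost performs this step may differ.

```python
import re


def reanchor(document: str, findings: list[tuple[str, str]]) -> list[tuple[int, int, str]]:
    """Map (text, label) findings to every (start, end, label) offset in the document."""
    spans = []
    for value, label in findings:
        if not value:
            continue  # an empty finding would match everywhere
        for match in re.finditer(re.escape(value), document):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


document = (
    "Contract between Marie Lambert and Acme Corp.\n"
    "Signed by Marie Lambert on behalf of the buyer."
)
# Findings as an LLM detector might report them; the last one is not in the text.
findings = [
    ("Marie Lambert", "PERSON"),
    ("Acme Corp", "ORG"),
    ("Jean Moreau", "PERSON"),
]
spans = reanchor(document, findings)
for start, end, label in spans:
    print(label, repr(document[start:end]))
```

A finding that does not occur verbatim in the document produces no span, which also filters out values the model invented.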

Structured forms

API payloads, CSVs, exports. Sub-millisecond, deterministic, auditable. A pure regex pipeline with an exhaustive ruleset, no model in the loop, and a placeholder format your downstream systems can parse.
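As a rough sketch of a pure regex pipeline over a record, the snippet below uses only the standard library. The two rules, the anonymize_record function, and the vault dictionary are assumptions made for illustration; a production ruleset would be far more exhaustive. The placeholder format follows the <<LABEL:n>> shape used elsewhere on this page, and PLACEHOLDER_RE shows how a downstream system could parse it.

```python
import re

# Illustrative ruleset; each rule is (label, compiled pattern).
RULES = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("PHONE", re.compile(r"\+\d{1,3}(?:[ .-]?\d){6,12}")),
]
# What a downstream consumer could use to find placeholders again.
PLACEHOLDER_RE = re.compile(r"<<([A-Z_]+):(\d+)>>")


def anonymize_record(record: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return the cleaned record and a placeholder -> original value vault."""
    vault: dict[str, str] = {}
    seen: dict[tuple[str, str], str] = {}
    counters: dict[str, int] = {}

    def substitute(label: str, match: re.Match) -> str:
        key = (label, match.group())
        if key not in seen:
            counters[label] = counters.get(label, 0) + 1
            seen[key] = f"<<{label}:{counters[label]}>>"
            vault[seen[key]] = match.group()
        return seen[key]

    clean: dict[str, str] = {}
    for field, value in record.items():
        for label, pattern in RULES:
            value = pattern.sub(lambda m, label=label: substitute(label, m), value)
        clean[field] = value
    return clean, vault


row = {
    "email": "patrick.dupont@acme.com",
    "phone": "+33 6 12 34 56 78",
    "notes": "Call +33 6 12 34 56 78 after 5pm",
}
clean, vault = anonymize_record(row)
print(clean)
print(PLACEHOLDER_RE.findall(clean["notes"]))
```

The same input always yields the same output, and the vault gives an auditable record of what was replaced.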

Code and logs

Debugging assistants, log triage, incident bots. Stack traces and log lines carry secrets, tokens, internal hostnames, and user records. A regex-first pipeline strips credentials and identifiers, with custom detectors for your own ID formats, before anything reaches the model.
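A regex-first scrubber with a custom ID detector might look like the sketch below. Everything here is hypothetical: the regex_detector factory, the three patterns (bearer tokens, hosts ending in .internal, and an ACME-1234 style order ID), and the sample log line. It assumes the rules do not overlap, and it redacts without numbering because credentials rarely need to be restored.

```python
import re
from typing import Callable, Iterator

Detector = Callable[[str], Iterator[tuple[int, int, str]]]


def regex_detector(label: str, pattern: str) -> Detector:
    """Build a detector that yields (start, end, label) for each regex match."""
    compiled = re.compile(pattern)

    def detect(text: str) -> Iterator[tuple[int, int, str]]:
        for match in compiled.finditer(text):
            yield match.start(), match.end(), label

    return detect


DETECTORS = [
    regex_detector("TOKEN", r"(?<=Bearer )[A-Za-z0-9._~+/=-]+"),
    regex_detector("HOST", r"\b[\w-]+(?:\.[\w-]+)*\.internal\b"),
    regex_detector("ORDER_ID", r"\bACME-\d+\b"),  # custom, company-specific format
]


def scrub(line: str) -> str:
    """Replace every detected span with a label-only placeholder."""
    spans = sorted(span for detect in DETECTORS for span in detect(line))
    # Replace right to left so earlier offsets stay valid.
    for start, end, label in reversed(spans):
        line = line[:start] + f"<<{label}>>" + line[end:]
    return line


raw = "ERROR db-01.prod.internal: order ACME-9123 failed (Authorization: Bearer abc123.def456)"
scrubbed = scrub(raw)
print(scrubbed)
```

Adding a new ID format is one more regex_detector line, which keeps the ruleset easy to review.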

The ecosystem

One privacy layer, many projects

Start with the library. Reach for the server, the chat demo, and the proofreader as you grow.

Quick start

Drop it into a LangChain agent

Add the middleware and your agent code stays the same.

uv add 'piighost[cache]'

from langchain.agents import create_agent
from langchain.tools import tool

from piighost import Anonymizer, ExactMatchDetector
from piighost.pipeline import ThreadAnonymizationPipeline
from piighost.middleware import PIIAnonymizationMiddleware


# Placeholder tool for this example; swap in your own implementation.
@tool
def send_email(to: str, body: str) -> str:
    """Send an email to the given recipient."""
    return f"Email sent to {to}"


# Wire any detector you like: regex, a NER model, or an LLM.
detector = ExactMatchDetector([("Patrick", "PERSON")])
pipeline = ThreadAnonymizationPipeline(detector=detector, anonymizer=Anonymizer())
middleware = PIIAnonymizationMiddleware(pipeline=pipeline)

agent = create_agent(
    model="openai:gpt-5.5",
    tools=[send_email],
    middleware=[middleware],
)

# The LLM only sees "<<PERSON:1>>".
# Your send_email tool still receives the real value.

Ship AI features without shipping user data

Install piighost, wire your detector, and keep PII out of the model.