Keyword Watchdog
Automatically monitor voice transcripts for keywords, phrases, or patterns — with built-in typo tolerance, obfuscation stripping, and AI-powered semantic detection.
Overview
The Keyword Watchdog provides continuous, automated scanning of every voice transcript in your workspace. Unlike AI Insights — which you run on-demand — the Watchdog runs silently in the background, checking each new transcript against your configured rules the moment it's saved.
Matching operates in four tiers, applied in order:
- Exact match: Fast, case-insensitive substring or regex check.
- Normalised (deobfuscation): Strips leet speak and obfuscation characters before matching, so fr@ud, frau(d), and f.r.a.u.d all collapse to fraud.
- Fuzzy (trigram): Catches typos and misspellings by comparing text similarity. "fraus" will fuzzy-match "fraud".
- Semantic (AI embedding): Compares the meaning of the transcript to the rule's topic, detecting discussion about the rule's subject even when no keyword appears.
Tiers 1 and 2 are always active for keyword/phrase rules. Tier 3 (fuzzy) is on by default and can be toggled per rule. Tier 4 (semantic) is opt-in and requires an OpenAI API key.
When a rule matches, the system creates an alert with full context: the matched text, the match method, the channel, the speaker, and the severity level. Alerts are collected on the Watchdog dashboard where administrators can review, acknowledge, and dismiss them.
How Rules Work
A watchdog rule defines what to look for and where to look. Each rule has:
- Name — A human-readable label for the rule (e.g. "Profanity filter", "Safety incident keywords").
- Patterns — One or more search terms. The system checks each transcript against every pattern in the rule.
- Match Type — How the patterns are interpreted (see below).
- Severity — The alert level when this rule triggers: Low, Medium, High, or Critical.
- Channel Scope — Optionally restrict monitoring to specific channels. Leave blank to monitor all channels.
- Enabled / Disabled — Toggle a rule on or off without deleting it.
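As a rough illustration, the rule fields listed above could be modelled as a small record. This is a hypothetical sketch: the class name `WatchdogRule`, the field names, and the validation are assumptions made for illustration, not the product's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class MatchType(Enum):
    KEYWORD = "keyword"
    PHRASE = "phrase"
    REGEX = "regex"


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class WatchdogRule:
    """Hypothetical in-memory model of a rule; names are illustrative only."""
    name: str
    patterns: list[str]
    match_type: MatchType
    severity: Severity
    channel_ids: list[str] = field(default_factory=list)  # empty = all channels
    enabled: bool = True

    def __post_init__(self) -> None:
        # A rule needs at least one pattern to be meaningful.
        if not self.patterns:
            raise ValueError("a rule needs at least one pattern")

    def applies_to(self, channel_id: str) -> bool:
        """A disabled rule never applies; an empty scope covers every channel."""
        return self.enabled and (
            not self.channel_ids or channel_id in self.channel_ids
        )
```

With this model, a rule whose channel scope is left empty applies to any channel, while a scoped rule applies only to the channels it lists.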
Match Types
Each rule uses one of three match types, which determines how your patterns are compared against transcript text:
- Keyword: Case-insensitive substring match. The pattern "fire" will match "There's a fire in building 3" and "Please fire up the generator". Use this for broad monitoring.
- Phrase: Case-insensitive exact phrase match. The pattern "fire in building" will only match when those exact words appear in sequence. Use this to reduce false positives.
- Regex: Full regular expression support. For example, \b(fire|smoke|evacuate)\b matches any of those words as whole words. Use this for complex pattern matching when keyword or phrase mode isn't precise enough.
You can add multiple patterns to a single rule. The rule triggers if any pattern matches the transcript.
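The three match types and the "any pattern triggers" behaviour can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the word-boundary handling for phrases and the case-insensitive regex flag are assumptions, and Python's `re` syntax may differ from the regex engine the Watchdog actually uses.

```python
import re


def pattern_matches(pattern: str, transcript: str, match_type: str) -> bool:
    """Return True if one pattern matches the transcript under a match type."""
    if match_type == "keyword":
        # Case-insensitive substring check.
        return pattern.lower() in transcript.lower()
    if match_type == "phrase":
        # Assumption: the phrase must start and end on word boundaries.
        phrase = rf"\b{re.escape(pattern)}\b"
        return re.search(phrase, transcript, re.IGNORECASE) is not None
    if match_type == "regex":
        # Assumption: regex patterns are also applied case-insensitively.
        return re.search(pattern, transcript, re.IGNORECASE) is not None
    raise ValueError(f"unknown match type: {match_type}")


def rule_triggers(patterns: list[str], transcript: str, match_type: str) -> bool:
    """A rule triggers as soon as any one of its patterns matches."""
    return any(pattern_matches(p, transcript, match_type) for p in patterns)
```

For example, `rule_triggers(["injury", "hazard"], "Possible hazard here", "keyword")` is true because the second pattern matches, even though the first does not.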
Text Normalization (Deobfuscation)
For keyword and phrase rules, the Watchdog automatically normalises both the transcript and every pattern before comparing. This catches intentional obfuscation and leet-speak substitutions that would defeat a simple substring check.
Normalization performs three steps:
- Leet-speak substitution: Common character swaps are reversed, such as @ → a, $ → s, ! → i, 0 → o, 3 → e, 4 → a, and 5 → s.
- Punctuation stripping: All non-letter characters inside words are removed. Parentheses, dots, dashes, brackets, and remaining digits are stripped.
- Space collapse: Multiple spaces are collapsed into one and the result is trimmed.
Normalization is always active for keyword and phrase rules and cannot be disabled. It adds negligible overhead because it uses PostgreSQL's native translate() and regexp_replace() functions.
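The three normalization steps can be approximated in Python as follows. This is a minimal sketch, not the database implementation: the substitution table contains only the swaps listed above (the real table has more), and lower-casing plus the restriction to ASCII letters are assumptions.

```python
import re

# Only the substitutions named in this guide; the real table is larger.
LEET_TABLE = str.maketrans("@$!0345", "asioeas")


def normalise(text: str) -> str:
    """Reverse leet-speak, strip non-letters, and collapse whitespace."""
    # Step 1: leet-speak substitution (after lower-casing, an assumption).
    text = text.lower().translate(LEET_TABLE)
    # Step 2: drop everything that is not a letter or whitespace.
    text = re.sub(r"[^a-z\s]", "", text)
    # Step 3: collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalise("fr@ud"))      # fraud
print(normalise("frau(d)"))    # fraud
print(normalise("f.r.a.u.d"))  # fraud
```

Because both the transcript and the pattern pass through the same function, an obfuscated spelling and its plain form end up as the same string before comparison. One side effect of this naive ordering is that a sentence-ending "!" is also turned into "i"; the real implementation may treat that case differently.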
Fuzzy Matching
Fuzzy matching uses PostgreSQL's pg_trgm (trigram) extension to compare the similarity of words rather than requiring exact character-for-character matches. This catches:
- Typos: "fraus", "frdau", "fraaud" all show high similarity to "fraud".
- Misspellings: "harrasment", "harasment", "harassement" all match "harassment".
- Phonetic approximations: Words that sound similar and share most of their letters. Trigram matching compares characters, not sounds, so spellings that differ widely will not match.
Fuzzy matching is enabled by default for all keyword/phrase rules and can be toggled off per rule via the Matching Strategy section in the rule dialog. A sensitivity slider lets you control how similar the words must be.
Fuzzy Sensitivity
The threshold ranges from 0.20 (very permissive — many near-matches) to 0.80 (very strict — almost exact). The default is 0.40, which provides good coverage for common typos without excessive false positives.
- Lower threshold (0.20–0.35) — Catches more variations but may produce false positives on short words.
- Default (0.40) — Balanced — catches most single- and double-character typos.
- Higher threshold (0.50–0.80) — Only matches very close variations. Use when the base keyword is short or commonly appears in unrelated words.
Fuzzy matching is only applied to patterns with 3 or more characters to avoid noise from very short keywords.
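To make the threshold concrete, the sketch below reimplements trigram similarity in Python in the style of pg_trgm: each word is lower-cased and padded with two leading spaces and one trailing space, and the score is the number of shared trigrams divided by the number of distinct trigrams in both strings. The `fuzzy_hit` helper, which compares a pattern against each transcript word in turn, is an assumption about how the comparison might be applied, not the Watchdog's actual query.

```python
import re


def trigrams(text: str) -> set[str]:
    """Trigram set in the style of pg_trgm (padded, lower-cased words)."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def fuzzy_hit(pattern: str, transcript: str, threshold: float = 0.40) -> bool:
    """Hypothetical helper: compare the pattern with each transcript word."""
    if len(pattern) < 3:  # very short patterns are skipped
        return False
    return any(similarity(pattern, w) >= threshold for w in transcript.split())


print(similarity("fraud", "fraus"))  # 0.5
```

"fraud" and "fraus" each produce six trigrams and share four of them, giving 4 / 8 = 0.5. That clears the default threshold of 0.40 but would be rejected at 0.60, which is the effect of moving the sensitivity slider.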
Semantic Matching
Semantic matching goes beyond keywords to detect topic-level discussions related to your rule — even when no specific keyword appears. It uses AI embeddings to compare the meaning of a transcript to the rule's topic.
How it works
- When you save a rule with semantic matching enabled, the system generates a topic embedding from the rule's name, description, and patterns using OpenAI's text-embedding-3-small model.
- Each incoming message already gets a content embedding, computed asynchronously when its transcript is saved.
- A second database trigger compares the message embedding to the topic embedding of every semantic-enabled rule using cosine similarity. If the similarity exceeds the rule's threshold, an alert is created.
- If the keyword/fuzzy tier already caught the message, the semantic tier skips it to avoid duplicate alerts.
What it catches
For a rule named "Financial Fraud Detection" with keywords like "fraud", "embezzlement", and "money laundering":
- "Someone has been manipulating the financial records" — no exact keyword, but the topic matches.
- "There are unexplained discrepancies in the accounts" — semantically related to fraud.
- "I noticed some irregular transactions in Q3" — relevant discussion without trigger words.
Semantic Sensitivity
The threshold ranges from 0.15 (very permissive) to 0.65 (very strict). The default is 0.35.
- Lower (0.15–0.25) — Casts a wide net. Useful for broad compliance monitoring but expect more false positives.
- Default (0.35) — Catches topically related discussions with reasonable precision.
- Higher (0.45–0.65) — Only matches very closely related content. Reduces noise at the cost of missed detections.
Semantic matching is opt-in and requires an OPENAI_API_KEY environment variable to be set. It is not available for regex rules.
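The comparison at the heart of the semantic tier is cosine similarity between two embedding vectors, checked against the rule's threshold. The sketch below shows that calculation with tiny hand-written vectors; real embeddings have far more dimensions, and the `semantic_hit` function and its `already_matched` flag are hypothetical names used only to illustrate the skip-if-already-caught behaviour.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_hit(
    message_embedding: list[float],
    topic_embedding: list[float],
    threshold: float = 0.35,
    already_matched: bool = False,
) -> bool:
    """Hypothetical check: alert only if no earlier tier caught the message."""
    if already_matched:
        return False
    return cosine_similarity(message_embedding, topic_embedding) > threshold
```

Identical directions score 1.0 and unrelated (orthogonal) directions score 0.0, so lowering the threshold admits messages whose meaning is only loosely related to the rule's topic.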
Creating a Rule
To create a new watchdog rule:
- Navigate to Keyword Watchdog in the admin sidebar.
- On the Rules tab, click New Rule.
- Fill in the rule name, description (optional), match type, and at least one pattern.
- Optionally select which channels to monitor and set the severity.
- In the Matching Strategy section, configure fuzzy and semantic matching. Fuzzy matching is on by default. Toggle semantic matching on if you want topic-level detection.
- Click Create Rule. The rule is immediately active and will start scanning new transcripts.
Rules apply to new transcripts only — they do not retroactively scan historical messages. If you need to search past transcripts, use the Search feature in the admin dashboard.
Managing Rules
Existing rules are listed on the Rules tab. For each rule you can:
- Enable / Disable — Toggle the rule's active state. Disabled rules are retained but stop triggering alerts.
- Edit — Update the rule's name, patterns, severity, channel scope, or match type.
- Delete — Permanently remove the rule and its associated configuration. Existing alerts generated by the rule are retained.
Viewing & Managing Alerts
Switch to the Alerts tab to see every alert generated by your rules. Each alert card shows:
- Rule name — Which rule triggered this alert.
- Matched text — The snippet from the transcript that matched the pattern.
- Match method — How the match was detected: Exact, Deobfuscated, Fuzzy, or Semantic. Displayed as a colour-coded badge when the method is not "exact".
- Channel & Speaker — Where and who said it.
- Timestamp — When the transcript was created.
- Severity badge — Colour-coded severity (Low, Medium, High, Critical).
Filtering Alerts
Use the filter controls above the alerts list to narrow results:
- Status filter — Show all alerts, or only New, Acknowledged, or Dismissed alerts.
- Severity filter — Show all severities, or just Low, Medium, High, or Critical.
Alert Workflow
Each alert moves through a simple lifecycle:
- New — The alert was just triggered and hasn't been reviewed yet.
- Acknowledged — A team member has seen the alert and is investigating.
- Dismissed — The alert has been reviewed and requires no further action (e.g. false positive or resolved issue).
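The lifecycle above can be pictured as a small state machine. The sketch below is illustrative only: the names are hypothetical, and the set of allowed transitions (including dismissing a New alert directly) is an assumption rather than documented behaviour.

```python
from enum import Enum


class AlertStatus(Enum):
    NEW = "new"
    ACKNOWLEDGED = "acknowledged"
    DISMISSED = "dismissed"


# Assumed transitions; the product may allow others (for example, reopening).
ALLOWED_TRANSITIONS = {
    AlertStatus.NEW: {AlertStatus.ACKNOWLEDGED, AlertStatus.DISMISSED},
    AlertStatus.ACKNOWLEDGED: {AlertStatus.DISMISSED},
    AlertStatus.DISMISSED: set(),
}


def advance(current: AlertStatus, target: AlertStatus) -> AlertStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```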
How the Watchdog Runs
The Watchdog operates via two PostgreSQL database triggers:
- Keyword trigger: Fires on every INSERT or UPDATE OF transcript on the messages table. It runs the exact → normalised → fuzzy cascade synchronously inside the database, so alerts are generated within milliseconds.
- Semantic trigger: Fires when a message's content_embedding column is updated (asynchronously, after the transcript is sent to the embedding Edge Function). It compares the embedding against all semantic-enabled rules and creates an alert if the similarity exceeds the threshold, but only if the keyword trigger hasn't already matched.
This architecture means the Watchdog is:
- Real-time — Keyword, normalisation, and fuzzy alerts are generated within milliseconds of the transcript being saved. Semantic alerts follow shortly after (typically under 2 seconds).
- Low cost — Exact, normalised, and fuzzy matching run as native SQL with no external API calls. Only semantic matching incurs an embedding computation (shared with the existing search infrastructure).
- Always on — As long as the rule is enabled, every transcript is checked automatically.
- No duplicates — Each rule generates at most one alert per message, regardless of how many tiers would match.
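The "first tier wins" behaviour behind the no-duplicates guarantee can be sketched as an ordered list of checks that stops at the first hit. This is a simplified Python illustration, not the trigger code: `first_matching_tier` and `DEMO_TIERS` are hypothetical names, the two demo checks are deliberately crude stand-ins for the real exact and normalised tiers, and the fuzzy and semantic tiers are omitted.

```python
from typing import Callable, Optional

# A tier is a (match method name, check function) pair.
Tier = tuple[str, Callable[[str, str], bool]]


def first_matching_tier(
    pattern: str, transcript: str, tiers: list[Tier]
) -> Optional[str]:
    """Run the tiers in order; return the first method that matches, else None."""
    for method, check in tiers:
        if check(pattern, transcript):
            return method  # later tiers never run, so at most one alert
    return None


def _letters_only(text: str) -> str:
    """Crude stand-in for normalization: keep letters and whitespace."""
    return "".join(c for c in text.lower() if c.isalpha() or c.isspace())


DEMO_TIERS: list[Tier] = [
    ("exact", lambda p, t: p.lower() in t.lower()),
    ("deobfuscated", lambda p, t: p.lower() in _letters_only(t)),
]

print(first_matching_tier("fraud", "possible f.r.a.u.d here", DEMO_TIERS))
# deobfuscated
```

The returned method name corresponds to the match method shown on the alert, and because the loop returns at the first hit, a message that would satisfy several tiers still yields a single result per rule.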
Permissions
Managing watchdog rules and viewing alerts requires the compliance.watchdog permission. By default this is granted to any role that has compliance.export access. Tenant administrators can configure visibility via the Roles & Permissions page.
Common Use Cases
- Safety Monitoring — Alert on words like "injury", "hazard", "evacuation" to catch safety incidents in real time.
- Compliance — Monitor for language that could indicate policy violations, insider information sharing, or inappropriate conduct.
- Customer Interaction Quality — Track keywords like "complaint", "refund", "escalate" across client-facing channels.
- Operational Alerts — Watch for "outage", "downtime", "critical error" in engineering or operations channels.
- Competitive Intelligence — Monitor for competitor names or product references mentioned during team conversations.
Tips & Best Practices
- Use severity levels meaningfully — Reserve Critical for urgent safety or compliance issues. Use Low for informational monitoring.
- Prefer phrase or regex over keyword for common words — "fire" as a keyword will generate many false positives. "fire alarm" or "fire in" as phrases are more targeted.
- Scope rules to channels — If you only care about safety keywords in field operations channels, set the channel filter to avoid noise from other channels.
- Tune fuzzy threshold for short keywords — Short words (3–4 chars) can fuzzy-match many unrelated words. Raise the threshold to 0.5–0.6 or disable fuzzy for rules with very short patterns.
- Add a description when using semantic — The topic embedding is generated from the rule name, description, and patterns. A detailed description gives the AI more context and improves semantic accuracy.
- Start with defaults, then tune — The default fuzzy threshold (0.40) and semantic threshold (0.35) work well for most use cases. Adjust only after reviewing actual alert quality.
- Review and tune regularly — Check the alerts tab periodically. If a rule is generating too many false positives, adjust the patterns, raise thresholds, or switch to phrase/regex matching.
- Check the match method badge — The badge on each alert tells you how it was detected. Many "Fuzzy" false positives? Raise the threshold. Too few "Semantic" hits? Lower the threshold.