This is a case study submission for Blooming Health's AI Engineer role. It implements a Prompt Similarity & Deduplication Service — a tool for managing growing prompt template libraries in voice AI platforms by finding semantic overlaps, searching by meaning, and identifying candidates for consolidation.
Prompts are embedded with OpenAI's text-embedding-3-small and stored in SQLite. Template variables ({{var}}) are normalized before embedding so variable names don't skew similarity. Select a dataset in the left panel and click Load & Embed to begin.
Every operation is timed with time.perf_counter() and reported in the X-Elapsed-Ms response header (or the elapsed_ms body field for embedding generation).
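A minimal sketch of how such timing could be wired up, assuming a generic wrapper (the `timed` helper here is illustrative, not the service's actual code):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds) measured
    with time.perf_counter(), a monotonic high-resolution clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Usage: wrap any handler body, then attach elapsed_ms to the
# X-Elapsed-Ms header or the response body.
result, ms = timed(sum, range(1_000_000))
```

time.perf_counter() is preferred over time.time() here because it is monotonic and unaffected by system clock adjustments.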
Semantic search latency includes the OpenAI API round-trip to embed the query on the fly, so it's much slower than similarity lookup, which operates entirely on local numpy arrays.
Duplicate clustering is O(n²) in the number of prompts — the benchmark shows how this scales.
Run a suite of benchmarks against the loaded dataset to measure query latency at scale. Tests semantic search (includes OpenAI API call), similarity lookup (local numpy), and duplicate clustering (all-pairs comparison). Each test runs multiple iterations for stable averages.
Each query is embedded with text-embedding-3-small and compared against all stored prompt embeddings using cosine similarity (which reduces to a dot product, since OpenAI embeddings are L2-normalized).
Template variables like {{question_text}} are normalized to [variable] before embedding so variable names don't skew results.
Results are ranked by similarity score with no minimum threshold — the full ranked list is returned up to the limit.
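The ranking step can be sketched as a single dot product plus an argsort. This is a toy example with small random vectors standing in for real 1536-dim embeddings; `top_k` is a hypothetical helper, not the service's actual function:

```python
import numpy as np

def top_k(query_vec: np.ndarray, matrix: np.ndarray, k: int = 5):
    """Rank stored embeddings by cosine similarity to the query.

    Assumes query_vec and the rows of matrix are L2-normalized,
    so cosine similarity reduces to a plain dot product.
    """
    scores = matrix @ query_vec            # (n,) similarity scores
    order = np.argsort(scores)[::-1][:k]   # highest first, up to the limit
    return [(int(i), float(scores[i])) for i in order]

# Toy data: 3 fake "embeddings" in 4 dims (real vectors are 1536-dim).
rng = np.random.default_rng(0)
m = rng.normal(size=(3, 4))
m /= np.linalg.norm(m, axis=1, keepdims=True)  # L2-normalize rows
q = m[1]                                       # query identical to row 1
print(top_k(q, m, k=2))                        # row 1 ranks first, score ~1.0
```

Because no minimum threshold is applied, even weak matches appear in the tail of the ranked list.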
Search prompts by meaning, not keywords. Enter a natural language description of what you're looking for and the service will find the most semantically relevant prompt templates. Try queries like "how to handle user interruptions" or "verify someone's identity" to see how the system matches intent rather than exact wording.
Similarity lookup runs as a single numpy matrix operation over the locally stored embeddings.
The threshold parameter filters out weak matches — only prompts scoring above the threshold are returned. The queried prompt is always excluded from its own results.
Content previews are truncated to 150 characters for display.
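A sketch of that lookup, assuming L2-normalized embedding rows (the `similar_to` name and toy 2-dim vectors are illustrative only):

```python
import numpy as np

def similar_to(idx: int, matrix: np.ndarray, threshold: float = 0.8):
    """Find prompts similar to the one at idx via one matrix-vector product.

    Rows of matrix are assumed L2-normalized embeddings; the queried
    prompt is excluded from its own results, and scores below the
    threshold are filtered out.
    """
    scores = matrix @ matrix[idx]
    hits = [(int(i), float(s)) for i, s in enumerate(scores)
            if i != idx and s >= threshold]
    return sorted(hits, key=lambda t: -t[1])

# Toy data: rows 0 and 1 point in nearly the same direction, row 2 is orthogonal.
m = np.array([[1.0, 0.0], [0.96, 0.28], [0.0, 1.0]])
print(similar_to(0, m))  # row 1 passes the 0.8 threshold; row 2 is filtered out
```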
Select any prompt from the library and find others with similar intent or purpose. Use the threshold slider to control how strict the matching is — higher values return only very close matches, lower values surface broader relationships. This is useful for identifying prompts that overlap in purpose and could potentially be consolidated.
The O(n²) all-pairs comparison is fine for hundreds of prompts; for thousands and beyond you'd switch to approximate nearest neighbors (FAISS, HNSW).
Automatically detect groups of prompts that are near-duplicates and could be merged into a single template. Adjust the threshold to control sensitivity — at 0.90+ only near-identical prompts cluster together, while lower values reveal broader families of related prompts. Each cluster shows the average similarity between its members.
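One way to implement the all-pairs clustering described above is union-find over similarity edges; this is a hedged sketch under the assumption of L2-normalized rows, not the service's actual algorithm:

```python
import numpy as np

def duplicate_clusters(matrix: np.ndarray, threshold: float = 0.9):
    """Group near-duplicate prompts by all-pairs cosine similarity — O(n^2).

    Rows of matrix are assumed L2-normalized; pairs scoring at or above
    the threshold are merged with union-find, and singletons are dropped.
    """
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    sims = matrix @ matrix.T               # full n x n similarity matrix
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]

# Toy data: rows 0 and 1 are near-duplicates, row 2 is unrelated.
m = np.array([[1.0, 0.0], [0.98, 0.199], [0.0, 1.0]])
print(duplicate_clusters(m, threshold=0.9))  # rows 0 and 1 cluster together
```

Averaging `sims[i, j]` over each cluster's member pairs would give the per-cluster average similarity the UI displays.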
Embeddings are stored in SQLite as float32 BLOBs (6KB per 1536-dim vector).
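A round-trip sketch of that storage scheme; the table and column names here are illustrative, not the service's actual schema:

```python
import sqlite3
import numpy as np

# Store a 1536-dim float32 embedding as a SQLite BLOB and read it back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prompts (id TEXT PRIMARY KEY, embedding BLOB)")

vec = np.random.default_rng(1).normal(size=1536).astype(np.float32)
conn.execute("INSERT INTO prompts VALUES (?, ?)", ("p1", vec.tobytes()))

row = conn.execute("SELECT embedding FROM prompts WHERE id = ?", ("p1",)).fetchone()
restored = np.frombuffer(row[0], dtype=np.float32)

print(len(row[0]))  # 6144 bytes: 1536 dims x 4 bytes each (~6KB)
assert np.array_equal(vec, restored)
```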
Before embedding, template variables like {{question_text}} are normalized to [variable] via regex — this is stored as normalized_content alongside the original.
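The normalization step could look like a single regex substitution (a sketch; the function name is hypothetical):

```python
import re

def normalize_template(content: str) -> str:
    """Replace {{variable}} placeholders with a generic [variable] token
    so embedding similarity reflects prompt structure, not variable naming."""
    return re.sub(r"\{\{\s*\w+\s*\}\}", "[variable]", content)

print(normalize_template("Please read {{question_text}} to {{member_name}}."))
# Please read [variable] to [variable].
```

Storing the normalized form alongside the original keeps display text intact while embeddings are computed over the variable-agnostic version.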
The layer field (engine, os) maps to the voice AI platform's prompt hierarchy: org → os → team → engine → directive.
Browse all prompt templates loaded into the system. Each prompt shows its ID, category, layer in the platform hierarchy, and full content text. Template variables (shown as {{variable_name}}) are placeholders filled at runtime by the voice AI platform.