The unpredictability of AI is enough to keep any product or engineering leader up at night. If you've built anything using large language models (LLMs), you’ve likely experienced it firsthand. One day, the AI delivers flawless results, but the next, it’s generating nonsense that can disrupt your entire operation.
Even at HeyMilo, a leader in AI-powered voice agents, these challenges grew as the company scaled. Its system, built to run interviews at scale for hiring managers and recruiters, occasionally hallucinated. Imagine the AI asking the wrong questions or delivering an incorrect assessment of a candidate's interview performance. Even the most advanced systems can misfire, leading to outcomes that erode trust and disrupt decision-making.
To address this, the team implemented RAG pipelines, integrated NVIDIA's open-source guardrails, and adopted observability tools like Arize AI. These additions addressed only part of the problem; the larger task was ensuring that the structured data coming out of the LLMs was contextually accurate and hallucination-free.
“We needed to ensure structured data outputs were not only in the right format, but were not spewing nonsense—and we had to do this at scale,” HeyMilo's CEO Sabashan shared.
Initially, the team looked to OpenAI's structured outputs to solve this. While that helped ensure the outputs were in the right format, it didn't ensure the content was accurate.
Consider the following JSON format, required after reviewing a candidate's transcript for a Python role:
{
"candidate_name": "",
"age": "",
"technical_skills": ""
}
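Getting the shape right is the part structured outputs handles well. Here is a minimal sketch of how a team might request that schema with OpenAI's Python SDK; the model name, prompt, and transcript are placeholders for illustration, not HeyMilo's actual pipeline.

# A minimal sketch of enforcing the schema above with OpenAI's structured outputs.
# The model name, prompt, and transcript are placeholder assumptions.
from openai import OpenAI
from pydantic import BaseModel

class CandidateAssessment(BaseModel):
    candidate_name: str
    age: str
    technical_skills: str

client = OpenAI()
transcript = "..."  # the candidate's interview transcript

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # placeholder model
    messages=[
        {"role": "system", "content": "Extract a candidate assessment from this interview transcript."},
        {"role": "user", "content": transcript},
    ],
    response_format=CandidateAssessment,
)

assessment = completion.choices[0].message.parsed
# `assessment` is guaranteed to be a well-formed CandidateAssessment.
# Nothing here guarantees it is true.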
Here’s the problem:
- candidate_name must match an actual candidate who was interviewed.
- age must be under 100 years old.
- technical_skills should only list skills relevant to the candidate's ability to program in Python.
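None of these requirements are about format; they are about content. A hypothetical post-processing pass makes the distinction concrete. The candidate roster and skills list below are assumptions for illustration; a real deployment would pull them from an ATS and a richer skills taxonomy.

# A hypothetical content check over the parsed output. The roster and the
# Python skills list are placeholder assumptions, not HeyMilo's actual data.
PYTHON_SKILLS = {"python", "django", "flask", "pandas", "numpy", "pytest"}

def validate_assessment(assessment: dict[str, str], roster: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the output passed."""
    problems = []

    # candidate_name must match someone who was actually interviewed.
    if assessment["candidate_name"] not in roster:
        problems.append(f"Unknown candidate: {assessment['candidate_name']!r}")

    # age must parse as a number under 100.
    try:
        age = int(assessment["age"])
        if not 0 < age < 100:
            problems.append(f"Implausible age: {age}")
    except ValueError:
        problems.append(f"Age is not a number: {assessment['age']!r}")

    # technical_skills should only list skills relevant to a Python role.
    skills = [s.strip().lower() for s in assessment["technical_skills"].split(",")]
    off_topic = [s for s in skills if s not in PYTHON_SKILLS]
    if off_topic:
        problems.append(f"Skills not relevant to a Python role: {off_topic}")

    return problems

print(validate_assessment(
    {"candidate_name": "Jordan Lee", "age": "131", "technical_skills": "Python, Excel"},
    roster={"Jordan Lee", "Priya Shah"},
))
# -> ['Implausible age: 131', "Skills not relevant to a Python role: ['excel']"]

Checks like these are straightforward to write for a single schema, but they only catch the failure modes you anticipate, and maintaining them across every structured output an LLM produces is exactly the at-scale problem HeyMilo describes.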
While LLMs help turn unstructured data into structured data in a way that was previously not possible, “getting to 100% accuracy is still a challenge, which leads to massive headaches in production,” said HeyMilo's CTO Ramie.
That's why we built LLM Outputs. As more companies, like HeyMilo, turn to LLMs to transform unstructured data into structured formats, the urgency to detect and fix hallucinations in real time is growing. It's not enough to ensure data is in the right format; the data also needs to be highly accurate to be truly reliable.
LLM Outputs is designed specifically to detect and help resolve hallucinations in structured data outputs from LLMs. By ensuring that structured data is both syntactically correct and contextually accurate, we’re helping companies like HeyMilo prevent costly errors and improve the reliability of their AI-powered systems.
As HeyMilo’s CEO, Sabashan, puts it:
“LLMs provide great power but the right guardrails are essential to ensure LLMs are used reliably. This is all the more true when leveraging LLMs to turn unstructured data into structured data accurately and reliably!”
Looking to solve hallucinations in your LLM-generated structured data? Join our Discord and test LLM Outputs in real time with other industry leaders.