Lost in transcription: How accurate are veterinary AI scribes?

Errors in transcription can create confusion, slow workflow, and potentially risk patient care. How do we define transcription accuracy in veterinary medicine?

Interpreting the same input, two AI scribe tools produce clinically distinct outputs, underscoring the importance of accuracy in veterinary documentation. Photo courtesy ScribbleVet

Veterinary practices are increasingly turning to AI scribe technology to ease SOAP documentation. AI scribes have become a practical tool in veterinary clinics, reducing the time teams spend on note-taking and freeing them up for more patient-facing work, along with some of that highly coveted work-life balance.

However, this raises a critical question: How accurately do these tools handle specialized veterinary terminology? Errors in transcription can create confusion, slow workflow, and potentially risk patient care. How do we define transcription accuracy in veterinary medicine?

When accuracy undermines efficiency

Transcription quality affects more than note-taking. AI scribes are adopted to reduce documentation time, improve records, and streamline daily workflows. However, when accuracy is poor, those intended benefits quickly disappear.

Many veterinary AI scribes are trained on general medical or conversational audio, and that training does not reflect the vocabulary used in veterinary clinics. Medication names such as Cerenia, Librela, and carprofen are common in veterinary documentation but are frequently misinterpreted by models not trained on veterinary language; carprofen, for example, may be transcribed as ibuprofen. These errors risk introducing clinical confusion and compromising record accuracy.

When a scribe tool fails to accurately capture veterinary terminology, the foundation becomes unstable. It creates a trickle-down effect that compromises everything built from it: SOAP notes, client emails, referral summaries. These errors do not just waste time—they can delay treatment, confuse referral partners, or create gaps in the record that carry legal and clinical consequences.

As a result, teams end up spending more time reviewing and editing the outputs. Communication falters. Records lose reliability. A tool designed to save time ends up adding to the workload, discouraging veterinary teams from using it altogether.

Accurate transcription delivers on the reasons these tools are brought into practice. It helps teams document efficiently, communicate clearly, and maintain records they can trust.

Lacking an industry standard

Right now, there is no agreed-upon standard for what good veterinary transcription looks like in the context of AI scribe tools. No shared benchmark. No published term set. No accuracy threshold tailored to veterinary practice.

That absence creates ambiguity. It also makes it harder for clinics to evaluate tools fairly or push vendors for meaningful transparency.

Measuring accuracy: What is word error rate (WER)?

Long before AI scribes existed, transcription accuracy was measured using a method called word error rate (WER). This metric counts the errors a tool or model makes when converting speech into text, comparing a generated transcript to a human-reviewed, accurate version. Specifically, WER looks at three types of errors (a minimal sketch of the calculation follows the list below):

  1. Substitutions: incorrect words used in place of the correct ones
  2. Deletions: missing words from the original spoken content
  3. Insertions: additional words not present in the original speech
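
For readers who want the arithmetic, WER divides the total error count by the number of words in the reference transcript: WER = (S + D + I) / N. The short Python sketch below computes it with a standard word-level edit distance. It is a generic illustration of the metric, not any particular vendor's implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # Levenshtein edit distance over words: d[i][j] is the minimum number
    # of edits needed to turn the first i reference words into the first
    # j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)

    return d[len(ref)][len(hyp)] / len(ref)


# A dropped "and" and a drug-name substitution each count as one error:
ref = "gave carprofen and metronidazole with food"
hyp = "gave ibuprofen metronidazole with food"
print(f"WER = {word_error_rate(ref, hyp):.2f}")  # 2 errors / 6 words = 0.33
```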

However, general WER does not fully capture accuracy challenges unique to veterinary medicine, where highly specialized language is the norm. In a clinical context, WER can be misleading.

Why WER does not reflect clinical risk

In veterinary medicine, WER falls short as a measure of accuracy. It treats all errors the same, regardless of clinical importance. A missed "and" and a miswritten "metronidazole" are both counted as one error. On paper, they are equal. In a medical record, they are not. One is a minor grammar issue; the other could result in the wrong drug being documented or dispensed.

WER also does not reflect how well a tool handles veterinary-specific language. A system can have a "good" WER on paper and still fail consistently on the words that matter most. That is why accuracy needs to be evaluated through a veterinary lens.
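
One way to make that concrete is to weight errors by clinical importance. The sketch below is a hypothetical weighted variant of WER, not any vendor's published metric; the CRITICAL_TERMS set and the weight of 5 are illustrative assumptions, and difflib gives only an approximate word alignment.

```python
import difflib

# Hypothetical critical-term list and penalty multiplier (illustrative only).
CRITICAL_TERMS = {"carprofen", "ibuprofen", "metronidazole", "cerenia", "librela"}
CRITICAL_WEIGHT = 5

def weighted_error_score(reference: str, hypothesis: str) -> float:
    """Like WER, but an error touching a critical term costs CRITICAL_WEIGHT."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    cost = 0.0
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=ref, b=hyp).get_opcodes():
        if op == "replace":
            # Pair up substituted words; charge the heavier side of each pair.
            for k in range(max(i2 - i1, j2 - j1)):
                r = ref[i1 + k] if i1 + k < i2 else None
                h = hyp[j1 + k] if j1 + k < j2 else None
                cost += CRITICAL_WEIGHT if (r in CRITICAL_TERMS or h in CRITICAL_TERMS) else 1
        elif op == "delete":
            cost += sum(CRITICAL_WEIGHT if w in CRITICAL_TERMS else 1 for w in ref[i1:i2])
        elif op == "insert":
            cost += sum(CRITICAL_WEIGHT if w in CRITICAL_TERMS else 1 for w in hyp[j1:j2])
    return cost / len(ref)

# A dropped "and" versus a miswritten drug name: plain WER counts each as
# one error, but the weighted score separates them.
ref = "flush the wound and start metronidazole"
print(weighted_error_score(ref, "flush the wound start metronidazole"))   # 1/6 = 0.17
print(weighted_error_score(ref, "flush the wound and start metoprolol"))  # 5/6 = 0.83
```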

At ScribbleVet, we measure WER-VET, an internal benchmark we developed around a curated list of veterinary terms. It lets us track and improve the tool's performance on the vocabulary that matters most in practice.

The transcript is just the first step in the process. Downstream tasks also introduce the possibility of errors and further blur the definition of accuracy. A multi-page, highly detailed SOAP note capturing every interaction in the room may technically be "accurate," but it also may overwhelm future readers with extraneous information. A concise note is equally accurate while aiding rather than hindering patient care.

A clearer picture of veterinary transcription accuracy

Even without a universal standard, there are still meaningful ways to assess an AI scribe tool's performance. One approach is term-level accuracy: How well does the system capture drug names, diagnoses, dosages, and procedural language? Another is domain-specific testing. Was the AI trained or validated using actual veterinary audio, including real client conversations and practice environments?
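
Even without a shared benchmark, a clinic can run a rough spot-check of term-level accuracy on its own transcripts. The sketch below assumes a hand-curated term list and sample sentences (all illustrative); it simply asks what fraction of the key terms actually spoken survive into the AI-generated transcript.

```python
# Hypothetical key-term list; a real list would come from the practice's
# own formulary and common diagnoses.
KEY_TERMS = ["cerenia", "librela", "carprofen", "simparica trio", "metronidazole"]

def term_recall(reference: str, hypothesis: str, terms: list[str]) -> float:
    """Fraction of key terms in the human-verified reference that appear
    in the AI-generated transcript (simple substring matching)."""
    ref, hyp = reference.lower(), hypothesis.lower()
    spoken = [t for t in terms if t in ref]
    if not spoken:
        return 1.0  # nothing to miss
    return sum(t in hyp for t in spoken) / len(spoken)

reference = "Start Cerenia today and continue the carprofen for five more days."
hypothesis = "Start serenia today and continue the ibuprofen for five more days."
print(f"Term recall: {term_recall(reference, hypothesis, KEY_TERMS):.0%}")  # 0%
```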

Precision is equally important. It is not enough to transcribe "Simparica Trio" correctly in a quiet, controlled environment. When AI scribes are used in practice, a condition or medication can be mentioned several times during a single appointment, often in a loud, even chaotic setting. AI scribe tools should be tested on overlapping speech, speaker identification, and environmental noise, all of which are common in active exam rooms.

Perhaps most importantly, AI scribes need to adapt to the user's requirements while keeping them in the loop. A behavior specialist's definition of accuracy may include details an oncologist would find unnecessary. Customization and ease of verification are critical.

What comes next

The lack of veterinary-specific transcription standards for AI scribes is an industry-level blind spot. Practitioners, developers, and educators all have a stake in shaping how veterinary AI scribe tools are built and evaluated, and clear expectations help the entire field move forward.

As AI scribes become more common, the need for shared benchmarks will only grow. That process will take time. In the meantime, transparency, veterinary-specific validation, and real-world evaluation need to take priority.

Accuracy belongs at the center of evaluation, not the margins. A veterinary AI scribe has to reflect the language of practice, support clinical documentation, and reduce time spent on corrections. If it can do all of that, it is a true benefit for the veterinary workflow.


Rohan Relan is founder and CEO of ScribbleVet, an AI scribe tool used by veterinarians. He is an entrepreneur with an engineering background from UC Berkeley and Stanford, driven by a passion for cutting-edge technology. Relan's previous venture, Agawi, was acquired by Google and integrated into Google Search. Over the past nine years, he has focused on advancing AI and exploring its capabilities. Five years ago, Relan adopted Potato, a spirited street dog from Mexico. Potato became the inspiration behind ScribbleVet and is now beloved as the company's mascot.
