Names and identifying details in this article have been changed to protect privacy.

The phrase "best AI EHR 2026" returns page after page of ranked lists. The rubric behind those rankings is published in approximately none of them. Most of what gets crowned "best" is a list of ten products, no criteria, no scoring, and a footer asking the reader to subscribe.

This post fixes one half of that problem. It publishes the criteria, scores four vendors, and shows the working. Jane App, SimplePractice, Carepatron, and Oli Health are evaluated on twelve criteria pulled from each vendor's public documentation as of April 2026. The CSV is at the bottom. Remix it. Tell me where I scored wrong.

A note on bias before the table. I work on Oli. I have a vested interest in Oli looking good on this scorecard. I also have a longer-term interest in the rubric being defensible, because a rigged scorecard is a one-quarter trick — and this one is meant to be republished every 90 days.


The 12 criteria, grouped

The criteria fall into four buckets of three. Each criterion scores 0 to 3.

Intake & pre-visit (3 criteria)

  1. Conversational AI intake — Adaptive questioning where each answer changes the next question. Structured output to the chart, not a free-text dump.
  2. Patient overview / pre-visit synthesis — An AI-generated brief the practitioner reads before walking in. Pulls from prior chart, intake, lab uploads.
  3. Document OCR — Automatic extraction from uploaded PDFs, lab reports, or photos of paper notes. The data lands in the right chart fields, not in an attachment graveyard.

Documentation (3 criteria)

  4. AI scribe — Audio capture during a visit, structured note out. The category most vendors compete on. Accuracy and speed are weighted equally.
  5. Template filling — AI populates the template the clinic already uses (SOAP, DAP, BIRP, custom). Not a generic narrative — the actual fields.
  6. Chart search / retrieval — Natural-language query across the patient's chart history. "When did we last titrate her dose?" — answered without scrolling.

Workflow (3 criteria)

  7. Referral drafting — AI generates referral letters from the chart, with the right tone and clinical context.
  8. Coding suggestions — ICD-10 / CPT code suggestions from the note. The practitioner accepts, edits, or rejects.
  9. Follow-up / re-engagement agents — Automated patient nudges that go beyond appointment reminders. No-show recovery, lapsed-patient outreach, dose-titration check-ins.

Safety & transparency (3 criteria)

  10. Hallucination guardrails — Phrase-loop detection, medication cross-check, draft-status enforcement on every AI-generated artifact. Documented, not assumed.
  11. On-device vs cloud / data residency — Regional deployment, BAA chain documented, PIPEDA awareness if Canadian patients are involved.
  12. Model transparency — The vendor publishes which model, which provider, which region. If buyers cannot answer "what generated this note," the vendor has not met the bar.

The 0–3 scale: 0 = not available; 1 = partial, beta, or token-capped; 2 = available and functional; 3 = available and best-in-class.
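
For anyone who wants to script their own copy, here is a minimal sketch of the rubric as data, in Python. The bucket names, criterion names, and scale labels are transcribed from the lists above; the dict layout and the `validate` helper are my own framing, not anything a vendor publishes.

```python
# The 0-3 scale and the 12 criteria, transcribed from the rubric above.
# The structure (dict of buckets, validate helper) is one reasonable
# layout, not a standard; rename freely.

SCALE = {
    0: "not available",
    1: "partial, beta, or token-capped",
    2: "available and functional",
    3: "available and best-in-class",
}

RUBRIC = {
    "Intake & pre-visit": [
        "Conversational AI intake",
        "Patient overview / pre-visit synthesis",
        "Document OCR",
    ],
    "Documentation": [
        "AI scribe",
        "Template filling",
        "Chart search / retrieval",
    ],
    "Workflow": [
        "Referral drafting",
        "Coding suggestions",
        "Follow-up / re-engagement agents",
    ],
    "Safety & transparency": [
        "Hallucination guardrails",
        "Data residency / BAA chain",
        "Model transparency",
    ],
}

# Flattened into table order: rows 1-12.
CRITERIA = [c for bucket in RUBRIC.values() for c in bucket]

def validate(scores: list[int]) -> None:
    """A scorecard is exactly 12 scores, each on the 0-3 scale."""
    assert len(scores) == len(CRITERIA), "score all 12 criteria"
    assert all(s in SCALE for s in scores), "each score must be 0, 1, 2, or 3"
```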


The scorecard

Scoring is anchored to current public materials: Jane's feature index, AI Scribe page, security FAQ, and cloud security white paper; SimplePractice's feature index, Note Taker page, Note Taker FAQ, and Canada guide; Carepatron's feature index, Carepatron AI page, Ask AI help article, pricing page, and trust center; plus Oli Health's AI product index and security page. Where I had to interpret marketing language, I scored the lower of the plausible reads.

#   Criterion                                Jane  SimplePractice  Carepatron  Oli
1   Conversational AI intake                 0     0               1           3
2   Patient overview / pre-visit synthesis   0     2               1           3
3   Document OCR                             0     0               1           2
4   AI scribe                                2     2               1           3
5   Template filling                         2     2               2           3
6   Chart search / retrieval                 1     1               2           2
7   Referral drafting                        0     0               0           2
8   Coding suggestions                       0     0               0           1
9   Follow-up / re-engagement agents         0     0               1           2
10  Hallucination guardrails                 1     1               0           3
11  Data residency / BAA chain               3     1               1           3
12  Model transparency                       0     1               0           2
    Total / 36                               9     10              10          29
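
If you would rather check the arithmetic than trust the Total row, the scores transcribe directly into the rubric sketch above. The numbers below are copied from the table; only the loop around them is mine.

```python
# Scores copied from the table, rows 1-12 in order.
SCORES = {
    "Jane":           [0, 0, 0, 2, 2, 1, 0, 0, 0, 1, 3, 0],
    "SimplePractice": [0, 2, 0, 2, 2, 1, 0, 0, 0, 1, 1, 1],
    "Carepatron":     [1, 1, 1, 1, 2, 2, 0, 0, 1, 0, 1, 0],
    "Oli":            [3, 3, 2, 3, 3, 2, 2, 1, 2, 3, 3, 2],
}

for vendor, rows in SCORES.items():
    validate(rows)  # from the rubric sketch above
    print(f"{vendor}: {sum(rows)} / 36")
# Jane: 9 / 36, SimplePractice: 10 / 36, Carepatron: 10 / 36, Oli: 29 / 36
```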

A few rows worth annotating before anyone yells.

Row 4 (AI scribe). Jane's AI Scribe is live in public docs and works. SimplePractice's Note Taker works. Carepatron's scribe is real, but the free tier is capped at 1M AI tokens per month, and Carepatron does not publish a per-note conversion formula. The 1 vs 2 difference comes down to whether a solo clinic can rely on the feature for a full month of patients without usage uncertainty.

Row 6 (chart search / retrieval). Carepatron gets a 2 because Ask AI supports natural-language Q&A against the current client, latest notes, uploaded PDFs/images, and completed form fields. Jane and SimplePractice get 1s because public materials support narrower chart retrieval or summaries, not broad natural-language chart search.

Rows 6, 7, and 9 (Oli workflow agents). Oli supports natural-language chart retrieval, referral-letter drafting, and follow-up / re-engagement agents, so all three rows score 2. I am not scoring them as 3s because the 3-point bar is best-in-class: broader query coverage, deeper referral workflow controls, and more closed-loop re-engagement analytics.

Row 11 (data residency). Jane gets a 3 because account data is stored in the region selected for the account, and its AI Scribe docs disclose temporary U.S. processing for some Canadian AI Scribe workflows before outputs return to the home region. SimplePractice gets a 1 because Canadian practice owners can create accounts, but its own Canada guide says all data servers are located and backed up in the United States. Carepatron gets a 1: its trust center lists major compliance frameworks, but its AI-inference residency and BAA/subprocessor mapping are harder to verify from public materials. Oli gets a 3 because both PIPEDA and HIPAA pipelines are documented vendor-by-vendor.

Row 12 (model transparency). Jane and Carepatron do not publish a specific LLM, provider, and processing region in the public pages I found. SimplePractice gets a 1 because its Note Taker FAQ names Claude by Anthropic, but does not publish region routing. Oli publishes Azure OpenAI region routing in its own pricing-architecture post — call that a 2, not a 3, because there is still room for more documentation around fallback behavior.

29 vs 9

The spread between Oli and the lowest scorer (Jane) is the rubric working as intended. A scribe-only architecture caps out around the floor of where an agentic stack starts.

What is the best AI EHR in 2026?

By the 12-criteria scorecard above — applied to public vendor materials in April 2026 — Oli Health scores 29 of 36, SimplePractice 10, Carepatron 10, and Jane App 9. The best AI EHR depends on what a clinic weights, but on the AI feature surface specifically, scribe-centered platforms underperform agentic stacks by a wide margin. The rubric is published so any reader can re-weight it.


Why "AI scribe" scored high but the vendors still scored low

Three of the twelve criteria are scribe-adjacent: rows 4, 5, and 10. A vendor that ships an excellent scribe with template filling and decent guardrails can plausibly score around 8 of 9 on those three rows. That is the ceiling of "scribe-only."

Six of the remaining nine criteria sit mostly outside visit transcription: intake, pre-visit synthesis, OCR, referral drafting, coding suggestions, and re-engagement. A scribe alone does not cover those jobs; pre-appointment summaries are useful, but they are still only one slice of the agentic stack. Architecture is the ceiling.

This is the part the buyer's-guide posts tend to skip. Most of "what AI does in healthcare" today is documentation — and documentation is the easiest agent to ship because the audio comes pre-bounded by an appointment slot. The harder agents are the ones that have to work between appointments: read a referral the patient brought in, summarize a chart that is six visits deep, draft an outreach to the patient who stopped showing up. Those agents need access the scribe never asks for.

A practitioner I have talked with at a multi-disciplinary clinic in Halifax put it more bluntly. She said the AI scribe gave her back her evenings, which is real. But the rest of her week — the prior auths, the lab uploads that need to land in the right chart field, the reminder she keeps meaning to send the patient who is plateauing on semaglutide — that work is still hers. The scribe is the cheap win. The expensive wins are the ones the rubric measures past row 5.

"Scribe-only vendors max out at 9 of 9 across the three scribe-adjacent rows. The other 27 points are an architecture problem, not a feature problem."

This is also why the scorecard does not collapse into "Oli wins" by way of a single feature. Oli's 29 comes from being strong across many rows. Where the architecture is supported but still not best-in-class — chart search, referral drafting, and re-engagement — Oli scores 2, not 3, and the scorecard reflects that.


Why Oli does not score 36

The scorecard does not credit Oli for unfinished work or over-claim best-in-class status. Six rows where Oli left points on the table are worth naming explicitly.

Row 3 (document OCR): 2. Oli can extract from uploaded documents and route the output into the chart, but I am not calling it best-in-class until specialty-specific lab parsing and exception handling are stronger across more document types.

Row 6 (chart search): 2. Oli supports natural-language retrieval across the patient chart. A clinician can ask chart-level questions and pull prior context without manually scrolling. I am holding back the 3 because broader cohort-style queries and specialty benchmark coverage still need to be stronger.

Row 7 (referral drafting): 2. Oli supports referral-letter drafting from chart context. A 3 would require more mature handoff controls: attachment selection, referral-status tracking, specialty-specific letter variants, and stronger auditability around what context was used.

Row 8 (coding suggestions): 1. Oli suggests ICD-10 codes from the note for a subset of specialties (mental health, weight management). CPT is partial. The criterion calls for both, so 1 is the honest score until coverage is broader.

Row 9 (re-engagement agents): 2. Oli supports no-show recovery, dose-titration check-ins, and lapsed-patient outreach. It stays at 2 because the best-in-class version would include more closed-loop measurement: response rates, booked-appointment recovery, protocol-level outcomes, and tighter specialty-specific automation.

Row 12 (model transparency): 2. Oli publishes more than the other vendors here, including Azure OpenAI region routing, but the fallback behavior and model-selection policy still need more public documentation before I would score it 3.

The pattern across these six deductions is the same: supported is not the same thing as best-in-class. Rows move up when the public evidence supports the higher score. That is the discipline the rubric requires of every vendor on the table, including the one I work on.

Why does SimplePractice score lower than Oli on AI criteria?

SimplePractice's AI footprint is still concentrated in Note Taker. It earns credit for AI scribing, template-style notes, pre-appointment summaries, draft-review guardrails, and naming Claude by Anthropic. It still scores low on conversational intake, OCR, natural-language chart search, referral drafting, from-note coding suggestions, re-engagement agents, Canadian data residency, and public region-routing detail.


The quarterly update commitment

This scorecard is republished every 90 days. The next versions are scheduled for late July 2026, late October 2026, and late January 2027.

What changes between now and then is the data, not the rubric. If Jane ships a referral-drafting agent, its row 7 moves from 0 to 2 or 3. If SimplePractice publishes a Canadian residency option, row 11 moves from 1 to 2. If Carepatron lifts the token cap on its free tier or publishes a reliable per-note token conversion, row 4 moves from 1 to 2. If Carepatron publishes model/provider/region detail, row 12 moves. If Oli broadens chart-search coverage, referral workflow controls, or re-engagement analytics, rows 6, 7, or 9 can move from 2 to 3.
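
As a sketch of what the quarterly diff can look like in code, reusing the SCORES structure from above: the `next_quarter` values below are hypothetical, they simply replay the Jane referral-drafting example from the paragraph above, not a prediction.

```python
def diff_quarters(prev: dict, curr: dict) -> None:
    """Print every row that moved between two quarterly snapshots."""
    for vendor in prev:
        for i, (a, b) in enumerate(zip(prev[vendor], curr[vendor])):
            if a != b:
                print(f"{vendor} / row {i + 1} ({CRITERIA[i]}): {a} -> {b}")

# Hypothetical next quarter: Jane ships referral drafting (row 7: 0 -> 2).
next_quarter = {**SCORES, "Jane": [0, 0, 0, 2, 2, 1, 2, 0, 0, 1, 3, 0]}
diff_quarters(SCORES, next_quarter)
# -> Jane / row 7 (Referral drafting): 0 -> 2
```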

I will also score Oli by the same rubric in public, and I will lose points where the work has not landed. The credibility of the scorecard depends on it.

The CSV — twelve rows, four columns of scores, totals — is downloadable here. Anyone who wants to re-weight the rubric, swap in a vendor I missed, or build their own version is welcome to. A few clinic owners have already done it privately and emailed me their adjusted versions; the most useful change has been adding a thirteenth row for telehealth recording quality, which I am considering for the next update.
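
For anyone who wants to remix without downloading, the table flattens to CSV as below. The header names and exact layout are my reconstruction from the table above; the downloadable file may not match it byte for byte.

```
criterion,jane,simplepractice,carepatron,oli
Conversational AI intake,0,0,1,3
Patient overview / pre-visit synthesis,0,2,1,3
Document OCR,0,0,1,2
AI scribe,2,2,1,3
Template filling,2,2,2,3
Chart search / retrieval,1,1,2,2
Referral drafting,0,0,0,2
Coding suggestions,0,0,0,1
Follow-up / re-engagement agents,0,0,1,2
Hallucination guardrails,1,1,0,3
Data residency / BAA chain,3,1,1,3
Model transparency,0,1,0,2
Total / 36,9,10,10,29
```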

How should I compare AI EHRs?

Pick a finite set of criteria that matter for your practice — usually 8 to 15 features — and score each platform 0 to 3 against public documentation. Weight criteria by how much they affect your weekly workload. Avoid lists that score "user-friendliness" or "design" without anchoring to a measurable feature. Re-score quarterly, because vendor capabilities shift faster than blog posts get updated.
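
Here is a sketch of that weighting step, reusing the SCORES structure from the scorecard section. The weights below are invented for illustration (a clinic that lives in documentation and compliance); they are not a recommendation, and any clinic should substitute its own.

```python
# Hypothetical weights, one per row 1-12: how much each criterion affects
# this example clinic's weekly workload. Invented for illustration only.
WEIGHTS = [2, 1, 1, 3, 2, 1, 1, 1, 2, 3, 3, 1]

def weighted_total(rows: list[int], weights: list[int]) -> float:
    """Re-weight 12 scores, normalized so a perfect scorecard still reads 36."""
    raw = sum(s * w for s, w in zip(rows, weights))
    return round(36 * raw / (3 * sum(weights)), 1)

for vendor, rows in SCORES.items():
    print(vendor, weighted_total(rows, WEIGHTS))
```

The normalization keeps the re-weighted totals on the same 36-point scale as the published scorecard, so a clinic can compare its version against mine directly.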


Quick reference

Is Carepatron's free tier enough for an AI-first practice?

Carepatron's free tier is enough for light AI use, but it should not be treated as unlimited AI for a full-time practice. The public pricing page lists 1M AI tokens on Free and unlimited AI tokens on Plus and Advanced, but it does not publish a reliable conversion from tokens to complete clinical notes. That is partial certainty, not AI-first certainty. The token-cap breakdown shows the cap and the assumptions behind the math.
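
To make the uncertainty concrete with a purely illustrative calculation: Carepatron publishes the 1M-token cap but not a per-note cost, so both bounds in the sketch below are my assumptions, not vendor numbers.

```python
CAP = 1_000_000  # free-tier AI tokens per month (published)

# Assumed end-to-end token cost per scribed note. Carepatron publishes no
# figure, so these bounds are invented to show the size of the unknown.
for tokens_per_note in (2_000, 10_000):
    print(f"{CAP // tokens_per_note} notes/month at {tokens_per_note:,} tokens/note")
# 500 notes/month at 2,000 tokens/note; 100 notes/month at 10,000 tokens/note
```

A 5x spread between plausible monthly ceilings is exactly the usage uncertainty the row-4 score penalizes.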

A practitioner running a busy GLP-1 clinic in Austin tracked her own week against this rubric before switching software last fall. She had been on a generalist EHR with a bolt-on scribe. Her honest self-score against rows 1 through 9 was 7 — almost everything outside the scribe was manual. After three months on a stack with conversational intake and pre-visit synthesis, she scored herself 17 on the same rows. The scribe got better, but the work that was hers shrank more than the work that was the software's.

She also told me some advanced chart-search workflows still bother her. Support exists; best-in-class takes longer.


If a 12-criteria scorecard maps even half-way to what your clinic cares about, the way to test the rubric is to score whatever software you currently use against it. The exercise takes a long lunch. If Oli is on your shortlist after that, the $20 flat plan is what it costs to try the agentic side of the rubric for a quarter.


FAQ

When is this scorecard next updated?

The next refresh is scheduled for late July 2026, with a public diff: which rows moved, which vendor changed the underlying capability, and which scores I revised after pushback from buyers or other reviewers. The cadence is 90 days, with the same rubric and the same four vendors, plus any AI-EHR that has gained material market share in the prior quarter.

[Figure: architecture diagram comparing a narrow AI scribe workflow with a broader agent stack connected to intake, chart search, OCR, referrals, coding, follow-up, and the patient chart.]
The architectural argument behind the scorecard: scribe-only is a narrow documentation lane; agentic stacks connect intake, chart search, documents, referrals, coding, and follow-up around the patient chart.