Data Governance for Healthcare Data Product Teams in the Age of GenAI, Analysis, and Agents
A practitioner's guide to governed objects, executable data contracts, stewardship that ships, and why catalogs won't save you in the agent era.
The question behind the question
Someone asked me last month for "a data governance framework." What they wanted was a 40-page PDF they could hand to their CEO/CFO. What they actually needed was a way to make three decisions on Monday morning without the framework getting in the way.
That gap, between what governance documents promise and what governance decisions require, is where most healthcare data product teams are stuck right now. And GenAI is making it worse, not better, because agents hallucinate fluently when the underlying data is ambiguous.
This post is my working answer. Fair warning: I'm in the middle of this problem, not above it. If you find yourself nodding at the failure modes, welcome to the club.
Why healthcare data product teams get stuck
There are three things that make governance hard for healthcare data product teams specifically (and I'm oversimplifying, as usual).
First, the dual-customer problem. You're probably building for clinicians, analysts, and payers at the same time. A clinician wants an encounter record to mean the clinical event (when the clinician saw the patient). An analyst wants the same record to mean the billing event (the thing that generated a claim). A payer wants it to mean whatever produces the metric they report. None of them are wrong. But your data product has one encounter table, and if you don't pick a definition and enforce it, every downstream consumer builds their own shadow semantics. That's not a data quality problem. That's a governance scope problem.
Second, legacy thinking treats data as inventory. The old governance playbook says "catalog everything, label owners, run a committee." That works if your product is a warehouse full of reports. It breaks the moment your product is shipping queries, dashboards, or AI agents that have to be correct on a specific schedule. Dylan Anderson's been banging this drum on his Substack for a while (Issue #15 names the root causes, and "Business-Related Process Problems" plus "Underinvestment in Data Governance" are the two I see most often in healthcare). Inventory thinking can't handle product velocity.
Third, GenAI amplifies the failure mode. Agents are fluent, which is new. When you hand an agent ambiguous data, it doesn't throw an error. It generates a confident answer, cites your table, and moves on. A junior analyst who didn't know the difference between the clinical encounter and the billing encounter would at least hesitate. An agent won't. And the business will trust the agent more than the analyst, because the agent is faster.
If that doesn't scare you a little, I don't think we're reading the same industry.
You can't govern AI without governing data
This is the part I want to say loud and clear.
Over the last year I've watched teams stand up AI governance programs (model cards, bias audits, human-in-the-loop committees, the whole package) on top of data foundations that don't even have documented encounter definitions. The AI governance layer looks beautiful in a board deck. It fails the first time a regulator asks "why did your system say X?"
Dylan Anderson calls this the bolt-on problem in Issue #54, and it's the most useful frame I've seen for where most teams are right now. He argues that business model, data, and AI should be refactored together. Data is the active bridge between the business question and the model output, not a separate compliance domain. His earlier Issue #50 makes the same point more plainly: data governance to AI governance is a progression, not a replacement. You don't get to skip the first layer because you think AI is new and different.
In healthcare, skipping the first layer isn't a KPI miss. A wrong encounter record in a clinical data product is a patient safety incident waiting for a lawyer. HIPAA cares. The FDA cares (especially if you're anywhere near SaMD territory, Software as a Medical Device). Colorado and California have AI transparency laws on the books, and Texas, New York, and half a dozen other states are drafting their own. All of them converge on the same question: can you explain, audit, and reproduce what your system did? If you can't answer yes, the AI governance frameworks don't help you. They just make your compliance binder heavier.
So: ground floor first. Then the elevator.
Four moves a data product team can make this quarter
Here's the working framework I've been using. Frame this as a product operating model, not a corporate governance program. Four moves. All four are something your team can do this quarter without hiring a Chief Data Officer.
Move 1: Define the governed object
Not "our data." Specific entities your product promises are true.
In a healthcare data product, these usually look like: Patient, Encounter, Medication Order, Lab Result, Claim, Facility, Provider. Pick the 5-10 your product ships against. Write down what each one means in one paragraph. Business definition, not SQL definition. If you have two teams that would define Encounter differently, you don't have an Encounter entity. You have a scope problem. Fix that first.
Everything else (the 400 tables in your warehouse that nobody queries) is raw. That's fine. Raw data is allowed to be messy. Governed objects aren't.
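To make Move 1 concrete, here's a minimal sketch of what a governed-object registry could look like in code, assuming a Python codebase. The entity, definition, and steward values are illustrative, not prescriptive — the point is that "governed" means the registry fails loudly for anything outside the promised 5-10 entities.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GovernedObject:
    name: str            # entity name, e.g. "Encounter"
    definition: str      # one-paragraph business definition, not SQL
    steward: str         # a named human, never a team alias
    contract_version: str

# The 5-10 entities the product promises are true. Everything else is raw.
REGISTRY = {
    "Encounter": GovernedObject(
        name="Encounter",
        definition=(
            "A billed clinical event: one patient, one provider, "
            "one service date, tied to exactly one claim."
        ),
        steward="jane.doe",          # hypothetical steward
        contract_version="2.1.0",
    ),
}

def lookup(entity: str) -> GovernedObject:
    """Fail loudly when a consumer asks for an ungoverned entity."""
    if entity not in REGISTRY:
        raise KeyError(f"{entity} is not a governed object - treat it as raw")
    return REGISTRY[entity]
```

The useful side effect: when someone asks for the 400th warehouse table by name, the answer is an exception, not a shrug.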
Move 2: Write executable data contracts for those objects
This is where the governance-as-PDF approach gives up and the governance-as-code approach takes over.
A data contract is a formal, version-controlled agreement between the team producing a governed object and the teams consuming it. The contract specifies schema (columns, types, nulls allowed), freshness (how stale can this get before it's wrong), completeness (which columns must be non-null), cardinality (how many rows per day is normal), and business rules (an Encounter effective date cannot precede the patient's BirthDate, which is an actual rule I've shipped, not a theoretical one).
The critical word is executable. If your contract lives in a Confluence page, it's not a contract. It's a wish. Contracts fail pipelines. They fire alerts. They block deployments. They look like dbt tests, Great Expectations assertions, or dbt contracts. Pick the tool your team already uses and commit. Dylan Anderson's Issue #49 walks through implementation principles if you want a longer version of this argument.
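If you want the flavor of "executable" before committing to a tool, here's a hand-rolled sketch in plain Python — a stand-in for the dbt or Great Expectations version, with made-up column names and thresholds. The one real rule from above (effective date can't precede birth date) is included; everything else is illustrative.

```python
from datetime import date

# The contract is data, and violating it raises - it never just warns.
ENCOUNTER_CONTRACT = {
    "version": "2.1.0",
    "required_columns": {"encounter_id", "patient_id",
                         "effective_date", "birth_date"},
    "max_staleness_days": 1,
}

def enforce_contract(rows: list[dict], loaded_on: date) -> None:
    """Fail the pipeline - don't log a warning - when the contract breaks."""
    staleness = (date.today() - loaded_on).days
    if staleness > ENCOUNTER_CONTRACT["max_staleness_days"]:
        raise ValueError(f"freshness breach: data is {staleness} days old")
    for row in rows:
        missing = ENCOUNTER_CONTRACT["required_columns"] - row.keys()
        if missing:
            raise ValueError(f"completeness breach: missing {missing}")
        # the business rule from the post: an encounter's effective date
        # cannot precede the patient's birth date
        if row["effective_date"] < row["birth_date"]:
            raise ValueError(
                f"rule breach on {row['encounter_id']}: "
                "effective_date precedes birth_date"
            )
```

In practice you'd express the same assertions as dbt tests or a Great Expectations suite; the shape that matters is raise-on-breach, wired into CI.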
Move 3: Name a steward for each object with a weekly job
This is the one that kills most governance programs. Teams assign "data owners" on an org chart, the org chart gets outdated, nobody notices.
Do something different. For each governed object, name a steward (ideally the subject-matter expert who'd be called when someone asks "what does this field actually mean?") and give them a recurring 30-minute weekly job. Pull the contract breaks from the last seven days and figure out which are real. Check the quality drift metrics on the 3-4 fields that matter. If any new business logic showed up during the week, update the rules file. That's the job. Thirty minutes. Standing on the calendar.
The weekly job matters more than the title. A Data Steward with no recurring work is a ceremonial title. A Data Steward with 30 minutes every Wednesday is a governance program.
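The triage half of that 30-minute job can be very small. Here's a sketch, assuming contract breaks land somewhere queryable as event records — the field names (`rule`, `occurred_on`) are hypothetical:

```python
from collections import Counter
from datetime import date, timedelta

def weekly_triage(break_events: list[dict], today: date) -> list[tuple[str, int]]:
    """The Wednesday job: last 7 days of contract breaks, grouped by rule,
    so the steward can separate real issues from noise in one pass."""
    cutoff = today - timedelta(days=7)
    recent = [e for e in break_events if e["occurred_on"] >= cutoff]
    return Counter(e["rule"] for e in recent).most_common()
```

A ranked list of breaking rules is usually enough to decide, in 30 minutes, which breaks are real and which rules need updating.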
Move 4: Instrument your agents against the contracts
This is the move that doesn't show up in Dylan Anderson's playbook, because it's the newest layer. It's also where healthcare data product teams have the most room to differentiate right now.
When an AI agent pulls from a governed object, log three things: the contract version it read against, the data freshness at query time, and a reproducibility hash of the inputs used to generate the output. Store those alongside the agent's response. If a clinician (or a regulator) asks three months later, "why did the system say this," you can answer in under an hour instead of under three days. That's the healthcare governance audit trail. And it's what connects Moves 1-3 to AI governance in a way that holds up under scrutiny.
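A minimal sketch of that audit record, assuming Python and SHA-256 for the reproducibility hash — the field names are illustrative, the three required ingredients are the point:

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(contract_version: str, freshness: str,
                 inputs: dict, answer: str) -> dict:
    """Store alongside the agent's response: the three things that make
    'why did the system say this?' answerable in under an hour."""
    # canonical JSON so the same inputs always produce the same hash
    input_hash = hashlib.sha256(
        json.dumps(inputs, sort_keys=True).encode()
    ).hexdigest()
    return {
        "contract_version": contract_version,  # which promise was in force
        "data_freshness": freshness,           # how stale the data was
        "input_hash": input_hash,              # reproducibility anchor
        "answer": answer,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
```

The hash is what lets you prove, months later, that the inputs you're replaying are the inputs the agent actually saw.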
This is also where data lineage tools and MCP-based data product interfaces are starting to show up. The specific tool matters less than the commitment to instrumenting every agent invocation against a named, versioned contract. Do that, and your AI governance program stops being paperwork and starts being code.
What GenAI and agents actually change
Most of what I've described is just good data engineering dressed up in governance clothes. The part that's new is what GenAI introduces on top.
Pre-GenAI, governance was about humans trusting data. Post-GenAI, it's about agents and humans trusting data, and humans trusting agents. That third layer introduces three failure modes I hadn't seen before, and healthcare teams need to plan around all of them.
Synthesis drift. An agent combines two governed sources in a way neither contract individually covers. Example: an agent joins Patient Demographics with Lab Result to produce a cohort summary and silently averages across populations with incompatible clinical baselines (different age groups, different comorbidity profiles, different baseline risk). Each source has its own contract. The synthesis doesn't. The FAVES framework from ONC (Fair, Appropriate, Valid, Effective, Safe) is a useful lens here, because it forces you to evaluate the output not just the inputs.
Context collapse. The agent loses provenance between upstream data and downstream claim. A clinician sees a recommendation. They can't trace it back to the data contract version that fed the model. By the time they ask "why this answer," someone has retrained the model, the underlying table has refreshed twice, and the agent has forgotten its own workflow. No audit. No reproduction.
Silent retraining. A model gets fine-tuned on ungoverned data somewhere upstream (a sandbox environment, a one-off analytics request, someone's exported CSV). The fine-tuning survives into production. Future outputs are now poisoned in ways nobody logged. NIST's AI Risk Management Framework and the WHO's guidance on large multi-modal models both call this out, and integrity checksums plus signed model packages are the current state of the art. Very few healthcare orgs are doing it.
The fix for all three isn't more process. It's executable governance artifacts the agent can read. Contracts the agent checks before querying. Observability the agent emits during querying. Lineage the agent writes after querying. If your governance program produces PDFs and not logs, your agents can't use it.
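Here's one way that before/during/after loop might look in code — a sketch with hypothetical names throughout, not a reference implementation:

```python
def governed_query(entity: str, run_query, contracts: dict,
                   lineage_log: list) -> list:
    """Governance the agent can execute, not just read about:
    check the contract before, query through one path during,
    write lineage after."""
    contract = contracts.get(entity)
    if contract is None:
        # before: refuse to touch anything without a named, versioned contract
        raise PermissionError(f"{entity} has no contract; agent may not query it")
    rows = run_query(entity)  # during: the only route to the data
    # after: every invocation leaves a lineage record behind
    lineage_log.append({
        "entity": entity,
        "contract_version": contract["version"],
        "rows_returned": len(rows),
    })
    return rows
```

If every agent query goes through a wrapper like this, the PDFs become optional: the contract check, the observability, and the lineage are all in the execution path.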
The practitioner's 5-question test
Here's what I ask myself (and what you should ask your team) before shipping anything AI-adjacent in a healthcare data product:
- Can you name the 5-10 governed objects your product promises? Not 200. A product with 200 governed objects has zero governed objects.
- Do those objects have executable contracts that fail the pipeline when they break? If the answer is "we have documentation," that's a no.
- Is there a named human whose weekly job (not org chart title) is those contracts? If the answer is "our data team" without an individual name, that's a no.
- When an AI agent uses that data, can you reproduce the output on demand? Contract version, data freshness, input hash. All three. If any are missing, that's a no.
- If a regulator asks "why did your system say X," can you answer in under an hour? Not under a week. Under an hour. With evidence.
If you can't answer yes to all five, you don't have a data governance problem. You have a product shipping risk wearing a governance costume.
What I'm still figuring out
A few things I don't have clean answers to, and would love to hear from folks who do.
How do you handle contract versioning when the clinical definition of an entity changes? It will change. SNOMED updates. ICD transitions. New regulatory definitions of "encounter." I have an opinion on this but not conviction yet.
Who should own the AI governance audit trail in a product org? Data engineering, ML engineering, or product? I've seen all three work and all three fail. My current bet is product, but I don't have a clean reason why.
At what stage of a data product's lifecycle does the cost of building out stewardship pay for itself? I suspect it's earlier than most teams think, but I can't prove it yet. If you've measured this, I want to see your numbers.
If you're working on any of this, drop me a note. I'd rather be in a conversation than publishing a framework.
Written from the practitioner chair, not the consultant seat. Citations are to Dylan Anderson's Data Ecosystem Substack, the foundational thinking I'm building on. Any bad takes are mine.
What Meta's 2025 Restrictions Mean for Data and Product Leaders
Meta's new healthcare ad restrictions aren't just another privacy update - they're a fundamental shift in how we'll have to think about healthcare growth. Drawing from my years measuring health system campaigns at Revive Health, I break down what this means for data and product leaders.
Risk is what you don’t see.
In 2021, when Apple dropped iOS 14.5 along with App Tracking Transparency (ATT), the digital advertising world scrambled. Folks adapted. But Meta's latest announcement about healthcare advertising restrictions feels different. More targeted - pun intended.
I spent years at Revive (FKA Revive Health) building and measuring ad campaigns for health systems. The game was always about precision - finding the right patients, measuring conversions, optimizing spend, all while preventing protected health information (PHI) exposure. We obsessed over metrics like cost-per-acquisition and return on ad spend (ROAS).
But starting January 2025, that playbook is going through another big shift.
What's Actually Happening
Two weeks ago, Meta quietly dropped some shocking news on healthcare/healthtech advertisers. Through a series of targeted emails, they announced two levels of restrictions:
- Fully restricted: Healthcare provisioning properties (think patient portals, app domains)
- Partially restricted: Healthcare marketing properties (corporate sites, lead forms)
The key impact? If you're in healthcare/healthtech, you likely won't be able to optimize for conversions anymore - at least not natively in Meta Business Portal. No more tracking form fills. No more measuring patient acquisition costs. No more retargeting based on specific conditions or treatments.
Note: There are a ton of unknowns for everybody and folks are trying to get straight answers so all of this may be irrelevant in a couple of weeks.
As Chris Turitzin noted in last week’s Health Tech Nerds roundtable:
"If you're not able to send low funnel events, that changes everything in the way that you run meta campaigns... trying to run non-conversion optimized meta campaigns will understand that they just don't work from a profitability stance."
Why Now?
This isn't just Meta being cautious out of the blue. As Yulie Klerman, former LiveRamp healthcare lead, explained during the roundtable: "We've seen changes in the last 4 years, and specifically in the last year on state privacy regulation in the states. When they explicitly call out healthcare information... they're getting closer to GDPR."
The writing has been on the wall. GoodRx's FTC settlement. The HHS guidance on tracking technologies. The proliferation of state privacy laws.
Inside the War Room: Notes from Yesterday's HTN Roundtable
Sometimes the best insights come from rooms full of people trying to solve the same problem. Last Tuesday's Health Tech Nerds roundtable felt like a war room planning session - equal parts strategy meeting and group therapy. With so much still unknown, it felt a bit like an OpSec briefing, with panelists and attendees trying to map the "known knowns" and the "known unknowns".
Four patterns emerged that tell the story:
1. The Platform Pivot
"Shift to top of funnel video," Brian advised, sharing wins from brand lift studies. "We know it works."
But it's not that simple. Moving up the funnel isn't just a tactical shift - it's reimagining what "conversion" means in a world where we can't track it.
2. The CDP Question
Brett Gailey dropped what might be the most important insight: "We're a CAPI & event obfuscation only shop. Our Meta rep communicated to us as not being directly impacted."
A glimmer of hope? Maybe. But it requires serious technical infrastructure - pay attention to the healthcare-specific CDP players like Freshpaint and newcomer Ours Privacy, as well as cross-industry players like Segment.
3. The Compliance Paradox
Yulie Klerman, who built LiveRamp's healthcare vertical, reminded us of an uncomfortable truth: Even if you find technical workarounds, you're swimming in increasingly regulated waters.
"It's not just Meta's rules," she warned. "It's state privacy regulations, HIPAA, and public perception."
4. The Size Split
Large healthcare companies will play it safe. But as Chris Turitzin noted: "Small startups... I don't think they have that same risk."
Different companies, different risk tolerances, different approaches. Startups are going to play fast and loose with these rules because they operate under a different reality than larger players. That's a known risk (it has always been true), but pay attention to what happens when these startups grow. Do they keep the same bad habits?
The Health & Wellness Gray Zone
Here's a fun riddle: When is a health company not a health company? According to Meta... it's complicated.
The definition of "health and wellness" feels like one of those Supreme Court obscenity cases - they know it when they see it. But for those of us building products and measuring campaigns, we need something more concrete.
From the roundtable discussion, here's what we know right now:
Meta defines health & wellness as properties "associated with medical conditions, specific health statuses, or provider/patient relationships." Think patient portals, wellness trackers for specific conditions, or anything tracking health outcomes.
But here's where it gets messy:
- A fitness app? Probably fine.
- A depression tracking app? Restricted.
- A vitamin company? Depends on the claims.
- A healthcare scheduling platform? Welcome to the gray zone.
As one Meta rep told a roundtable participant: "Most health supplement brands will not be affected, unless it is a prescription or for a specific disease." But another participant's supplement brand got flagged. Classic.
The secret seems to lie in condition specificity. The more condition-specific your product or marketing, the more likely you'll face restrictions. Likely more to come here but a lot of unanswered questions at the moment.
The CDP Plot Twist
Here's the fascinating thing about constraints/regulation in healthcare tech: they often create new winners.
When Apple killed mobile tracking, Mobile Measurement Partners (MMPs) became essential overnight. When GDPR hit, consent management platforms had their moment.
Now? It might be the CDP's (Customer Data Platform) time to really shine. Being a middleman - and a way for advertisers to offload liability - could be a gold rush for the best-positioned players.
But not just any CDP. Healthcare needs something different than most other industries. As I learned at Revive tracking multi-touch attribution across health systems - you need infrastructure that understands both technical compliance and healthcare's unique dynamics.
What Makes Healthcare CDPs Different
Think about your typical CDP. It's built for e-commerce, B2B SaaS, maybe fintech. But healthcare? That's a different beast entirely:
Event Hygiene
- Regular CDP: "Track everything, figure it out later"
- Healthcare CDP: "Track precisely what matters, with clear governance"
Identity Resolution
- Regular CDP: "More data = better matching"
- Healthcare CDP: "Clean data = compliant matching"
Activation Workflows
- Regular CDP: "Push to all channels"
- Healthcare CDP: "Push with purpose and protection - likely with a confirmation step"
The New Technical Stack
Based on the roundtable discussion, here's what the winning stack might look like:
Foundation Layer
- HIPAA-compliant CDP (like Ours Privacy or Freshpaint)
- Event obfuscation engine
- URL redaction system
Processing Layer
- Custom conversion definitions
- Privacy-safe identity resolution
- Compliant activation rules
Activation Layer
- Meta CAPI integration
- Cross-channel orchestration
- Compliance monitoring
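To ground the obfuscation and redaction layers, here's a minimal Python sketch. The URL redaction is entirely hypothetical (a made-up blocklist for illustration), while the identifier hashing follows CAPI's general pattern of sending normalized, SHA-256-hashed identifiers rather than raw ones:

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

# Hypothetical blocklist - a real one would come from your compliance review.
SENSITIVE_PATH_HINTS = ("condition", "depression", "treatment")

def redact_url(url: str) -> str:
    """Strip query strings and condition-specific path segments
    before any event leaves your infrastructure."""
    parts = urlsplit(url)
    path = "/".join(
        "redacted" if any(h in seg.lower() for h in SENSITIVE_PATH_HINTS)
        else seg
        for seg in parts.path.split("/")
    )
    # drop query and fragment entirely - they carry the riskiest detail
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

def hash_identifier(value: str) -> str:
    """Normalize, then SHA-256 - the shape CAPI expects for identifiers
    like email, so the raw value never crosses the wire."""
    return hashlib.sha256(value.strip().lower().encode()).hexdigest()
```

This is the "event obfuscation" idea in miniature: decide what leaves, transform it deterministically, and make the raw value unrecoverable on the other side.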
As Brett Gailey noted in the roundtable, teams using this kind of setup might be insulated from Meta's changes. But - and this is crucial - only if implemented thoughtfully. More importantly, no one really knows yet, and it's unclear whether even Meta is sure.
The Data Product Manager's Dilemma
If you're a data product manager in healthtech, you're probably asking:
- "Do we build this in-house?"
- "Which CDP vendors truly understand healthcare?"
- "How do we maintain performance while increasing privacy?"
The answer? It depends on your scale. But here's what I learned measuring campaigns at Revive: sometimes the most elegant solution is the most boring one. It's OK if it's complex (that's reality), but don't settle for complicated.
Start simple:
- Map your conversion events
- Document your privacy requirements
- Build clean activation workflows
- Test and iterate with compliance in mind
The Path Forward for Data and Product Leaders
If you're leading data, analytics, or product at a healthcare company, here's your playbook:
Rethink Measurement
- Build proxy metrics that don't rely on direct conversion tracking
- Get creative with engagement signals
- Focus on top-of-funnel indicators that correlate with intent
The reality is you never truly had ROAS down - don’t kid yourself.
As John Wanamaker famously said:
"Half the money I spend on advertising is wasted; the trouble is I don't know which half."
Every function (even accounting and finance) deals in assumptions and obfuscations - marketing and product simply have more unknowns. Accept it and figure out how to move forward.
Strengthen First-Party Data
- Double down on owned channels
- Build better internal attribution models
- Create measurement frameworks that don't depend on platform data
Use this as an impetus to shift from the buy side over to the build side. Get a better handle on your own data and tooling while investing in owned channels. Don’t over-rotate but don’t be completely dependent on a company like Meta - they don’t care about you or your patients.
Explore Alternative Channels
- Test channels where healthcare isn't as restricted (but be careful!)
- Build cross-channel attribution models
- Focus on content engagement metrics
When I helped instrument campaigns at Revive, we discovered something counterintuitive: restrictions often revealed better channels we'd ignored. Some thoughts:
- Reddit: Shockingly good for healthcare discussions. Their ad platform is like Meta circa 2015 - less sophisticated but more permissive. Just watch the compliance as it’s easy to get in trouble here.
- Programmatic Healthcare Networks: Yes, they're expensive. Yes, they're old school. But they understand healthcare compliance better than any social platform.
- TikTok: Before you roll your eyes - their healthcare policies are still evolving. This is both an opportunity and a risk.
- Point-of-Care Networks: Remember these? They're having a renaissance moment.
- LinkedIn: Especially for B2B healthcare. They're the tortoise in this race - slow, steady, and surprisingly stable on privacy.
The secret? Build your measurement framework first, then pick your channels. Not the other way around.
The Bigger Picture
This feels like a tipping point. But maybe that's good.
Healthcare data has always lived in a world of constraints. HIPAA wasn't the end of healthcare marketing. Neither was the HITECH Act. Or state privacy laws.
Each time, we adapted. We got better and hopefully did better by our patients. We built smarter systems and maybe this pushes folks to go back to the basics - build a product or service that produces value for patients and a business that supports it sustainably.
What Happens Next
For data and product leaders, the next few months are crucial. The situation is going to change, and hopefully Meta will give folks more clarity (to say nothing of the uncertainty around which brands are even in scope).
Ask yourself:
- How can we measure success without relying on platform data?
- What does "good" attribution look like in a privacy-first world?
- How do we balance growth with increasing privacy demands?
The answers might surprise you. They usually do.
Because sometimes the best innovations come from constraints.
And healthcare data products? We've been innovating around constraints since day one.