    September 1, 2024 · updated May 8, 2026 · 2 min read

    The health data broker layer is now regulated. That changes the pitch.

    By Thomas Jankowski, aided by AI

    The de-identified-health-data broker layer is, in operational terms, the most consequential information layer in U.S. healthcare that almost no patient knows exists. Datavant, Komodo Health, IQVIA, HealthVerity, and a long tail of smaller specialty brokers run a category that touches more patient records than any single payer or provider. The brokers buy data from EHR vendors, claims warehouses, pharmacy benefit managers, and lab networks; they de-identify and tokenize it; they sell the resulting datasets to pharma, to payers, to research institutions, and increasingly to AI-model trainers. The category was, until recently, lightly regulated.

    In Q1-Q2 2024 the FTC began to change the operating environment.

    The agency's enforcement actions on geolocation-data brokers in 2023 had set the precedent: data brokers selling sensitive-category data are subject to consumer-protection scrutiny even when the data is technically de-identified. Through 2024 the FTC extended that reasoning to health data, with new rulemaking on health-related data sales, with explicit mention of the de-identification-can-be-reversed reality, and with a regulatory frame that treats the broker layer as a category requiring active disclosure-and-consent rather than the implicit-consent default the brokers had been operating under.

    That shift changes the pitch every health-data broker is making.

    The 2023 pitch was: we provide de-identified data that helps research and improves patient outcomes; the de-identification is industry-standard; the data is, for regulatory purposes, anonymous. The 2024-2025 pitch has to be: the data is de-identified to the standard the FTC's new framework requires, the consent chain is documented and defensible, the secondary-use categories are explicitly disclosed, and the broker can produce an audit trail on demand. Each of those is a meaningfully more expensive pitch to make. The brokers who were running on the lighter frame now have an 18-to-24-month window in which to repaper everything.

    Three structural questions the operator class has to engage with.

    First, the consent layer. Patients did not consent to their data being sold to a broker chain that ends with a pharma-AI-training application; they consented to "their data being used to provide healthcare" at the moment of admission, and the chain of secondary uses was, in practice, documented in EHR-vendor terms-of-service nobody reads. The FTC's framework is going to require explicit secondary-use consent. The brokers do not have it. The window to acquire it is short.

    Second, the AI-training use case. Most of the brokers' growth in 2024-2025 is from AI-model trainers buying health datasets to train clinical-AI products. That use case was not contemplated in the original consent frame. Regulators are noticing. The probability of a federal rule that explicitly addresses AI-training-as-secondary-use within 24 months is, on the available evidence, high.

    Third, the international competition layer. EU and Canadian regulators are operating on stricter consent frames already. U.S. brokers that want to sell into international research markets have to comply with the stricter frame; the operating cost of compliance is high enough that some U.S. brokers will exit the international markets and concentrate on the domestic one. That fragmentation is the predictable result of regulatory divergence.

    The structural read is that the data-broker layer is repricing. Operators who depend on broker-supplied datasets (pharma, payers, AI-model trainers, research institutions) have to assume that their data costs go up over the next 24 months and that the data they are buying becomes more documented but more constrained on use. The brokers themselves are repricing their service offerings to absorb the compliance cost. The patients whose data is in the chain are, by every plausible read, going to end up better protected than they were in 2023.

    The patients who should benefit most from this regulatory shift are the cohorts most exposed to data-driven discrimination: high-risk-of-illness populations, populations with sensitive medical histories, populations whose data ends up in algorithms that price insurance or determine credit or shape employment screening. The FTC's framework, if it lands as drafted, materially helps those cohorts. Whether it lands as drafted is the next eighteen months of regulatory politics.

    The trade press will write up the broker-layer regulatory shift as a series of FTC enforcement-action stories. The part that holds is that health data is becoming a category regulated the way financial data is, and the operators who price that into their 2025 strategy are the ones who survive the repricing.

    —TJ