Children’s Data Profiles Can Begin Before Birth

Eamonn Maguire · Eye on AI · Wednesday, May 27, 2026 · 17 min read

Proton engineering director Eamonn Maguire argues that a child’s digital profile can begin before birth, as parents’ emails, searches and sign-ups create signals that advertising and platform systems can use to infer pregnancy, family status and future behavior. Speaking with Craig Smith, Maguire uses Proton’s Born Private initiative, which lets parents reserve an email address for a child, to make a broader case that privacy is an infrastructure decision made long before children can consent. He extends the argument to social media, AI training data and the limits of trusting platforms whose business models depend on profiling.

The profile starts before the child has an account

Eamonn Maguire frames childhood privacy as a problem that begins earlier than most parents assume. In his account, a child’s data profile can start before birth, not because the child has touched a device, but because the parents have begun leaving signals in systems built to infer, sort, and target.

The example he gives is ordinary: emails with a gynecologist, a fertility clinic, a hospital, or a pediatrician. Those communications can indicate that someone is trying to become a parent, is pregnant, or is preparing for a child. In an advertising-driven environment, Maguire says, that is enough for systems to begin treating the household differently: ads for hospitals, birth services, gynecologists, pediatricians, or related products can begin before the child exists as an independent digital user.

The profiles of your child is being created even before they're even born.

Eamonn Maguire

Born Private, Proton’s initiative, is built around a deliberately simple action: parents can reserve an email address for a child. Maguire describes the mechanics as choosing the address, donating one dollar or another amount to the Proton Foundation, receiving a secure voucher link, and then deciding whether to unlock the account immediately or years later. A parent could open it for the child right away, or wait 15 years.

But Maguire is clear that the reservation feature is not the entire point. He describes Born Private as a way to force a question: where does a child’s privacy journey start, and where does the data collection journey start? Reserving an email address is a practical anchor, but the initiative is also meant to make parents think about the data trails they create on behalf of people who cannot consent.

That framing is why email matters in Proton’s model. Maguire says many people’s digital identities are anchored in their email address. It is the identifier used to receive information and to sign up for services. If that anchor sits inside a system that profiles users for advertising, it can become part of the profiling system. If it sits inside an end-to-end encrypted, non-profiling system, at least that slice of life is not being used by the email provider for behavioral advertising.

He does not present email as a complete shield. If a parent uses Proton Mail but then uses Google Chrome and Google Search for the same pregnancy-related activity, he says, much of the privacy benefit is negated. Proton can keep the inbox from becoming part of a profile, but it cannot stop other services from learning the same information through search, browser behavior, social activity, or trackers.

Born Private is therefore a starting point rather than a finished privacy architecture. It asks parents to consider the “privacy sphere” around a child and how large that sphere should be. Maguire connects that to his own choices: he says he has a daughter and there are no pictures of her online. When her school asks whether photographs may appear even internally, he says no. When family members want to put photos on Instagram, he asks them to remove her from the picture.

The decision is not only about embarrassment or unwanted exposure today. Maguire points to future misuse, including deepfakes, as a reason parents may eventually revisit the casual publication of children’s images. His argument is that a parent’s privacy decision is rarely just personal. It can define another person’s digital surface area for years.

Three data points can become a working theory of a person

Craig Smith raises a familiar suspicion: his wife thinks her iPhone listens to dinner conversations because an ad appears afterward that seems tied to what they discussed. Smith’s own explanation is less conspiratorial and more structural: people interact with digital systems constantly, companies and data brokers collect those interactions, and the resulting profile can be so detailed that it feels like surveillance by microphone.

Eamonn Maguire does not rule out all forms of listening in the broad ecosystem. He notes that Google is “a complex machine” spanning Gmail, Google Home, and other services, and that there have been reported incidents involving Amazon Alexa and users’ conversations. But his larger answer is that the opacity of profiling creates room for conspiracy theories. People do not know how the profile is assembled because the systems do not provide real transparency.

He offers a simplified email-based example to show how little direct evidence is required. A person creates an email address and signs up for Instagram. That can suggest age. They then subscribe to a political newsletter. That can suggest political leaning. They then subscribe to an AI newsletter. With three data points, a company can begin inferring much more by comparing the user with others who share similar behaviors.

3

email sign-ups Maguire says can begin a much larger inferred profile

The point is not that three sign-ups prove everything. It is that three sign-ups can generate hypotheses, and advertising systems can test those hypotheses. Maguire describes a platform sending an ad for something adjacent to the inferred interest. If the user clicks, that becomes a new data point. If they do not click, that is also information. The system can move on to another test.

He uses religion as an example. If a platform lacks information about a user’s religious leaning, it can show an ad connected to one denomination or group, observe no click, then show another, observe a click, and update the profile. The same logic can apply to politics, hobbies, health interests, or family formation.
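The show-an-ad, watch-the-response loop Maguire describes can be read as a simple belief update. The sketch below is a toy illustration only, not a description of any real platform's system: the category names, the click probabilities, and the function `update_beliefs` are all hypothetical. It starts from a uniform guess over three groups, observes no click on an ad for one group, then a click on an ad for another, and applies Bayes' rule each time.

```python
from typing import Dict


def update_beliefs(
    beliefs: Dict[str, float],
    shown: str,
    clicked: bool,
    p_click_match: float = 0.30,   # assumed click rate when the ad matches the user
    p_click_other: float = 0.02,   # assumed click rate when it does not
) -> Dict[str, float]:
    """Bayes update of P(user belongs to group) after one ad impression."""
    posterior = {}
    for group, prior in beliefs.items():
        p_click = p_click_match if group == shown else p_click_other
        likelihood = p_click if clicked else 1.0 - p_click
        posterior[group] = prior * likelihood
    total = sum(posterior.values())
    return {group: value / total for group, value in posterior.items()}


# Hypothetical groups with a uniform prior: the platform knows nothing yet.
beliefs = {"group_a": 1 / 3, "group_b": 1 / 3, "group_c": 1 / 3}

# First probe: an ad for group_a, ignored by the user.
beliefs = update_beliefs(beliefs, shown="group_a", clicked=False)
after_no_click = dict(beliefs)

# Second probe: an ad for group_b, clicked by the user.
beliefs = update_beliefs(beliefs, shown="group_b", clicked=True)

for group, probability in sorted(beliefs.items()):
    print(f"{group}: {probability:.2f}")
```

Under these assumed click rates, the ignored ad lowers the estimate for the first group only slightly, while the single click moves most of the probability onto the second. That asymmetry matches the point in the text: a non-click is still information, and a click is a strong new data point.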

The profile is not only built from what a person gives explicitly. It is built through interaction. Maguire emphasizes that platforms can interrogate their own model gaps by proposing content and watching responses. A search result can be moved higher or lower. An ad can be framed differently. A recommendation can be tested. The user’s version of the internet becomes both a product and an instrument.

That is where his concern moves beyond privacy into behavior. Tailoring the internet to a user can shape what the user sees, what they click, what they come to believe, and which rabbit holes become available. Maguire describes a scenario in which someone without a particular political inclination encounters an interesting ad or piece of content, agrees with it, and is then pushed further into an ideology. The profile is not just descriptive; it can become causal.

Smith observes that this picture resembles a cloud constantly morphing through time. Maguire’s earlier career in visualization makes that metaphor apt: he worked on analyzing how systems change, including documents, language, security signals, and financial time series. Here the object changing over time is a person’s inferred identity, maintained by systems the person cannot inspect and often cannot meaningfully contest.

The child can be alone in a room and still be acted on

Eamonn Maguire’s sharpest example of behavioral targeting concerns children and social media. He refers to “Molly versus the Machines,” a film by director Mark Silver about Molly Russell, a 14-year-old who killed herself. Maguire describes the case as one in which a child with severe depression and suicidal thoughts was shown more and more suicide-related content by an Instagram feed, and he links that pattern to the broader question of platform harm.

His point is not simply that harmful content exists online. It is that recommendation systems can repeatedly act on a vulnerable user’s state. A child may be physically safe at home, in a bedroom, away from strangers and public spaces. But if a platform is shaping the child’s feed in real time, Maguire argues, the child is still being influenced by companies optimizing for engagement.

You think your child is safe in their bedroom, but all these companies are basically changing the behavior of your child.

Eamonn Maguire

This is where Maguire’s critique of profiling becomes a critique of business incentives. Social media platforms, in his account, are engineered to keep people on-platform. More engagement means more time, more advertising opportunities, more shareholder interest, and higher stock-market rewards. The system’s success metric is not aligned with the user’s health.

Craig Smith asks whether there is a positive side to profiling. Targeted advertising can surface things people are interested in and might not otherwise find. Maguire’s answer is not that relevance has no value. It is that transparency is the critical missing condition. Some platforms offer explanations such as “why did I see this ad,” but he argues the larger regulatory problem exists because social media has long been treated as a utility rather than as a set of platforms capable of abusing users.

His analogy is gambling. Gambling systems are regulated because, without rules, gambling companies could simply take people’s money. The social damage would include misery, divorce, mental health problems, and financial harm. Maguire argues social media has analogous effects: addiction, compulsive use, engineered retention, and mental health consequences.

The problem, as he sees it, is that self-regulation has failed. Social media companies have no strong incentive to reduce usage when usage is the foundation of their business. When user health and platform goals diverge, regulation is needed to force them back into alignment.

The reward system is broken and there is no real incentive for social media companies to try and limit usage because it would damage themselves.

Eamonn Maguire

Maguire does not predict that today’s dominant social platforms will remain unchanged. He suggests that social media as it exists now may not be the social media that exists in one year or five years, particularly as regulatory pressure builds around children, data handling, and platform harms. He also says people often “gloss over danger for convenience,” a pattern he treats as broader than privacy. People follow the crowd because opting out is socially costly.

That is one reason he does not claim a private ecosystem can win merely by being ethically preferable. It has to be usable. People must not feel excluded because they chose not to use a dominant tool. The lesson he draws from Proton Mail is that privacy becomes plausible when difficult security practices are made transparent and ordinary. Before Proton Mail, he says, secure email with PGP was hard for ordinary users. Proton’s contribution was to make the process easier.

AI companies want the data they have not seen

Eamonn Maguire’s privacy concerns extend from consumer advertising into AI training. He says the valuable thing for large AI providers is data the model has not already seen. Compute matters, and architecture matters, but in his formulation, data remains “the oil for your machine.”

This is why enterprise and education accounts matter. A company such as Proton may have hundreds of thousands of internal documents that have never been exposed to the public web. Universities have student work and files uploaded into AI systems. Even if a provider’s contract says enterprise data will not be used for training, Maguire says the provider still has access to the data and the customer must trust that it will not be used.

He distinguishes consumer, paid, and enterprise defaults. For many big platforms, he says, consumer users—especially free users—typically grant permission by default for their data to be used in training. Paid users may receive different terms. Enterprise users often receive contractual assurances that their data will not be used to train models. But for Maguire, the trust problem remains because the provider can see conversations and the context uploaded with them.

That concern is sharpened by his view of how major AI companies have handled copyright. Maguire says he finds it difficult to trust companies such as OpenAI or Anthropic with private data after seeing what he characterizes as disregard for copyright law. In his account, Anthropic faced a $1.5 billion lawsuit connected to books that had been bought, scanned, and discarded. He also says Meta was found to have used a large archive of pirated books to train models. Those examples are central to why contractual assurances alone do not satisfy him.

His criticism is not that all AI training is illegitimate or that all models are equally opaque. He points to the Allen Institute’s OLMo models, the Swiss AI Initiative’s Apertus, and Nvidia’s Nemotron work as examples of models built with more open data or open processes. For Maguire, “open” should mean more than releasing weights. It should mean visibility into the data, the code, and the process.

That leads to his distinction between open models and what he calls “open washing.” Meta’s Llama, Smith suggests, is an example of a model described as open. Maguire replies that it is open weights: users can access the weights, but they do not know where the data comes from. To him, that falls short of true openness.

The scale of training data makes the opacity difficult to grasp. Maguire cites a Wired article about GPT-2 that, in his telling, said the entire English-language section of Wikipedia accounted for only 0.3% of the model’s training data. The rest, he says, would have come from scraped web pages, social media profiles, and other sources before sites such as Reddit and X locked down access.

0.3%

share of GPT-2 training data Maguire says was equivalent to all English Wikipedia, citing a Wired article

Craig Smith asks whether early claims that models had read the entire English-language internet were plausible. Maguire says yes. With crawls such as Common Crawl, large crawlers, domain-language heuristics, and filtering for junk, he says collecting English-language web content was feasible for companies with sufficient investment. If data is king, he says, companies secure the oil.

He expects the frontier of available data to expand beyond what is already online. Smith asks whether paper archives around the world will eventually be digitized and made available for training. Maguire says probably yes. He mentions OpenAI’s pen concept as interesting because many people still write notes by hand, and he points to archives such as Oxford’s Bodleian Library as “a treasure trove” of knowledge that many people have not seen. He has not confirmed OpenAI is seeking access to it, but says he has “no doubt” they would be looking at such material.

The same tension appears in his view of Google Books. As a privacy-oriented Proton executive, he says he may have to say Google is “not great.” But he also calls what Google Books did incredible. It made information searchable that had previously required physically locating the right book and reading through shelves. He recalls showing Google Books to an Oxford student researching how electricity appeared in literature over time; by searching the term and seeing trends, she could do work that might otherwise have taken years.

Digitization, in Maguire’s account, can make human knowledge easier to find and use. The same process can also expand the supply of training data for systems whose provenance, consent model, and downstream uses are not always visible to the people whose work or records are included.

Proton’s AI approach depends on encryption, open models, and constrained context

Craig Smith asks whether Proton’s own AI product, Lumo, uses a model Proton has built and whether a lack of persistent memory makes it limited. Eamonn Maguire corrects the premise: Proton does not build its own models. The reason is mostly financial. Training frontier models would cost too much. Instead, Proton deploys open models and changes them as the state of the art changes.

He lists models Proton has used or evaluated, while noting that the model landscape keeps moving. OLMo 2 was used at launch, but no longer. He also mentions current or recent open-model candidates including Qwen, GPT-OSS at 120 billion parameters, Nvidia’s Nemotron, and Apertus from the Swiss AI Initiative. His decision rule is straightforward: if a model is performant enough and fits Proton’s privacy and openness goals, Proton can run it.

Apertus, in Maguire’s account, illustrates the distinction between a base model and a ChatGPT-style product. He says the first version was probably not ready for Proton’s general-purpose chat users, but also that it was not designed for that purpose. It was created as a base model for applications in science, legal contexts, finance, and other domains that could be fine-tuned. Future releases may become more suitable for production chat use.

The harder technical question is context. A model assistant is more useful if it can remember prior interactions or search through the user’s files. But persistent memory and document search become difficult in an end-to-end encrypted environment because the service provider should not be able to read the underlying data.

Maguire describes memory as something that can be implemented in different ways: a global database constantly updated, a local user-specific store, or a hybrid system. Proton’s approach, he says, is local storage that can sync periodically. When synced, it is encrypted with the user’s keys, so Proton cannot read it on its servers.

Lumo’s Projects feature extends the same principle to documents. A user can link a project to a Proton Drive folder. Lumo downloads the files, transforms them into text, and indexes them locally. It can then use basic keyword search and prompt-based search to find relevant documents. Those documents are injected into the prompt context at runtime, sent to the GPU, and used to generate a response. The result is that the user can ask questions over business documents while Proton keeps the synced material encrypted with the user’s keys.
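The flow described for Projects (index files on the client, find relevant ones by keyword, inject them into the prompt) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Proton's implementation: the class `LocalIndex`, the function `build_prompt`, the stopword list, and the sample file names are all hypothetical, and the download, text-conversion, and encryption steps are left out.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Tiny illustrative stopword list; a real index would use a fuller one.
STOPWORDS = frozenset({"a", "and", "in", "of", "on", "the", "to", "we", "what"})


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


class LocalIndex:
    """In-memory inverted index that exists only on the client device."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, name: str, text: str) -> None:
        self.docs[name] = text
        for token in set(tokenize(text)):
            self.postings[token].add(name)

    def search(self, query: str, limit: int = 2) -> List[str]:
        """Rank documents by how many distinct query tokens they contain."""
        scores: Dict[str, int] = defaultdict(int)
        for token in set(tokenize(query)):
            for name in self.postings.get(token, ()):
                scores[name] += 1
        ranked = sorted(scores, key=lambda name: (-scores[name], name))
        return ranked[:limit]


def build_prompt(index: LocalIndex, question: str) -> str:
    """Inject the matching documents into the prompt context at query time."""
    hits = index.search(question)
    context = "\n\n".join(f"[{name}]\n{index.docs[name]}" for name in hits)
    return f"Context:\n{context}\n\nQuestion: {question}"


index = LocalIndex()
index.add("q3_report.txt", "Revenue grew in the third quarter while churn fell.")
index.add("hiring_plan.txt", "We plan to hire four engineers next quarter.")
index.add("lunch_menu.txt", "Soup and sandwiches on Friday.")

question = "What happened to revenue and churn?"
prompt = build_prompt(index, question)
print(prompt)
```

Only documents that match the question are placed in the prompt, so in this sketch the unrelated files never leave the local index. Keeping the index on the client is also why the choice of folder matters: every linked file has to be processed and held on the user's machine.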

Maguire acknowledges that doing this properly is hard. Search is especially hard because the data needs to be available on the client, and not all clients have the same capabilities. Proton limits the problem by encouraging users to link a narrower folder relevant to a project rather than syncing an entire drive, which could overload the machine.

Lumo also offers web search and financial searches through background APIs. Users can enable or disable those depending on their threat model. If they do not want any query reaching a third party, they can turn search off. If they turn it on, Maguire says they can see what was searched and what results came back.
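One way to picture that toggle is a gate that sits in front of any third-party search call. The sketch below is an assumption-laden illustration, not Lumo's code: `SearchGate`, `fake_backend`, and the off-by-default setting are hypothetical. When the gate is off, the backend is never called; when it is on, every query and its results are recorded so the user can review them.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class SearchGate:
    """User-controlled switch in front of external search, with a visible log."""

    enabled: bool = False  # assumption for this sketch: search is off by default
    log: List[Tuple[str, List[str]]] = field(default_factory=list)

    def search(
        self, query: str, backend: Callable[[str], List[str]]
    ) -> Optional[List[str]]:
        if not self.enabled:
            return None  # the query never reaches the backend
        results = backend(query)
        self.log.append((query, results))  # keep what was sent and what came back
        return results


calls: List[str] = []  # records every query the stand-in backend receives


def fake_backend(query: str) -> List[str]:
    """Stand-in for a third-party search API."""
    calls.append(query)
    return [f"result for {query!r}"]


gate = SearchGate()
blocked = gate.search("mortgage rates", fake_backend)  # gate is off

gate.enabled = True
allowed = gate.search("mortgage rates", fake_backend)  # gate is on

print(blocked, allowed, gate.log)
```

The design choice worth noting is that the check happens before the backend is invoked, so disabling search is a guarantee about what leaves the client rather than a filter applied afterward.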

The broader claim is that privacy does not require stripping products of function. It requires designing systems so that context can be used without turning the provider into an all-seeing data broker. Maguire’s argument is not that Proton has eliminated every tradeoff; it is that encrypted systems can still support search, AI assistance, document context, and synchronization if the architecture is designed around that constraint.

The business model has to avoid the advertising bargain

Eamonn Maguire traces Proton’s origin to CERN during the Snowden revelations. He says Proton began in the CERN cafeteria, with Andy Yen and other founders connected to CERN and particle physics. The Snowden disclosures, combined with concerns about surveillance in democratic societies and geopolitical threats such as China’s pressure on Taiwan, pushed the founders toward privacy-preserving email.

Email was the logical first product, Maguire says, because so much personal and professional life moves through it. In science especially, day-to-day operations often revolve around sending emails, including intellectual property and personal communications. If those emails sit with Google, Yahoo, Microsoft, or another large provider, much of that personal and professional life becomes available to the platform.

Proton started in 2014 and was crowdfunded through a Kickstarter campaign that Maguire says raised around half a million dollars. Since then, he says, the company has been funded by subscribers rather than venture capital. Paying users support the infrastructure, employees, and also the free tier. That funding structure is central to Proton’s privacy story: if the product is free and not subscriber-funded, Maguire argues, the provider has to make money some other way, often through advertising.

Proton’s freemium model gives free users access to products with limits and paying users broader capabilities. For VPN, free users get access to servers in certain countries while paying users get more choices across many gateways. For mail, paid users receive more storage and features such as custom addresses or domains. For Lumo, paid users get increased limits and better models.

The product suite is now broader than email. Maguire lists Proton Mail, VPN, Drive for documents and photos, Proton Pass for password management, Docs and Sheets, Lumo as the AI assistant, and Meet as a video-conferencing product launched a few weeks before the conversation. Proton Workspace, aimed more at businesses, is meant to package the ecosystem so companies can operate with Proton tools while keeping intellectual property and confidential data within a private environment.

Craig Smith characterizes Proton as a possible kernel for a privacy ecosystem that could counter Google and Meta. Maguire is cautious about the scope. He does not see Proton creating a social media platform. He jokes that email is “kind of like social media” in that it involves messages, connections, and groups, but he does not present Proton as a full replacement for every dominant consumer platform.

His ambition is narrower and more practical: make privacy the default and make it easy enough that people do not feel punished for choosing it. That means building functional alternatives where Proton can credibly do so. In his formulation, the goal is to reach a point where users no longer have to compromise on functionality in order to use something private.

This also connects back to Maguire’s claim that privacy choices are often made on behalf of other people. A person can give away information about others without meaning to. Parents can publish a child’s face. A family member can upload a group photo. A person can reveal pregnancy or fertility information through emails and searches.

Maguire uses 23andMe as the clearest example. If someone took a genetic test and gave away their genetic data, he says, they made a privacy decision not only for themselves but also for their children, siblings, aunts, uncles, and wider family. Genetic information is shared information. The individual consent model does not capture the full blast radius.

Born Private reserves an email address, but the campaign asks parents to notice how many choices they make for someone else before that person can decide. The child’s first digital identity, first exposed photo, first school image, first medical email trail, and first platform accounts may all be created by adults.

For ordinary users, Maguire says the starting point is simply creating a Proton Mail account at protonmail.com. For Born Private specifically, he gives the address proton.me/mail/bornprivate and says parents can also find it by searching for Born Private in a privacy-friendly browser.
