Apple is adopting a novel strategy for training its AI models, avoiding the collection or replication of user content from iPhones or Macs.
A recent blog post reveals that the company will continue to use synthetic data (data created to mimic user behavior) and differential privacy to improve features such as email summaries, while keeping personal emails and messages inaccessible.
For users who opt in to Apple’s Device Analytics program, on-device AI models compare synthetic email-like messages against a small sample of actual content stored locally on the device. The device then determines which synthetic messages most closely match its local sample and sends information about the chosen match back to Apple. Apple says no individual user data leaves the device, emphasizing that the company receives only aggregated information.
Apple plans to improve its models for longer-form text generation using a new technique that avoids gathering actual user content. The company is also expanding its use of differential privacy, adding randomized data to larger datasets to protect individual identities. Apple has employed this method since 2016 to analyze usage patterns, in keeping with its long-standing commitment to preserving privacy.
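Differential privacy of this kind is typically implemented by adding calibrated random noise to aggregate statistics before they are shared. The sketch below illustrates the general idea with the classic Laplace mechanism; it is a generic textbook example, not Apple's actual implementation, and the parameter names are illustrative.

```python
import random

def laplace_noise(scale: float) -> float:
    # Laplace(0, scale) sampled as the difference of two exponential draws.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def noisy_count(true_count: int, epsilon: float = 1.0,
                sensitivity: float = 1.0) -> float:
    # Adding Laplace noise with scale sensitivity/epsilon gives
    # epsilon-differential privacy for a simple counting query.
    return true_count + laplace_noise(sensitivity / epsilon)
```

A smaller epsilon means more noise and stronger privacy for any individual report, but because the noise has zero mean, aggregate trends across many devices remain visible.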
Improving Genmoji and other Apple Intelligence features
The company has implemented differential privacy to enhance features such as Genmoji, allowing it to gather insights on popular prompts while ensuring no specific user or device is identified with any particular prompt. Apple is set to implement comparable techniques in its forthcoming releases for various Apple Intelligence features, such as Image Playground, Image Wand, Memories Creation, and Writing Tools.
To improve Genmoji, Apple polls participating devices anonymously to learn whether specific prompt fragments have been seen. Each device emits a signal: some responses reflect genuine usage, while others are deliberately randomized. According to the company, this guarantees that only commonly used terms ever reach Apple, and that no single response can be traced back to a specific user or device.
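The polling scheme described above resembles the classic randomized-response technique: each device sometimes answers truthfully and sometimes flips its answer, so any individual response is deniable, yet the aggregate usage rate can still be recovered. A minimal sketch follows; the flip probability is an assumption for illustration, and Apple's actual mechanism and parameters are not public.

```python
import random

FLIP_PROB = 0.25  # assumed noise level, for illustration only

def device_response(fragment_seen: bool) -> bool:
    # With probability FLIP_PROB the device inverts its answer,
    # so no single response reveals what the user actually typed.
    if random.random() < FLIP_PROB:
        return not fragment_seen
    return fragment_seen

def estimate_usage_rate(responses: list[bool]) -> float:
    # Invert the noise: E[reported] = (1 - p) * rate + p * (1 - rate),
    # so rate = (reported - p) / (1 - 2p).
    reported = sum(responses) / len(responses)
    return (reported - FLIP_PROB) / (1 - 2 * FLIP_PROB)
```

Over many devices the randomization averages out, so a popular prompt fragment still stands out clearly in the estimate while individual answers stay private.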
Developing synthetic data to enhance email summaries
Although the method above works well for short prompts, Apple recognized that more intricate tasks such as email summarization require a different strategy. Apple produces thousands of sample messages, transforming these synthetic communications into numerical representations known as 'embeddings', which capture language, tone, and topic. Participating devices compare these embeddings against samples stored locally. Again, only the chosen match is disclosed, never the underlying content.
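The matching step can be pictured as a nearest-neighbor search in embedding space. The sketch below uses cosine similarity over plain Python lists; the embedding model, dimensionality, and selection rule are assumptions for illustration, and only the winning synthetic message's identifier would leave the device.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def pick_closest_synthetic(synthetic: dict[str, list[float]],
                           local_samples: list[list[float]]) -> str:
    # Return the id of the synthetic embedding that best matches any
    # locally stored sample; the samples themselves never leave the device.
    best_id, best_score = None, -2.0
    for msg_id, emb in synthetic.items():
        score = max(cosine_similarity(emb, loc) for loc in local_samples)
        if score > best_score:
            best_id, best_score = msg_id, score
    return best_id
```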
Apple gathers the most commonly chosen synthetic embeddings from devices that opt in, using this information to refine its training data. As the process iterates, the system produces synthetic emails that are increasingly relevant and realistic. This helps Apple improve its AI capabilities for summarization and text generation while maintaining a strong commitment to user privacy.
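On Apple's side, the per-device reports can then be tallied, keeping only synthetic messages chosen by many devices. A hypothetical sketch of that aggregation step follows; the threshold and data shapes are assumptions, not details Apple has disclosed.

```python
from collections import Counter

def aggregate_reports(reports: list[str], min_devices: int) -> list[str]:
    # reports: one chosen synthetic-message id per opted-in device.
    # Only ids selected by at least min_devices inform the next round
    # of synthetic-data generation; rarely chosen ids are discarded.
    counts = Counter(reports)
    return [msg_id for msg_id, n in counts.most_common() if n >= min_devices]
```

Because each device contributes only a single match identifier, the tally reveals which synthetic messages are broadly representative without exposing anyone's actual mail.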
Currently available in beta
Apple has begun rolling out the system in beta versions of iOS 18.5, iPadOS 18.5, and macOS 15.5. According to Bloomberg’s Mark Gurman, the move comes as Apple works to address problems in its AI development, including delayed feature rollouts and the fallout from leadership changes on the Siri team.
Although it remains to be seen whether this approach yields more capable AI outputs, it clearly signals a public effort to reconcile user privacy with model performance.