During the WWDC26 keynote, Apple announced its third generation of Apple Foundation Models (AFM), comprising five models: two that run on-device and three that run in the cloud, one of which lives on Google’s servers running on Nvidia chips. Here’s a breakdown of how that will work.
A bit of background
When Apple first announced its foundation models in 2024, the lineup included an on-device language model with roughly 3 billion parameters, and “a larger server-based language model available with Private Cloud Compute and running on Apple silicon servers,” as the company put it at the time.
Private Cloud Compute was an ambitious undertaking, as it aimed to deliver cloud-based AI capabilities while preserving the same privacy guarantees users expect from on-device processing.
For this reason, keeping everything in-house was essential. Private Cloud Compute ran in Apple data centers, on servers powered by Apple silicon. Even so, its privacy guarantees could be independently verified by third-party security researchers.
However, as Apple struggled to get its AI aspirations off the ground, the company partnered with Google to use Gemini as the backbone of its new AI efforts, the results of which it announced earlier this week during the WWDC26 keynote.
Apple’s new foundation models
The third generation of AFMs includes five models: AFM 3 Core and AFM 3 Core Advanced, which are on-device models, and AFM 3 Cloud, ADM 3 Cloud (Image), and AFM 3 Cloud Pro, which are server-based. The D in ADM 3 Cloud (Image) stands for diffusion, a technology we’ve covered in the past.
All of the models except AFM 3 Cloud Pro were built to run on Apple silicon. AFM 3 Cloud Pro, meanwhile, runs on Nvidia GPUs hosted in Google Cloud.
This was made possible after Apple extended its Private Cloud Compute architecture to third-party infrastructure for the first time, “while maintaining Apple’s powerful security and privacy protections,” according to the company.
As for the models themselves, here’s a breakdown of each one, as explained by Apple:
- AFM 3 Core, the next generation of our 3-billion-parameter dense model that delivers a step up in quality.
- AFM 3 Core Advanced, our most powerful on-device model. It’s natively multimodal, enabling helpful features like expressive voices and higher-accuracy dictation. Built on cutting-edge Apple research, this 20-billion-parameter model uses a sparse architecture, activating just 1 to 4 billion parameters at a time depending on the request. AFM 3 Core Advanced is unlocked by and optimized for our most capable Apple silicon systems.
- AFM 3 Cloud, our server-side workhorse, optimized for speed, efficiency, and performance.
- ADM 3 Cloud (Image), for image generation and editing, which unlocks advanced photo-editing tools, the all-new Image Playground, and more.
- AFM 3 Cloud Pro, our most capable server-based model, which powers our most demanding use cases, like agentic tool use and complex reasoning.
The highlights here are AFM 3 Core Advanced and AFM 3 Cloud Pro.
AFM 3 Core Advanced packs 20 billion parameters into an on-device model, which is no small feat. Most on-device models aimed at the general public tend to stay in the low-single-digit billions of parameters.
To make AFM 3 Core Advanced run well, Apple used a sparse architecture that activates between 1 and 4 billion parameters at a time, depending on the prompt, rather than a dense architecture that would need to keep all 20 billion parameters active for every request.
Although conceptually similar to the Mixture of Experts approach, this selective activation relies on a technique Apple invented and detailed in the study “Instruction-Following Pruning for Large Language Models,” released a year ago.
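To put those figures in perspective, here is a quick back-of-the-envelope calculation. It uses only the numbers Apple quotes (20 billion total parameters, 1 to 4 billion active per request). The function and variable names are ours, and the sketch says nothing about how Apple actually decides which parameters to activate.

```python
# Figures quoted by Apple for AFM 3 Core Advanced.
TOTAL_PARAMS = 20_000_000_000   # full sparse model
ACTIVE_MIN = 1_000_000_000      # fewest parameters active for a request
ACTIVE_MAX = 4_000_000_000      # most parameters active for a request


def active_fraction(active: int, total: int = TOTAL_PARAMS) -> float:
    """Share of the model's parameters that take part in one request."""
    return active / total


low = active_fraction(ACTIVE_MIN)
high = active_fraction(ACTIVE_MAX)
print(f"Active per request: {low:.0%} to {high:.0%} of all parameters")
# prints "Active per request: 5% to 20% of all parameters"
```

In other words, even the heaviest request touches only about a fifth of the model, which is roughly the compute footprint of the small dense models that on-device AI has relied on so far.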

As for AFM 3 Cloud Pro, this is the one that runs on external infrastructure. You can read some of the technical details of this expansion in an article published on Apple’s Security blog earlier this week, but here’s the most important part:
On this foundation, Apple and Google collaborated to build capabilities that go far beyond a traditional confidential computing deployment:
- We do not rely solely on confidential computing technologies to mitigate attacks that leverage privileged access outside of a confidential VM, including side-channel attacks. We consider every component — from firmware through the host and guest OS stacks to application code — to be part of our trusted computing base, subject to our verifiable transparency and no-privileged-access guarantees.
- To mitigate the risk of supply chain attacks, we maintain a cryptographically verifiable, append-only ledger of all Google Cloud hardware that is part of the PCC fleet. For components that could be abused to exfiltrate user data if compromised, our software attestation is rooted in at least two separate roots of trust from independent vendors.
- Even when deployed with confidential computing, we believe the inference stack must be designed with privacy and security from the start. PCC on Google Cloud leverages many of the same architectural security patterns as PCC on Apple silicon to implement these layered protections: initial network data parsing for each request happens in a dedicated process within its own namespace, shared inference software is recycled with a short time-to-live duration, and attested keys are held in a separate, dedicated confidential VM isolated from external inputs.
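For readers wondering what a “cryptographically verifiable, append-only ledger” means in practice, the general idea can be illustrated with a simple hash chain. This is a minimal sketch of the generic technique, not Apple’s or Google’s implementation: the `Ledger` class, the record fields, and the serial numbers are all hypothetical, and a production system would typically add signatures and a public transparency log on top.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


class Ledger:
    """Toy append-only ledger: each entry commits to everything before it."""

    def __init__(self) -> None:
        self.entries: list[tuple[dict, str]] = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = entry_hash(prev, record)
        self.entries.append((dict(record), digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for record, digest in self.entries:
            if entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


ledger = Ledger()
ledger.append({"serial": "hypothetical-node-001"})  # made-up hardware IDs
ledger.append({"serial": "hypothetical-node-002"})
print(ledger.verify())  # True: the chain is intact

# Swap the first record while keeping its old hash: verification now fails.
ledger.entries[0] = ({"serial": "rogue-node"}, ledger.entries[0][1])
print(ledger.verify())  # False: tampering is detected
```

Because every entry’s hash depends on the one before it, anyone holding the latest hash can detect a rewritten history, which is what makes such a ledger auditable by outside parties.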
In its Machine Learning Research blog, Apple says that all five models “shared a common initial foundation before specializing for their respective architectures and use cases, adding multimodal capabilities like audio, image understanding, long-context reasoning, and high-quality visual generation.”
The company adds that, to train these models, it used “a mixture of data that includes publicly available information, data licensed or purchased from third parties, open-sourced data, data obtained through dedicated studies, and synthetic data.” Apple also stresses that the training process did not include user data or interactions and that web publishers can opt out of foundation model training.
The results
Apple says it conducted extensive human evaluations of its third-generation foundation models, with in-house reviewers grading responses across categories such as instruction following, truthfulness, presentation, and image understanding.
Models were evaluated against their predecessors (when applicable), and you can see some of the results below:

Fraction of preferred responses in side-by-side human evaluations of general text capabilities, comparing AFM 3 Core and AFM 3 Cloud against our previous generation of models. Results are presented across four distinct locale groups to demonstrate consistent performance across international variants. “English” represents our global English evaluation set, while “PFIGSCJK”, “DNNSTV” and “AFIHHMPRTU” represent our remaining supported global locales.

Fraction of preferred responses in side-by-side human evaluations of image understanding capabilities in English. The results compare AFM 3 Core and AFM 3 Cloud against their 2025 predecessors.

Fraction of preferred responses in side-by-side human evaluations for dictation tasks. The results compare AFM 3 Core Advanced against Apple’s existing production dictation system across seven quality dimensions. AFM 3 Core Advanced demonstrates a positive win rate in overall quality, with preference extending consistently across all individual formatting and comprehension dimensions.
For an even deeper dive into the third-gen Apple Foundation Models, follow this link.
