Apple changed the game at WWDC 2025 with the Foundation Models framework. For the first time, you can run the exact same AI model that powers Apple Intelligence directly inside your own iOS apps. No internet connection needed, no OpenAI API bills, just pure on-device artificial intelligence.
Because I really wanted to know how capable the local LLM is, I built a more complex app. This isn’t just another demo project. We’re building a real-world AI chatbot that can handle complex conversations, remember context, and even enhance its knowledge with custom data when needed.
Building a Real-World AI Chat App
Our demo app is a “Dog Helper” that showcases practical AI implementation. Users can:
- Ask preset questions like “Tell me about Border Collies”
- Have natural conversations – ask follow-ups like “Are they good with kids?”
- Get enhanced answers – when the local model lacks knowledge, we use tool calling to fetch additional data
- Handle unknown breeds – for rare breeds like “Caucasian Shepherd,” the AI gracefully uses our custom data
The magic happens when you ask about allergy-friendly dogs. Instead of generic answers, our tool calling system searches through curated breed data and returns specific recommendations.

What is Apple’s Foundation Models Framework?
The Foundation Models framework gives you direct access to Apple’s local large language model (LLM) – the same AI brain behind Siri’s new capabilities and Apple Intelligence features. Think of it as having ChatGPT built right into your iPhone, but completely private and offline.
This is huge because:
- Zero API costs – no more paying per request to OpenAI or Claude
- Complete privacy – all AI processing happens on-device
- Lightning fast – no network latency, instant responses
- Always available – works in airplane mode or poor connectivity
Understanding Foundation Models vs Apple’s Foundation Framework – Common Naming Confusion
Don’t get confused by the name! Apple’s Foundation Models framework is completely different from their older Foundation framework.
In AI terminology, a “foundation model” means a general-purpose base model that you can customize for specific tasks. It’s called “foundation” because it’s the foundation you build specialized AI features on top of.
Foundation Models Performance and Limitations
Technical Specifications That Matter:
- 3 billion parameters (vs ChatGPT’s 100+ billion)
- 3GB RAM usage (why newer devices are required)
- 4,096 token context window (conversations have memory limits)
- Text-only input/output (no image processing)
- Knowledge cutoff: End of 2023
- Support for 16 languages (check model.supportedLanguages)
- Adapter compatibility for fine-tuning specific use cases
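If you need to branch on language support at runtime, the model exposes the list directly. A minimal sketch, assuming the SystemLanguageModel.default entry point and its supportedLanguages property:

```swift
import Foundation
import FoundationModels

// Check whether the user's current language is in the model's
// supported set before exposing AI features.
let model = SystemLanguageModel.default
let userLanguage = Locale.current.language

if model.supportedLanguages.contains(where: { $0.languageCode == userLanguage.languageCode }) {
    // Safe to offer AI features in this locale
    print("On-device model supports the user's language")
} else {
    // Fall back to non-AI features
    print("Unsupported language - hide AI features")
}
```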
What This Model Excels At:
- Text summarization – Great for condensing content
- Information extraction – Pull structured data from text
- Content classification – Categorize text by type/topic
- Simple content generation – Basic writing tasks
What To Avoid Using It For:
- Philosophy or complex reasoning – Too small for deep thinking
- Current events – Knowledge cutoff limitations
- Advanced math – Prone to calculation errors
- Creative writing – Limited compared to larger models
Device Compatibility Requirements – The Hard Truth About Apple Intelligence Availability
Here’s what you need before diving in – and it’s more limited than you might think:
iPhone Compatibility for Foundation Models Framework
- iPhone 15 Pro and 15 Pro Max (A17 Pro chip)
- All iPhone 16 models (A18 chip)
- Older iPhones won’t work – sorry iPhone 13/14 users
iPad Support for On-Device AI
- iPad Mini 7th gen (A17 Pro)
- iPad Air M1/M2 models
- iPad Pro M1/M2/M4 models
- Older iPads are incompatible – even recent non-Pro models
Mac Requirements for Foundation Models
- Any Mac with M1, M2, M3, or M4 chips
- Intel Macs are completely unsupported
- macOS Sequoia 15.2+ required
Plus, Apple Intelligence must be downloaded and enabled (3GB download, 30-minute setup).
Market Reality – How Many Users Can Actually Use Your AI App?
Let’s talk numbers. Based on rough estimates of current device adoption:
- Only 15% of iPhones can run Foundation Models apps
- 30% of iPads are compatible (thanks to M1 adoption)
- 50% of Macs support it (M-series popularity growing)
This means your app needs robust fallbacks. Don’t build AI-only features – always have non-AI alternatives ready.
How to Check Device Compatibility
Smart developers always check compatibility first. Here’s the complete implementation:
```swift
import SwiftUI
import FoundationModels

struct ContentView: View {
    private let model = SystemLanguageModel.default

    var body: some View {
        switch model.availability {
        case .available:
            // Device supports AI - show full features
            ChatbotView()
        case .unavailable(.modelNotReady):
            // Compatible device, but AI model still downloading
            ModelDownloadingView()
        case .unavailable(.deviceNotEligible):
            // Old device - offer alternative features
            NonAIFallbackView()
        case .unavailable(.appleIntelligenceNotEnabled):
            // User needs to enable Apple Intelligence
            EnableIntelligenceView()
        case .unavailable(_):
            // Any other unavailability reason - fall back safely
            NonAIFallbackView()
        }
    }
}
```
Testing Device Compatibility States in Xcode Simulator
Apple made testing easy with built-in simulator options. In your Xcode scheme settings:
- Go to Run → Arguments → Environment Variables
- Find “Simulating foundation model availability”
- Set values like:
  - deviceNotEligible – test the old-device flow
  - appleIntelligenceNotEnabled – test the setup flow
  - modelNotReady – test the download state

This lets you test all compatibility scenarios without owning multiple devices.

Running Foundation Models in Playgrounds
Let’s start simple with a playground to understand the basics:
```swift
import FoundationModels
import Playgrounds

#Playground {
    // Create a language model session
    let session = LanguageModelSession()

    // Define your prompt
    let prompt = "What is the meaning of life?"

    // Get AI response (async operation)
    do {
        let response = try await session.respond(to: prompt)
        print(response.content) // This is your AI-generated text
    } catch {
        print("AI generation failed: \(error)")
    }
}
```
Performance reality check: This took 23 seconds on an M1 Mac for a philosophical question. The model runs at your device’s speed – no cloud acceleration here.
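If you want to benchmark your own prompts the same way, you can wrap the call with ContinuousClock. A sketch (timedResponse is just a helper name, not a framework API):

```swift
import FoundationModels

// Measure the wall-clock latency of a single on-device generation.
func timedResponse(to prompt: String) async throws -> (text: String, duration: Duration) {
    let session = LanguageModelSession()
    let clock = ContinuousClock()

    let start = clock.now
    let response = try await session.respond(to: prompt)
    let duration = clock.now - start

    return (response.content, duration)
}

// Usage (inside an async context):
// let (text, duration) = try await timedResponse(to: "Summarize this paragraph")
// print("Generated in \(duration)")
```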

Handling AI Generation Errors and Guardrails
The Foundation Models framework has strict safety measures:
```swift
do {
    let response = try await session.respond(to: prompt)
    return response.content
} catch let error as LanguageModelSession.GenerationError {
    switch error {
    case .exceededContextWindowSize:
        // Conversation too long - start new session
        return "Let's start a fresh conversation"
    case .guardrailViolation:
        // Content flagged as unsafe
        return "I can't help with that request"
    case .unsupportedLanguageOrLocale:
        // User used a non-supported language
        return "Please ask in English or another supported language"
    case .rateLimited:
        // App backgrounded - system prioritizing foreground apps
        return "Please try again in a moment"
    case .concurrentRequests:
        // Multiple requests on the same session
        return "Please wait for the current response to complete"
    default:
        return "Something went wrong with AI generation"
    }
}
```
Real-World Example – Building a Dog Breed Knowledge Assistant
Let’s test the model’s knowledge boundaries with a practical example:
```swift
// Test with a well-known breed
let borderColliePrompt = "Tell me about Border Collies for apartment living"
// Result: Good general knowledge, reasonable advice

// Test with a rare breed
let caucasianPrompt = "Can I keep a Caucasian Shepherd in an apartment?"
// Result: Hallucination! Says they're "wonderful apartment companions"
```
The problem: The model confidently gives wrong advice about a 150-pound protective breed being apartment-friendly. This is why tool calling becomes essential.
Why Tool Calling Is Essential for Production AI Apps
The local model has knowledge gaps. When it doesn’t know something, it often guesses wrong instead of admitting ignorance. Tool calling lets you:
- Detect knowledge gaps – Recognize when the model lacks specific info
- Fetch accurate data – Pull from your curated database
- Enhance responses – Combine AI reasoning with factual data
- Maintain accuracy – Prevent harmful misinformation
Think of tool calling as giving your AI access to Google, but with your own trusted data sources.
Working Around Text-Only Limitations with Vision Framework Integration
The Foundation Models framework only handles text, but you can build powerful workflows:
// 1. User takes photo of a recipe
// 2. Use Vision framework to extract text from image
// 3. Pass extracted text to Foundation Models
// 4. AI formats and structures the recipe data
This pattern works great for:
- Document scanning – Extract and process text from images
- Recipe digitization – Photo to structured recipe data
- Text translation – OCR + AI translation
- Content organization – Extract and categorize information
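A sketch of the recipe workflow above. The function names here are illustrative, but VNRecognizeTextRequest and VNImageRequestHandler are the standard Vision OCR entry points:

```swift
import Vision
import FoundationModels

// Steps 1-2: extract text from the photo with Vision's OCR.
func recognizeText(in image: CGImage) async throws -> String {
    try await withCheckedThrowingContinuation { continuation in
        let request = VNRecognizeTextRequest { request, error in
            if let error {
                continuation.resume(throwing: error)
                return
            }
            let lines = (request.results as? [VNRecognizedTextObservation])?
                .compactMap { $0.topCandidates(1).first?.string } ?? []
            continuation.resume(returning: lines.joined(separator: "\n"))
        }
        request.recognitionLevel = .accurate

        do {
            try VNImageRequestHandler(cgImage: image).perform([request])
        } catch {
            continuation.resume(throwing: error)
        }
    }
}

// Steps 3-4: hand the raw OCR output to the on-device model for structuring.
func structureRecipe(from image: CGImage) async throws -> String {
    let rawText = try await recognizeText(in: image)
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Format this scanned recipe into an ingredient list and numbered steps:\n\(rawText)"
    )
    return response.content
}
```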
Choosing the Right Use Cases for 3-Billion Parameter Models
The model size matters. Here’s when Foundation Models work well vs when you need alternatives:
Perfect Use Cases
- App help chatbots – Answer questions about your app’s features
- Content summarization – Condense long text into key points
- Data extraction – Pull specific info from unstructured text
- Simple classification – Categorize content by type or sentiment
Consider Cloud APIs Instead
- Creative writing – Need larger models for quality output
- Complex reasoning – Mathematical or logical problem solving
- Current information – Anything requiring up-to-date knowledge
- Multi-language support – beyond the 16 supported languages
Configuring Your Language Model Session for Production Apps
The LanguageModelSession class has several important parameters you can customize:

```swift
let session = LanguageModelSession(
    model: .default,        // Can use custom adapters here
    guardrails: .default,   // Safety filters (strict by default)
    tools: [],              // We'll add tool calling in part 2
    instructions: """
        You are a dog specialist. Your job is to give helpful \
        advice to new dog owners.
        """
)
```
Setting Effective System Instructions
Instructions are more powerful than regular prompts. They define your AI’s personality and boundaries:
```swift
let instructions = """
    You are a dog specialist. Your job is to give helpful advice to new dog owners.
    """

// Test the boundaries
// User asks: "What is 1 + 1?"
// AI responds: "I'm sorry, I cannot assist with that request."
```
Instructions act as strong guardrails – even when users try to misuse your app, the AI stays in character.
Controlling AI Response Quality with Generation Options
Temperature Settings for Creativity Control
```swift
let options = GenerationOptions(
    sampling: nil,                  // nil = the framework's balanced default
    temperature: 1.0,               // 0 = robotic, 2 = chaotic
    maximumResponseTokens: nil      // let it finish thoughts naturally
)
```
Avoid these common mistakes:
- temperature: 0 = always identical, robotic responses
- temperature: 2 = too random and incoherent
- Setting the maximum token limit too low = cut off mid-sentence
Better Ways to Control Response Length
Instead of limiting tokens (which cuts off responses), use natural language:
```swift
// ❌ Bad: capping response tokens at 200 (cuts off mid-sentence)
// ✅ Good: add length guidance to the instructions instead:
//   "Keep responses to 100-200 words"
//   "Answer in 2 paragraphs"
//   "Give brief, concise answers"
```
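Putting the two ideas together: length guidance lives in the instructions, while generation options stay permissive. A sketch assuming the respond(to:options:) overload:

```swift
import FoundationModels

let session = LanguageModelSession(instructions: """
    You are a dog specialist. Keep responses to 100-200 words.
    """)

// Mild temperature tweak; no hard token cap, so answers end naturally.
let options = GenerationOptions(temperature: 0.7)

// Inside an async context:
let response = try await session.respond(
    to: "How much exercise does a Border Collie need?",
    options: options
)
print(response.content)
```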
Implementing Real-Time Streaming Responses
Nobody wants to wait 23 seconds staring at a blank screen. Streaming shows text as it generates:
```swift
class ChatViewModel: ObservableObject {
    @Published var partialGenerated: String.PartialGenerated?
    @Published var isResponding = false

    private var streamingTask: Task<Void, Never>?
    private let session = LanguageModelSession()

    func sendMessage(_ userInput: String) {
        isResponding = true
        streamingTask = Task {
            do {
                let stream = session.streamResponse(to: userInput)
                for try await partial in stream {
                    // Check if the task was cancelled
                    guard !Task.isCancelled else { break }

                    // Update UI with each new chunk
                    await MainActor.run {
                        self.partialGenerated = partial
                    }
                }
                // Streaming complete
                await MainActor.run {
                    self.isResponding = false
                    self.saveResponse()
                }
            } catch {
                await MainActor.run {
                    self.handleError(error)
                }
            }
        }
    }
}
```

Creating Smooth Streaming Animations in SwiftUI
Make your streaming responses feel polished with proper animations:
```swift
struct StreamingResponseView: View {
    let partialResponse: String.PartialGenerated

    var body: some View {
        // Render markdown if it parses, otherwise fall back to plain text
        Text((try? AttributedString(markdown: partialResponse.content))
            ?? AttributedString(partialResponse.content))
            .padding()
            .background(.gray.opacity(0.1), in: RoundedRectangle(cornerRadius: 12))
            .contentTransition(.opacity)
            .animation(.bouncy, value: partialResponse)
    }
}
```
Key animation principles:
- Use .animation(.bouncy, value: partialResponse) for smooth text updates
- Add .contentTransition(.opacity) to avoid choppy text changes
- Keep transition duration short (0.2-0.5 seconds max)
Fixing View Identity Problems
When streaming text replaces with final messages, SwiftUI can get confused about view identity:
```swift
// ❌ Causes UI jumps
ForEach(messages) { message in
    MessageView(message: message)
}
if let partial = partialGenerated {
    StreamingView(partial: partial)
}

// ✅ Fixed with consistent IDs
ForEach(messages) { message in
    MessageView(message: message)
        .id(message.id)
}
if let partial = partialGenerated {
    StreamingView(partial: partial)
        .id(partialId ?? UUID()) // Same ID reused for the final message
}
```
Managing Chat Sessions and Message History
Handle conversation flow properly with session management:
```swift
class ChatViewModel: ObservableObject {
    @Published var messages: [ChatMessage] = []
    @Published var userInput = ""
    @Published var partialGenerated: String.PartialGenerated?
    @Published var isResponding = false

    private static let instructions = "You are a dog specialist..."
    private var session = LanguageModelSession(instructions: Self.instructions)
    private var streamingTask: Task<Void, Never>?
    private var partialId: UUID?

    func resetSession() {
        // Cancel any ongoing streaming
        streamingTask?.cancel()

        // Clear UI state
        messages.removeAll()
        partialGenerated = nil
        isResponding = false

        // Create a fresh session (important!)
        session = LanguageModelSession(instructions: Self.instructions)
    }

    private func saveResponse() {
        guard let partial = partialGenerated else { return }

        // Add AI response to chat history
        messages.append(ChatMessage(
            id: partialId ?? UUID(),
            role: .assistant,
            content: partial.content
        ))

        // Clear streaming state
        partialGenerated = nil
        partialId = nil
    }
}
```
Proper Progress Indication
Show loading states only when appropriate:
```swift
if isResponding && partialGenerated == nil {
    // Show spinner only before streaming starts
    ProgressView()
} else if let partial = partialGenerated {
    // Show streaming content
    StreamingResponseView(partialResponse: partial)
}
```
Handling Concurrent Requests and Session Limits
Each LanguageModelSession can only handle one request at a time:

```swift
func sendMessage(_ input: String) {
    // Prevent multiple concurrent requests on the same session
    guard !session.isResponding else { return }

    // Or create a separate session for parallel processing
    let newSession = LanguageModelSession(instructions: instructions)
}
```
Session management strategies:
- Single session: Simple conversations with memory
- Multiple sessions: Parallel processing different topics
- Session pools: Handle high-volume concurrent requests
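A minimal sketch of the pool strategy. SessionPool is a hypothetical helper, not a framework type; it leans on the session's real isResponding property:

```swift
import FoundationModels

// Hypothetical helper that hands out an idle session,
// creating a new one when every existing session is busy.
final class SessionPool {
    private var sessions: [LanguageModelSession] = []
    private let instructions: String

    init(instructions: String) {
        self.instructions = instructions
    }

    func checkout() -> LanguageModelSession {
        // Reuse the first session that isn't mid-generation
        if let idle = sessions.first(where: { !$0.isResponding }) {
            return idle
        }
        // All busy - grow the pool
        let fresh = LanguageModelSession(instructions: instructions)
        sessions.append(fresh)
        return fresh
    }
}
```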
Real-World Error Handling and User Feedback
The Foundation Models framework can fail in various ways. Handle them gracefully:
```swift
private func handleStreamingError(_ error: Error) {
    if let genError = error as? LanguageModelSession.GenerationError {
        switch genError {
        case .guardrailViolation:
            showMessage("I can't help with that type of request")
        case .exceededContextWindowSize:
            showMessage("This conversation is getting too long. Let's start fresh!")
            resetSession()
        case .rateLimited:
            showMessage("I'm busy with other tasks. Please try again in a moment")
        default:
            showMessage("Something went wrong. Please try again")
        }
    }
}
```
Performance Optimization Tips for On-Device AI
Memory Management
- Monitor memory usage – the 3B model uses ~3GB RAM
- Consider releasing sessions when not needed
- Test on real devices, not just simulators
Battery Impact
- Long AI generations drain battery quickly
- Consider limiting response length for mobile use
- Show battery usage warnings for intensive tasks
Background Handling
- AI requests get rate-limited when app goes to background
- Save conversation state before backgrounding
- Resume gracefully when returning to foreground
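One way to wire this up is the standard UIKit lifecycle notification. A sketch assuming the ChatViewModel from earlier and a hypothetical persistMessages() save helper:

```swift
import UIKit

extension ChatViewModel {
    // Call once, e.g. at view-model setup, to react to backgrounding.
    func observeAppLifecycle() {
        NotificationCenter.default.addObserver(
            forName: UIApplication.didEnterBackgroundNotification,
            object: nil,
            queue: .main
        ) { [weak self] _ in
            self?.streamingTask?.cancel()  // stop in-flight generation
            self?.persistMessages()        // hypothetical: save chat history
        }
    }
}
```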
What’s Next: Tool Calling and Knowledge Enhancement
This basic chatbot is just the foundation. The real power comes with tool calling – letting your AI fetch current data, search databases, and enhance its knowledge beyond the 2023 training cutoff.
In the next tutorial, we’ll add:
- Custom tool functions to fetch dog breed data
- Intent detection to decide when to use tools
- Knowledge enhancement for accurate, up-to-date responses
- Structured data integration with your app’s backend
The Foundation Models framework gives you a solid base for on-device AI, but tool calling makes it production-ready for real-world applications.
Ready to enhance your chatbot with tool calling? Check out part 2 of this series where we add custom data sources and make our AI truly intelligent.