Running a Local LLM on Mobile: Testing PocketPal on iPhone 12

With the increasing accessibility of large language models (LLMs), running them locally on mobile devices is an exciting prospect. I recently tested PocketPal, a mobile app for running LLMs on-device, on my iPhone 12 using a distilled 4-bit quantized model. Here’s a breakdown of my experience, covering installation, performance, and overall usability.

Why Run an LLM on Mobile?

Running an LLM locally on a mobile device comes with several advantages:

  • Privacy: No data is sent to external servers.
  • Offline Access: Works without an internet connection.
  • Lower Cost: Avoids API costs associated with cloud-based models.

What is Quantization?

Quantization is a technique for reducing the memory and computational requirements of machine learning models by storing their weights as lower-precision numbers. Instead of 32-bit floating-point values, the weights can be stored as 8-bit or even 4-bit integers while maintaining reasonable accuracy.

For LLMs on mobile, 4-bit quantization significantly reduces the model size, making it feasible to run on devices with limited resources. The compression involves trade-offs (a toy example of the idea follows this list):

  • Slightly reduced accuracy due to loss of precision.
  • Faster inference times, as lower-bit computations require less processing power.
  • Lower memory usage, allowing larger models to fit within mobile device constraints.
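
To make this concrete, here is a minimal, illustrative sketch of symmetric 4-bit quantization in Python. It is not how GGUF Q4_0 files are actually encoded (llama.cpp quantizes block-wise with per-block scales); it only shows why rounding weights to a small integer range shrinks storage while introducing a little error.

    import numpy as np

    def quantize_4bit(weights):
        """Toy symmetric 4-bit quantization: map floats to integers in [-8, 7]."""
        scale = np.abs(weights).max() / 7.0   # one scale for the whole tensor; real Q4_0 uses per-block scales
        q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
        return q, scale

    def dequantize_4bit(q, scale):
        """Recover approximate float weights from the 4-bit integers."""
        return q.astype(np.float32) * scale

    w = np.random.randn(8).astype(np.float32)
    q, s = quantize_4bit(w)
    print("original:", w)
    print("restored:", dequantize_4bit(q, s))  # close to the original, with small rounding error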

Setting Up and Running PocketPal on iPhone 12

1. Download and Install PocketPal

Image: PocketPal on the App Store

  • Open the App Store and search for PocketPal AI by Asghar Ghorbani.
  • Download and install the app.
  • Open the app and allow necessary permissions.

2. Adding a Model

Image: Model selection in PocketPal

  • Navigate to the Models section in the PocketPal app.
  • Tap the + button to add a new model.
  • You will see two options:
    • Add from Hugging Face
    • Add Local Model
  • Select Add from Hugging Face to browse available models.

3. Selecting and Downloading a Model

Image: Model downloading

  • Search for DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.
  • Select the model and start downloading it (size: 1.06 GB, 1.78B parameters).
  • Once downloaded, the model will appear under the Ready to Use section.
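
As an alternative to the in-app download, you can fetch a GGUF file on a computer with the huggingface_hub Python package and then import it via Add Local Model. This is just a sketch: the repo ID and filename below are placeholders, so substitute the actual Hugging Face repository and file you want.

    from huggingface_hub import hf_hub_download

    # Placeholder repo ID and filename: replace with the actual GGUF repository/file.
    path = hf_hub_download(
        repo_id="someuser/DeepSeek-R1-Distill-Qwen-1.5B-GGUF",
        filename="DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf",
    )
    print("Downloaded to:", path)  # transfer this file to the phone, then use "Add Local Model"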

4. Running Benchmarks

Image: Benchmark results

I ran benchmarks on my iPhone 12 using the DeepSeek-R1-Distill-Qwen-1.5B-Q4_0 model. Here are the key results (a quick timing sanity check follows the list):

  • Model Size: 1.06 GB with 1.78 billion parameters.
  • Benchmark Configuration:
    • Prompt Processing: 512 tokens
    • Token Generation: 128 tokens
    • Pipeline Length: 1
    • Repetitions: 3
  • Model Settings:
    • Context Length: 1024 tokens
    • Batch Size: 512
    • CPU Threads: 4
    • GPU Layers: 0 (fully CPU-based execution)
    • Flash Attention: Disabled
  • Performance Metrics:
    • Prompt Processing Speed: 26.93 tokens/sec (±2.37)
    • Token Generation Speed: 18.05 tokens/sec (±0.75)
    • Total Execution Time: 1 minute 18 seconds
    • Peak Memory Usage: 35.0% (1GB / 4GB)
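
These numbers hang together: each repetition processes 512 prompt tokens and generates 128 tokens, so a rough back-of-the-envelope estimate from the measured speeds lands right around the reported total of 1 minute 18 seconds.

    # Rough sanity check of the benchmark timing (ignores setup overhead).
    prompt_tokens, gen_tokens, reps = 512, 128, 3
    pp_speed, tg_speed = 26.93, 18.05              # tokens/sec from the run above

    per_rep = prompt_tokens / pp_speed + gen_tokens / tg_speed   # ~19.0 s + ~7.1 s
    print(f"estimated total: {reps * per_rep:.1f} s")            # ~78 s, i.e. about 1 min 18 s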

Live Demo: Running PocketPal on iPhone 12

Watch a live demonstration of PocketPal running a distilled 4-bit quantized model on an iPhone 12:

Image: Results

Video demonstration is available here: Video

Analysis of Results

  • Decent Processing Speed: With a distilled 4-bit quantized model, the 18.05 tokens/sec generation rate is quite reasonable for mobile inference.
  • Low Memory Footprint: The 1GB RAM usage means this can run on even mid-range smartphones.
  • CPU-Based Execution: With 0 GPU layers offloaded, this run shows that a mobile CPU alone can handle a quantized LLM at usable speeds.
  • Flash Attention Disabled: If supported, enabling it might further optimize speed and reduce lag.

Final Thoughts

Running an LLM locally on an iPhone 12 with PocketPal is feasible but comes with trade-offs. It’s a promising step toward self-hosted AI assistants, though optimization and hardware improvements will be crucial for broader adoption. If you’re privacy-conscious or need offline AI capabilities, it’s definitely worth exploring!

Future Improvements I’d Like to See:

  • Better memory and power efficiency to reduce battery drain.
  • Enhanced speed for real-time interaction.
  • More user-friendly model importing and switching.