Play 19
Edge AI Phi-4
High · 🔧 Skeleton
Deploy Phi-4 SLM on edge devices with ONNX quantization and offline inference.
Run a small language model on edge devices without cloud connectivity. Phi-4, quantized to INT4 via ONNX Runtime, runs on devices with 4 GB+ RAM. IoT Hub manages the device fleet, syncs model updates, and collects telemetry. Inference runs fully offline, with periodic cloud sync for model updates.
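As a sanity check on the 4 GB RAM figure, the weight footprint under each quantization level can be estimated from the parameter count. A minimal sketch, assuming the ~3.8B-parameter Phi-4-mini variant (the 14B Phi-4 would not fit in 4 GB even at INT4); KV cache, activations, and runtime overhead are excluded:

```python
# Rough weight-memory estimate for quantized SLM deployment.
# Assumes the ~3.8B-parameter Phi-4-mini variant; excludes KV cache,
# activations, and runtime overhead, which add to the real budget.

def weight_footprint_gb(params: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB for a given quantization level."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 3.8e9  # Phi-4-mini parameter count

int4 = weight_footprint_gb(PARAMS, 4)   # ~1.9 GB
int8 = weight_footprint_gb(PARAMS, 8)   # ~3.8 GB

print(f"INT4: {int4:.1f} GB, INT8: {int8:.1f} GB")
```

At INT4 the weights leave comfortable headroom on a 4 GB device; INT8 is borderline once the KV cache and runtime are added, which is why the quantization level is a tuning parameter below.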
Architecture Pattern
Edge AI, SLM, ONNX quantization, offline inference, device sync
Azure Services
IoT Hub · Container Instances · ONNX Runtime · Azure Storage
DevKit (.github Agentic OS)
- agent.md — edge AI engineer persona
- instructions.md — device management guide
- plugins/ — ONNX optimizer, device syncer, inference tester
TuneKit (AI Config)
- config/edge.json — quantization level, model config, memory constraints
- config/sync.json — update schedule, rollback rules
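The play does not define a schema for config/sync.json; the field names below are illustrative only. A hedged sketch of an update schedule with rollback rules might look like:

```json
{
  "updateSchedule": {
    "checkIntervalHours": 24,
    "window": "02:00-04:00",
    "requireWifi": true
  },
  "rollback": {
    "healthCheckTimeoutSec": 300,
    "maxFailedInferences": 5,
    "keepPreviousModel": true
  }
}
```

Keeping the previous model on disk is what makes rollback cheap: a failed health check after an update can revert without a new download, which matters on intermittently connected devices.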
Tuning Parameters
Quantization level (INT4/INT8) · Model config · Sync schedule · Device memory budget
Estimated Cost
Dev/Test
$20–50/mo
Production
$100–500/mo