A comprehensive glossary of terms used in Edge AI, embedded machine learning, and related fields.
A
Accelerator
Specialized hardware designed to speed up AI computations. Examples include NPUs, TPUs, and GPUs optimized for inference workloads.
Activation Function
A non-linear function applied to a layer's outputs (e.g., ReLU, Sigmoid). Edge-optimized models often favor cheap, piecewise-linear activations for efficiency.
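As a sketch of the efficiency trade-off, ReLU needs only a comparison, while a piecewise-linear "hard" sigmoid avoids the transcendental `exp()` of the true sigmoid. The function names here are illustrative, and this hard-sigmoid form (clip(x/6 + 0.5, 0, 1)) is one common variant, not the only one:

```python
def relu(x):
    # Rectified Linear Unit: max(0, x). A single comparison, cheap on edge hardware.
    return x if x > 0 else 0.0

def hard_sigmoid(x):
    # Piecewise-linear approximation of the sigmoid that avoids exp();
    # this clip(x/6 + 0.5, 0, 1) form is one common variant.
    return min(1.0, max(0.0, x / 6.0 + 0.5))
```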
B
Batch Normalization
A technique to stabilize neural network training. Often fused with other operations during model optimization for edge deployment.
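As a toy illustration of that fusion, a batch-norm's parameters can be folded into a preceding layer's weight and bias so the BN disappears at inference time. This scalar sketch uses hypothetical names; real implementations fold per channel:

```python
import math

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    # BN computes y = gamma * (w*x + b - mean) / sqrt(var + eps) + beta.
    # Folding rewrites this as y = w'*x + b' with no BN op at inference.
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta
```

After folding, `w' * x + b'` gives the same output as linear-then-batchnorm, so the runtime executes one op instead of two.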
Benchmark
A standardized test to measure and compare the performance of AI hardware or software. MLPerf Tiny is a common edge AI benchmark.
D
Depthwise Separable Convolution
A computationally efficient convolution technique used in mobile architectures like MobileNet. Reduces parameters and operations significantly.
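The parameter savings are easy to see from the counting formulas (helper names here are illustrative): a standard convolution couples spatial filtering and channel mixing, while the separable version splits them into a depthwise pass and a 1x1 pointwise pass:

```python
def standard_conv_params(k, c_in, c_out):
    # One k*k*c_in filter per output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise: one k*k filter per input channel;
    # pointwise: a 1x1 conv that mixes channels
    return k * k * c_in + c_in * c_out
```

For a 3x3 conv with 32 input and 64 output channels this is 18,432 vs. 2,336 parameters, roughly an 8x reduction.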
E
Edge Device
Any computing device at the “edge” of a network—smartphones, sensors, wearables, industrial controllers, vehicles, etc.
Embedded ML
Machine learning designed for embedded systems with strict resource constraints. Synonymous with Edge AI in many contexts.
F
Federated Learning
A privacy-preserving approach where models train across multiple edge devices without centralizing data.
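The core aggregation step in the classic FedAvg algorithm is a weighted mean of client model weights, weighted by local dataset size; only weights leave the device, never raw data. A minimal sketch with an illustrative function name:

```python
def federated_average(client_weights, client_sizes):
    # FedAvg aggregation: average each parameter across clients,
    # weighted by how many local samples each client trained on.
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * size for w, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```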
FLOPS
Floating Point Operations Per Second. A measure of computational performance. Edge devices typically operate in the MFLOPS-to-GFLOPS range.
I
Inference
Running a trained model to make predictions. Edge AI focuses on efficient inference rather than training.
INT8
A quantization format that stores values as 8-bit integers instead of 32-bit floats, reducing model size by 4x.
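The 4x figure follows directly from the bit widths; a hypothetical helper makes the arithmetic explicit (parameter storage only, ignoring metadata and activations):

```python
def model_size_bytes(num_params, bits_per_param):
    # Storage for the parameters alone: 8 bits per byte
    return num_params * bits_per_param // 8
```

A 1M-parameter model drops from 4 MB at float32 to 1 MB at INT8.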
K
Knowledge Distillation
Training a smaller “student” model to replicate the behavior of a larger “teacher” model. Essential for creating edge-deployable models.
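One common formulation trains the student against the teacher's temperature-softened outputs via cross-entropy; in practice this term is blended with the ordinary hard-label loss. A pure-Python sketch with illustrative names:

```python
import math

def softmax(logits, temperature=1.0):
    # Higher temperature produces softer, more informative target distributions
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between the teacher's and student's softened outputs;
    # minimized exactly when the student matches the teacher
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))
```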
L
Latency
Time from input to output. Edge AI typically targets sub-millisecond to tens-of-milliseconds latency.
M
MACs
Multiply-Accumulate Operations. The fundamental operation in neural networks. Edge device capability is often measured in MACs per second.
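Counting MACs for common layers is simple arithmetic, which is why it is a standard first-pass estimate of whether a model fits a device's compute budget. The helper names below are illustrative:

```python
def dense_layer_macs(n_inputs, n_outputs):
    # Each output neuron performs one multiply-accumulate per input
    return n_inputs * n_outputs

def conv2d_macs(h, w, k, c_in, c_out):
    # MACs for a stride-1, same-padded k*k convolution over an h*w feature map
    return h * w * k * k * c_in * c_out
```

Note that one MAC is two FLOPs (a multiply and an add), so MAC and FLOP counts for the same model differ by a factor of two.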
Microcontroller (MCU)
A compact integrated circuit with processor, memory, and I/O. Common edge AI targets include the ARM Cortex-M series.
N
NPU (Neural Processing Unit)
Dedicated processor designed specifically for neural network inference, optimized for matrix operations.
O
ONNX
Open Neural Network Exchange. An open format for representing ML models, enabling interoperability between frameworks.
Operator Fusion
Combining multiple neural network operations into a single optimized kernel to reduce memory transfers.
P
Pruning
Removing unnecessary weights or neurons from a model to reduce size and computation while maintaining accuracy.
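The simplest variant is unstructured magnitude pruning: zero out the smallest-magnitude fraction of weights, on the assumption that they contribute least to the output. A sketch with an illustrative name (real pipelines usually fine-tune afterward to recover accuracy):

```python
def magnitude_prune(weights, sparsity):
    # Zero out the `sparsity` fraction of weights with the smallest magnitude
    n_prune = int(len(weights) * sparsity)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(by_magnitude[n_prune:])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]
```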
Q
Quantization
Reducing the numerical precision of model weights and activations (e.g., float32 to int8) to decrease model size and speed up inference.
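A widely used scheme is affine quantization, mapping a real value onto an int8 grid via a scale and zero point; the names here are illustrative, and real toolchains calibrate scale/zero-point per tensor or per channel:

```python
def quantize(x, scale, zero_point):
    # Affine quantization: real value -> nearest point on the int8 grid
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize(q, scale, zero_point):
    # Approximate reconstruction; error is bounded by scale / 2 (when not clamped)
    return (q - zero_point) * scale
```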
Quantization-Aware Training (QAT)
Training that simulates quantization effects, resulting in models that maintain accuracy when quantized.
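The key mechanism is "fake quantization": during training, each value is quantized and immediately dequantized, so the number stays a float but the forward pass sees the rounding and clamping error the deployed int8 model will incur. A minimal sketch with an illustrative name (real QAT also handles gradients, typically via a straight-through estimator):

```python
def fake_quantize(x, scale, zero_point=0):
    # Quantize then dequantize: the value remains a float, but it now
    # sits exactly on the int8 grid, exposing quantization error in training
    q = max(-128, min(127, round(x / scale) + zero_point))
    return (q - zero_point) * scale
```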
R
Real-Time Inference
Processing inputs and producing outputs within strict time deadlines, critical for safety and interactive applications.
T
TensorFlow Lite
Google’s framework for deploying ML models on mobile and embedded devices, rebranded as LiteRT in 2024; its Micro variant targets microcontrollers.