Edge AI Glossary

A comprehensive glossary of terms used in Edge AI, embedded machine learning, and related fields.

A

Accelerator

Specialized hardware designed to speed up AI computations. Examples include NPUs, TPUs, and GPUs optimized for inference workloads.

Activation Function

A mathematical function applied to a layer's outputs to introduce nonlinearity (e.g., ReLU, Sigmoid). Edge-optimized models often favor cheap activation functions such as ReLU over ones that require exponentials.
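A minimal NumPy sketch of the two activation functions named above; the function names are illustrative, not from any particular framework:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """Rectified Linear Unit: max(0, x). Just a comparison and a select,
    so it is cheap on edge hardware."""
    return np.maximum(0.0, x)

def sigmoid(x: np.ndarray) -> np.ndarray:
    """Sigmoid: 1 / (1 + e^-x). Costlier than ReLU because of the
    exponential, which is one reason mobile architectures prefer ReLU."""
    return 1.0 / (1.0 + np.exp(-x))
```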

B

Batch Normalization

A technique to stabilize neural network training. Often fused with other operations during model optimization for edge deployment.

Benchmark

A standardized test to measure and compare the performance of AI hardware or software. MLPerf Tiny is a common edge AI benchmark.

D

Depthwise Separable Convolution

A computationally efficient convolution technique used in mobile architectures like MobileNet. Reduces parameters and operations significantly.
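The savings can be seen by counting parameters. A sketch, assuming stride-1 square kernels and ignoring biases:

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Standard convolution: one k x k x c_in filter per output channel."""
    return k * k * c_in * c_out

def dws_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise separable: a k x k depthwise filter per input channel,
    followed by a 1 x 1 pointwise convolution mixing channels."""
    return k * k * c_in + c_in * c_out
```

For a 3x3 layer with 256 input and 256 output channels, the standard convolution needs 589,824 parameters versus 67,840 for the separable version, roughly an 8.7x reduction (the factor is about 1/c_out + 1/k²).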

E

Edge Device

Any computing device at the “edge” of a network—smartphones, sensors, wearables, industrial controllers, vehicles, etc.

Embedded ML

Machine learning designed for embedded systems with strict resource constraints. Synonymous with Edge AI in many contexts.

F

Federated Learning

A privacy-preserving approach where models train across multiple edge devices without centralizing data.
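The aggregation step can be sketched as a weighted average of client weights (the FedAvg-style server update); only parameters travel over the network, never raw data. This is a simplified single-tensor sketch, not a full training loop:

```python
import numpy as np

def federated_average(client_weights: list, client_sizes: list) -> np.ndarray:
    """Average model weights from several clients, weighted by each
    client's local dataset size. Raw training data stays on-device;
    only the weight tensors are sent to the server."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))
```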

FLOPS

Floating Point Operations Per Second. A measure of computational performance. Edge devices typically operate in the MFLOPS-to-GFLOPS range.

I

Inference

Running a trained model to make predictions. Edge AI focuses on efficient inference rather than training.

INT8

An 8-bit integer quantization format that stores values as 8-bit integers instead of 32-bit floats, reducing model size by roughly 4x.

K

Knowledge Distillation

Training a smaller “student” model to replicate the behavior of a larger “teacher” model. A common technique for producing edge-deployable models.
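A sketch of the soft-target term of Hinton-style distillation: cross-entropy between the teacher's and student's temperature-softened output distributions. The temperature value and function names are illustrative:

```python
import numpy as np

def softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Softmax with temperature T; higher T produces a softer distribution."""
    z = logits / T
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits: np.ndarray,
                      teacher_logits: np.ndarray,
                      T: float = 4.0) -> float:
    """Cross-entropy between the softened teacher and student
    distributions; minimized when the student matches the teacher."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return float(-np.sum(p_teacher * np.log(p_student + 1e-12)))
```

In practice this term is combined with the ordinary hard-label loss; only the soft-target part is shown here.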

L

Latency

Time from input to output. Edge AI typically targets sub-millisecond to tens-of-milliseconds latency.

M

MACs

Multiply-Accumulate Operations. The fundamental operation in neural networks. Edge device capability is often measured in MACs per second.
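Counting MACs for common layers is straightforward arithmetic; a sketch (one MAC is conventionally counted as two FLOPs, a multiply plus an add):

```python
def dense_macs(n_in: int, n_out: int) -> int:
    """Fully connected layer: one MAC per weight, so n_in * n_out."""
    return n_in * n_out

def conv2d_macs(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """Stride-1, 'same'-padded 2-D convolution: every output position
    performs k*k*c_in MACs for each of the c_out output channels."""
    return h * w * k * k * c_in * c_out
```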

Microcontroller (MCU)

A compact integrated circuit combining a processor, memory, and I/O. Common edge AI targets include the Arm Cortex-M series.

N

NPU (Neural Processing Unit)

Dedicated processor designed specifically for neural network inference, optimized for matrix operations.

O

ONNX

Open Neural Network Exchange. An open format for representing ML models, enabling interoperability between frameworks.

Operator Fusion

Combining multiple neural network operations into a single optimized kernel to reduce memory transfers.
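A classic example is folding a BatchNorm layer into the preceding convolution or dense layer, so two operations become one at inference time. A NumPy sketch for per-output-channel folding (function name and 2-D weight layout are illustrative):

```python
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into the preceding layer's weights and
    bias: y = gamma * (Wx + b - mean) / sqrt(var + eps) + beta becomes
    y = W'x + b' with the scale absorbed into W' and b'."""
    scale = gamma / np.sqrt(var + eps)   # per-output-channel scale
    w_folded = w * scale[:, None]        # scale each output channel's row
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded
```

The folded layer produces identical outputs while skipping one pass over memory, which is where the savings come from on bandwidth-limited edge hardware.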

P

Pruning

Removing unnecessary weights or neurons from a model to reduce size and computation while maintaining accuracy.
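The simplest variant, unstructured magnitude pruning, zeroes the weights with the smallest absolute value. A NumPy sketch (ties at the threshold may push the zeroed fraction slightly past the target):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the given fraction of weights with smallest |value|."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)
```

In practice pruning is followed by fine-tuning to recover accuracy, and the resulting sparsity only saves compute if the runtime or hardware can exploit it.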

Q

Quantization

Reducing the numerical precision of model weights and activations (e.g., float32 to int8) to decrease model size and speed up inference.
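A sketch of asymmetric affine int8 quantization, mapping floats to integers via a scale and zero point so that x ≈ scale · (q − zero_point). This is a simplified per-tensor scheme; real toolchains add per-channel scales and other refinements:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map a float tensor to int8 with an affine scheme. The range is
    extended to include 0 so that zero maps exactly to an integer."""
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / 255.0          # 256 int8 levels -> 255 steps
    zero_point = int(round(-x_min / scale)) - 128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float values from the int8 representation."""
    return scale * (q.astype(np.float32) - zero_point)
```

The round trip loses at most about one quantization step per value, which is the accuracy cost traded for the 4x size reduction and faster integer arithmetic.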

Quantization-Aware Training (QAT)

Training that simulates quantization effects, resulting in models that maintain accuracy when quantized.

R

Real-Time Inference

Processing inputs and producing outputs within strict time deadlines, critical for safety and interactive applications.

T

TensorFlow Lite

Google’s framework for deploying ML models on mobile and embedded devices.