Resource & Memory Management#
Control GPU/CPU memory consumption during compilation and inference: weight streaming, dynamic memory allocation, and low-CPU-memory compilation.
Control GPU/CPU memory consumption during compilation and inference: weight streaming, dynamic memory allocation, and low-CPU-memory compilation.