WandaSparsifier¶
- class torchao.sparsity.WandaSparsifier(sparsity_level: float = 0.5, semi_structured_block_size: Optional[int] = None)[source]¶
Wanda sparsifier
Wanda (Pruning by Weights and activations), proposed in https://arxiv.org/abs/2306.11695, is an activation-aware pruning method. The sparsifier removes weights based on the product of the input activation norm and the weight magnitude.
This sparsifier is controlled by two variables: 1. sparsity_level defines the proportion of sparse blocks that are zeroed out; 2. semi_structured_block_size, when set, selects the block size used for semi-structured sparsity. A usage sketch follows the parameter list below.
- Parameters:
sparsity_level – The target level of sparsity;
semi_structured_block_size – Optional block size to use for semi-structured sparsity; if left as None, unstructured sparsity is used;
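A minimal usage sketch, assuming the standard torch.ao.pruning-style workflow (a config of tensor_fqn entries, calibration forward passes, then step() and squash_mask()); the toy model, config keys, and calibration data here are illustrative assumptions, not taken from the official docs.

import torch
import torch.nn as nn
from torchao.sparsity import WandaSparsifier

# Toy two-layer model; any nn.Module with Linear weights would do.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

sparsifier = WandaSparsifier(sparsity_level=0.5)

# One config entry per tensor to sparsify (torch.ao.pruning convention).
config = [{"tensor_fqn": "0.weight"}, {"tensor_fqn": "2.weight"}]
sparsifier.prepare(model, config)

# Wanda is activation aware: run calibration batches so the sparsifier
# can observe per-input-channel activation norms.
with torch.no_grad():
    for _ in range(8):
        model(torch.randn(32, 128))

sparsifier.step()         # compute masks from the Wanda metric and apply them
sparsifier.squash_mask()  # fold the masks into the weight tensors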
- prepare(model: Module, config: List[Dict]) None [source]¶
Prepares a model by adding the parametrizations.
Note:
The model is modified inplace. If you need to preserve the original model, use copy.deepcopy.
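A short sketch of the in-place caveat from the note above; the model and config names are the same illustrative ones used in the earlier usage sketch.

import copy

original_model = copy.deepcopy(model)   # untouched reference copy
sparsifier.prepare(model, config)       # modifies `model` in place
# `model` now carries mask parametrizations; `original_model` does not.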
- squash_mask(params_to_keep: Optional[Tuple[str, ...]] = None, params_to_keep_per_layer: Optional[Dict[str, Tuple[str, ...]]] = None, *args, **kwargs)[source]¶
Squashes the sparse masks into the appropriate tensors.
If either params_to_keep or params_to_keep_per_layer is set, the module will have a sparse_params dict attached to it.
- Parameters:
params_to_keep – List of keys to save in the module or a dict representing the modules and keys that will have sparsity parameters saved
params_to_keep_per_layer – Dict to specify the params that should be saved for specific layers. The keys in the dict should be the module fqn, while the values should be a list of strings with the names of the variables to save in the sparse_params
Examples
>>> # xdoctest: +SKIP("locals are undefined")
>>> # Don't save any sparse params
>>> sparsifier.squash_mask()
>>> hasattr(model.submodule1, 'sparse_params')
False

>>> # Keep sparse params per layer
>>> sparsifier.squash_mask(
...     params_to_keep_per_layer={
...         'submodule1.linear1': ('foo', 'bar'),
...         'submodule2.linear42': ('baz',)
...     })
>>> print(model.submodule1.linear1.sparse_params)
{'foo': 42, 'bar': 24}
>>> print(model.submodule2.linear42.sparse_params)
{'baz': 0.1}

>>> # Keep sparse params for all layers
>>> sparsifier.squash_mask(params_to_keep=('foo', 'bar'))
>>> print(model.submodule1.linear1.sparse_params)
{'foo': 42, 'bar': 24}
>>> print(model.submodule2.linear42.sparse_params)
{'foo': 42, 'bar': 24}

>>> # Keep some sparse params for all layers, and specific ones for
>>> # some other layers
>>> sparsifier.squash_mask(
...     params_to_keep=('foo', 'bar'),
...     params_to_keep_per_layer={
...         'submodule2.linear42': ('baz',)
...     })
>>> print(model.submodule1.linear1.sparse_params)
{'foo': 42, 'bar': 24}
>>> print(model.submodule2.linear42.sparse_params)
{'foo': 42, 'bar': 24, 'baz': 0.1}
- update_mask(module: Module, tensor_name: str, sparsity_level: float, **kwargs) None [source]¶
Pruning function for WandaSparsifier
The activation statistics are first retrieved into the act_per_input variable. The Wanda pruning metric is then computed as the product of the weight magnitude and the input activation norm, and the weight matrix is pruned by comparing this metric across the whole current layer.
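A hedged sketch of the metric described above, not the library's internal code: weight is the layer's (out_features, in_features) matrix and act_norm is the per-input-channel L2 norm gathered from calibration activations (both assumed inputs here).

import torch

def wanda_mask(weight: torch.Tensor, act_norm: torch.Tensor, sparsity_level: float) -> torch.Tensor:
    # Wanda metric: |W_ij| * ||X_j||, broadcast across the output rows.
    metric = weight.abs() * act_norm
    # Rank the metric over the whole layer and prune the lowest fraction.
    num_prune = int(sparsity_level * metric.numel())
    threshold = metric.flatten().kthvalue(num_prune).values
    return metric > threshold  # True = keep, False = zero out

weight = torch.randn(256, 128)
act_norm = torch.rand(128)
mask = wanda_mask(weight, act_norm, sparsity_level=0.5)
pruned_weight = weight * mask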