Measurements

The classes RuntimeMeasurement and MemoryMeasurement measure run time and memory usage, respectively.

This module contains classes to measure the run time and memory usage of forward and backward passes.

class experiments.util.measurements.Measurement(model_fn: Callable[[], Module], loss_fn: Callable[[], Module], x: Tensor, y: Tensor, dev: device, targets: List[Dict[str, Tensor]] | None = None)

Base class for measurements. This is meant to be subclassed and extended.

Store the model, loss function, inputs, labels, and the device.

Parameters:

model_fn – A function that returns a model.
loss_fn – A function that returns a loss function.
x – The input tensor.
y – The output tensor.
dev – The device to measure run time on.
targets – Targets in case of detection model.

set_up(synchronize: bool = True, grad_linear_weights: bool = True, grad_linear_bias: bool = True, grad_conv_weights: bool = True, grad_conv_bias: bool = True, grad_norm_weights: bool = True, grad_norm_bias: bool = True, grad_input: bool = False, grad_embed_weights: bool = False, surgical_first: bool = False, surgical_last: bool = False) → Tuple[Module, Module, Tensor, Tensor, List[Dict[str, Tensor]] | None, List[Tensor], List[Tensor]]

Initialize model and loss function, load to device (including data).

Syncs CUDA threads if the device is a GPU to avoid leaking run time of this function into the measurement.

Parameters:

synchronize (bool, optional) – Whether to synchronize CUDA threads after loading the model, loss function, and data to the device. Default: True.
grad_linear_weights (bool, optional) – Whether to compute the gradient of the linear layer weights. Default: True.
grad_linear_bias (bool, optional) – Whether to compute the gradient of the linear layer bias. Default: True.
grad_conv_weights (bool, optional) – Whether to compute the gradient of the convolution layer weights. Default: True.
grad_conv_bias (bool, optional) – Whether to compute the gradient of the convolution layer bias. Default: True.
grad_norm_weights (bool, optional) – Whether to compute the gradient of the normalization layer weights. Default: True.
grad_norm_bias (bool, optional) – Whether to compute the gradient of the normalization layer bias. Default: True.
grad_input (bool, optional) – Whether to compute the gradient of the input. Default: False.
grad_embed_weights (bool, optional) – Whether to compute the gradient of the embedding layer weights. Default: False.
surgical_first (bool, optional) – Whether to compute gradients only for the first quarter of layers with parameters. Default: False.
surgical_last (bool, optional) – Whether to compute gradients only for the last quarter of layers with parameters. Default: False.

Returns:

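A tuple containing the model, loss function, input tensor, output tensor, and targets (None if no targets were given), all loaded to the specified device, followed by two lists of tensors. Judging by the return annotation and separate_grad_arguments(), these lists appear to hold the leafs and non-leafs.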
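The gradient flags above can be collected into a dictionary and passed as keyword arguments. The following is a minimal sketch, assuming a hypothetical helper `make_case` that starts from the defaults in the `set_up()` signature and overrides selected flags. Neither the helper, the constant `SET_UP_DEFAULTS`, nor the case name `input_only` is part of `experiments.util.measurements`.

```python
from typing import Dict

# Defaults copied from the set_up() signature documented above
# (synchronize is left out because it is not a gradient flag).
SET_UP_DEFAULTS: Dict[str, bool] = {
    "grad_linear_weights": True,
    "grad_linear_bias": True,
    "grad_conv_weights": True,
    "grad_conv_bias": True,
    "grad_norm_weights": True,
    "grad_norm_bias": True,
    "grad_input": False,
    "grad_embed_weights": False,
    "surgical_first": False,
    "surgical_last": False,
}


def make_case(**overrides: bool) -> Dict[str, bool]:
    """Hypothetical helper: return the flags with some defaults overridden."""
    unknown = set(overrides) - set(SET_UP_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown flags: {sorted(unknown)}")
    return {**SET_UP_DEFAULTS, **overrides}


# Illustrative case: differentiate w.r.t. the input only, not the parameters.
input_only = make_case(
    grad_linear_weights=False,
    grad_linear_bias=False,
    grad_conv_weights=False,
    grad_conv_bias=False,
    grad_norm_weights=False,
    grad_norm_bias=False,
    grad_input=True,
)
```

A dictionary like `input_only` would presumably be unpacked into the `**case_kwargs` of the measurement methods, which refer back to `set_up()` for the accepted flags.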

class experiments.util.measurements.MemoryMeasurement(model_fn: Callable[[], Module], loss_fn: Callable[[], Module], x: Tensor, y: Tensor, dev: device, targets: List[Dict[str, Tensor]] | None = None)

A class to measure memory usage after a forward pass.

Store the model, loss function, inputs, labels, and the device.

Parameters:

model_fn – A function that returns a model.
loss_fn – A function that returns a loss function.
x – The input tensor.
y – The output tensor.
dev – The device to measure run time on.
targets – Targets in case of detection model.

after_forward(**case_kwargs) → float

Return memory usage after a forward pass.

Note: We directly pass input embeddings to transformers so embed weights are never used and their grad will be None.

Parameters:

**case_kwargs – Keyword arguments denoting which gradients to compute and which not; see the docs of Measurement.set_up().

Returns:

The memory usage in bytes.

Return type:

float

class experiments.util.measurements.RuntimeMeasurement(model_fn: Callable[[], Module], loss_fn: Callable[[], Module], x: Tensor, y: Tensor, dev: device, targets: List[Dict[str, Tensor]] | None = None)

A class to perform run time measurements of forward+backward pass.

Store the model, loss function, inputs, labels, and the device.

Parameters:

model_fn – A function that returns a model.
loss_fn – A function that returns a loss function.
x – The input tensor.
y – The output tensor.
dev – The device to measure run time on.
targets – Targets in case of detection model.

forward_backward(**case_kwargs) → float

Perform a forward and backward pass and return the run time.

Syncs CUDA threads if the device is a GPU.

Note: We directly pass input embeddings to transformers so embed weights are never used and their grad will be None.

Parameters:

**case_kwargs – Keyword arguments denoting which gradients to compute and which not; see the docs of Measurement.set_up().

Returns:

The run time in seconds.

Return type:

float
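The timing pattern described above (synchronize, run forward and backward, synchronize again, read the clock) can be sketched as follows. This is a minimal illustration, not the RuntimeMeasurement implementation: `sync_if_cuda` and `time_forward_backward` are hypothetical names, `time.perf_counter` is an assumed choice of clock, and the small linear model exists only to make the example runnable.

```python
import time

import torch
from torch import nn


def sync_if_cuda(dev: torch.device) -> None:
    """Wait for pending CUDA kernels; does nothing on CPU.

    Hypothetical stand-in for maybe_synchronize().
    """
    if dev.type == "cuda":
        torch.cuda.synchronize(dev)


def time_forward_backward(
    model: nn.Module,
    loss_fn: nn.Module,
    x: torch.Tensor,
    y: torch.Tensor,
    dev: torch.device,
) -> float:
    """Return the wall-clock time in seconds of one forward+backward pass."""
    sync_if_cuda(dev)  # keep earlier asynchronous work out of the measurement
    start = time.perf_counter()
    loss = loss_fn(model(x), y)
    loss.backward()
    sync_if_cuda(dev)  # wait until the backward kernels have finished
    return time.perf_counter() - start


# Toy setup on CPU; model, loss, and data sizes are arbitrary.
dev = torch.device("cpu")
model = nn.Linear(4, 2).to(dev)
loss_fn = nn.MSELoss().to(dev)
x = torch.rand(8, 4, device=dev)
y = torch.rand(8, 2, device=dev)

elapsed = time_forward_backward(model, loss_fn, x, y, dev)
```

Without the second synchronization, a GPU measurement would mostly capture kernel launch overhead, because CUDA kernels run asynchronously with respect to the Python process.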

experiments.util.measurements.maybe_synchronize(dev: device)

Synchronize CUDA kernels if device is GPU.

Parameters:

dev – PyTorch device.

experiments.util.measurements.separate_grad_arguments(model: Module, grad_linear_weights: bool, grad_linear_bias: bool, grad_conv_weights: bool, grad_conv_bias: bool, grad_norm_weights: bool, grad_norm_bias: bool, grad_embed_weights: bool) → Tuple[List[Parameter], List[Parameter]]

Separate the parameters of a model into leafs and non-leafs.

Parameters:

model – The model to separate the parameters of.
grad_linear_weights – Whether to compute the gradient of the linear layer weights.
grad_linear_bias – Whether to compute the gradient of the linear layer bias.
grad_conv_weights – Whether to compute the gradient of the convolution layer weights.
grad_conv_bias – Whether to compute the gradient of the convolution layer bias.
grad_norm_weights – Whether to compute the gradient of the normalization layer weights.
grad_norm_bias – Whether to compute the gradient of the normalization layer bias.
grad_embed_weights – Whether to compute the gradient of the embedding layer weights

Returns:

A tuple of lists of parameters. The first list contains the leafs, the second list contains the non-leafs.

Raises:

NotImplementedError – If an unknown layer with parameters is encountered.
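One way such a separation by layer type could work is sketched below. It is a simplified illustration, not the function's actual code: `split_linear_conv` is a hypothetical name, it handles only `nn.Linear` and `nn.Conv2d`, and it assumes that "leafs" are the parameters whose gradient is requested.

```python
from typing import List, Tuple

import torch
from torch import nn
from torch.nn import Parameter


def split_linear_conv(
    model: nn.Module,
    grad_linear_weights: bool,
    grad_linear_bias: bool,
    grad_conv_weights: bool,
    grad_conv_bias: bool,
) -> Tuple[List[Parameter], List[Parameter]]:
    """Hypothetical, reduced variant covering linear and 2d convolution layers."""
    leafs: List[Parameter] = []
    no_leafs: List[Parameter] = []
    for layer in model.modules():
        if isinstance(layer, nn.Linear):
            flags = (grad_linear_weights, grad_linear_bias)
        elif isinstance(layer, nn.Conv2d):
            flags = (grad_conv_weights, grad_conv_bias)
        elif list(layer.parameters(recurse=False)):
            # The layer owns parameters but its type is not handled here.
            raise NotImplementedError(f"Unknown layer with parameters: {layer}")
        else:
            continue  # containers and parameter-free layers
        for param, wanted in zip((layer.weight, layer.bias), flags):
            if param is None:  # layer was built with bias=False
                continue
            (leafs if wanted else no_leafs).append(param)
    return leafs, no_leafs


# Toy model: request gradients for the conv bias and the linear weight only.
model = nn.Sequential(nn.Conv2d(1, 2, 3), nn.ReLU(), nn.Flatten(), nn.Linear(8, 3))
leafs, no_leafs = split_linear_conv(
    model,
    grad_linear_weights=True,
    grad_linear_bias=False,
    grad_conv_weights=False,
    grad_conv_bias=True,
)
```

Checking `parameters(recurse=False)` lets containers such as `nn.Sequential` pass through while still flagging unsupported layers that own parameters directly.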

experiments.util.measurements.separate_surgical(model: Module, surgical_first: bool, surgical_last: bool) → Tuple[List[Parameter], List[Parameter]]

Separate the parameters of a model into leafs and non-leafs for surgical fine-tuning.

Exactly one of surgical_first and surgical_last must be True.

Parameters:

model (Module) – The model to separate the parameters of.
surgical_first (bool) – Whether to compute the gradients of the first quarter of layers with parameters.
surgical_last (bool) – Whether to compute the gradients of the last quarter of layers with parameters.
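The quarter-based selection can be illustrated on a plain list of layers. The sketch below is an assumption-laden illustration rather than the library's code: `split_quarter` is a hypothetical name, and the rounding rule (integer division by four, with at least one layer selected) is a guess, since the documentation does not say how the quarter is rounded.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_quarter(
    layers: Sequence[T], surgical_first: bool, surgical_last: bool
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: split layers into (selected quarter, remaining layers)."""
    if surgical_first == surgical_last:
        raise ValueError("Exactly one of surgical_first and surgical_last must be True.")
    # Assumed rounding: floor division, but never fewer than one layer.
    quarter = max(1, len(layers) // 4)
    if surgical_first:
        selected, rest = layers[:quarter], layers[quarter:]
    else:
        selected, rest = layers[-quarter:], layers[:-quarter]
    return list(selected), list(rest)


# Eight layers with parameters, represented by their names only.
layers = [f"layer{i}" for i in range(8)]
first, rest_after_first = split_quarter(layers, surgical_first=True, surgical_last=False)
last, rest_before_last = split_quarter(layers, surgical_first=False, surgical_last=True)
```

With eight layers, `first` holds `layer0` and `layer1`, and `last` holds `layer6` and `layer7`. The parameters of the selected layers would then be the ones for which gradients are computed.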