Measurements

The classes RuntimeMeasurement and MemoryMeasurement measure run time and memory usage, respectively.

This module contains classes to measure the run time and memory usage of forward and backward passes.

class experiments.util.measurements.Measurement(model_fn: Callable[[], Module], loss_fn: Callable[[], Module], x: Tensor, y: Tensor, dev: device, targets: List[Dict[str, Tensor]] | None = None)

Base class for measurements. This is meant to be subclassed and extended.

Store the model, loss function, inputs, labels, and the device.

Parameters:
  • model_fn – A function that returns a model.

  • loss_fn – A function that returns a loss function.

  • x – The input tensor.

  • y – The label tensor (the expected output).

  • dev – The device to measure run time on.

  • targets – Targets in case of detection model.

set_up(synchronize: bool = True, grad_linear_weights: bool = True, grad_linear_bias: bool = True, grad_conv_weights: bool = True, grad_conv_bias: bool = True, grad_norm_weights: bool = True, grad_norm_bias: bool = True, grad_input: bool = False, grad_embed_weights: bool = False, surgical_first: bool = False, surgical_last: bool = False) → Tuple[Module, Module, Tensor, Tensor, List[Dict[str, Tensor]] | None, List[Tensor], List[Tensor]]

Initialize model and loss function, load to device (including data).

If the device is a GPU, synchronizes CUDA threads so that the run time of this function does not leak into the measurement.

Parameters:
  • synchronize (bool, optional) – Whether to synchronize CUDA threads after loading the model, loss function, and data to the device. Default: True.

  • grad_linear_weights (bool, optional) – Whether to compute the gradient of the linear layer weights. Default: True.

  • grad_linear_bias (bool, optional) – Whether to compute the gradient of the linear layer bias. Default: True.

  • grad_conv_weights (bool, optional) – Whether to compute the gradient of the convolution layer weights. Default: True.

  • grad_conv_bias (bool, optional) – Whether to compute the gradient of the convolution layer bias. Default: True.

  • grad_norm_weights (bool, optional) – Whether to compute the gradient of the normalization layer weights. Default: True.

  • grad_norm_bias (bool, optional) – Whether to compute the gradient of the normalization layer bias. Default: True.

  • grad_input (bool, optional) – Whether to compute the gradient of the input. Default: False.

  • grad_embed_weights (bool, optional) – Whether to compute the gradient of the embedding layer weights. Default: False.

  • surgical_first (bool, optional) – Whether to compute gradients only for the first quarter of layers. Default: False.

  • surgical_last (bool, optional) – Whether to compute gradients only for the last quarter of layers. Default: False.

Returns:

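A tuple whose first four entries are the model, loss function, input tensor, and output tensor, all loaded to the specified device. According to the signature, these are followed by the targets (or None) and two lists of tensors.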
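The set-up step described above can be illustrated with a minimal sketch. It is not the module's implementation: the name set_up_sketch is hypothetical, only the synchronize and grad_input flags are modelled, and the per-layer gradient flags and the targets are left out.

```python
from typing import Callable, Tuple

import torch
from torch import nn


def set_up_sketch(
    model_fn: Callable[[], nn.Module],
    loss_fn: Callable[[], nn.Module],
    x: torch.Tensor,
    y: torch.Tensor,
    dev: torch.device,
    synchronize: bool = True,
    grad_input: bool = False,
) -> Tuple[nn.Module, nn.Module, torch.Tensor, torch.Tensor]:
    """Hypothetical helper: build model and loss, move everything to `dev`."""
    # Build fresh instances from the factories and load them to the device.
    model = model_fn().to(dev)
    loss_func = loss_fn().to(dev)

    # Work on detached copies so the caller's tensors stay untouched.
    x = x.clone().detach().to(dev)
    y = y.clone().detach().to(dev)
    x.requires_grad_(grad_input)

    # Wait for pending GPU work so set-up cost is not attributed to a
    # measurement that starts right after this call.
    if synchronize and dev.type == "cuda":
        torch.cuda.synchronize(dev)

    return model, loss_func, x, y
```

Passing factories rather than instances means every measurement starts from a freshly constructed model, so state from one run cannot carry over into the next.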

class experiments.util.measurements.MemoryMeasurement(model_fn: Callable[[], Module], loss_fn: Callable[[], Module], x: Tensor, y: Tensor, dev: device, targets: List[Dict[str, Tensor]] | None = None)

A class to measure memory usage after a forward pass.

Store the model, loss function, inputs, labels, and the device.

Parameters:
  • model_fn – A function that returns a model.

  • loss_fn – A function that returns a loss function.

  • x – The input tensor.

  • y – The label tensor (the expected output).

  • dev – The device to run the measurement on.

  • targets – Targets in case of detection model.

after_forward(**case_kwargs) → float

Return memory usage after a forward pass.

Note: Input embeddings are passed directly to transformers, so the embedding weights are never used and their gradient will be None.

Parameters:

**case_kwargs – Keyword arguments denoting which gradients to compute and which to skip; see the documentation of Measurement.set_up().

Returns:

The memory usage in bytes.

Return type:

float
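The documentation does not state how the memory figure is obtained. As a stand-in illustration only, the sketch below sums the bytes of the tensors that autograd stores for the backward pass during one forward pass, using PyTorch's saved_tensors_hooks. This is a different quantity from total process or device memory, and the function name saved_bytes_after_forward is hypothetical.

```python
import torch
from torch import nn


def saved_bytes_after_forward(model: nn.Module, x: torch.Tensor) -> float:
    """Hypothetical helper: bytes autograd keeps alive after one forward pass."""
    total = 0

    def pack(t: torch.Tensor) -> torch.Tensor:
        # Called once for every tensor autograd saves for the backward pass.
        nonlocal total
        total += t.numel() * t.element_size()
        return t

    def unpack(t: torch.Tensor) -> torch.Tensor:
        # Hand the stored tensor back unchanged when backward needs it.
        return t

    with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
        model(x)

    return float(total)
```

A tensor that is saved by several operations is counted once per save, so the result is an upper bound rather than an exact footprint. It still shows why the gradient flags matter: when no parameter and no input requires a gradient, nothing is saved.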

class experiments.util.measurements.RuntimeMeasurement(model_fn: Callable[[], Module], loss_fn: Callable[[], Module], x: Tensor, y: Tensor, dev: device, targets: List[Dict[str, Tensor]] | None = None)

A class to perform run time measurements of forward+backward pass.

Store the model, loss function, inputs, labels, and the device.

Parameters:
  • model_fn – A function that returns a model.

  • loss_fn – A function that returns a loss function.

  • x – The input tensor.

  • y – The label tensor (the expected output).

  • dev – The device to measure run time on.

  • targets – Targets in case of detection model.

forward_backward(**case_kwargs) → float

Perform a forward and backward pass and return the run time.

Syncs CUDA threads if the device is a GPU.

Note: Input embeddings are passed directly to transformers, so the embedding weights are never used and their gradient will be None.

Parameters:

**case_kwargs – Keyword arguments denoting which gradients to compute and which to skip; see the documentation of Measurement.set_up().

Returns:

The run time in seconds.

Return type:

float
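CUDA kernels are launched asynchronously, so a timer that stops before the device has finished would under-report the run time. The sketch below shows one way to time a forward and backward pass with synchronization on both sides. The names maybe_synchronize_sketch and time_forward_backward are hypothetical, and the use of time.perf_counter is an assumption, since the documentation does not name the timer.

```python
import time

import torch
from torch import nn


def maybe_synchronize_sketch(dev: torch.device) -> None:
    """Hypothetical stand-in: wait for pending kernels if `dev` is a GPU."""
    if dev.type == "cuda":
        torch.cuda.synchronize(dev)


def time_forward_backward(
    model: nn.Module,
    loss_func: nn.Module,
    x: torch.Tensor,
    y: torch.Tensor,
    dev: torch.device,
) -> float:
    """Return the wall-clock seconds spent in one forward+backward pass."""
    # Make sure earlier GPU work is finished before the clock starts.
    maybe_synchronize_sketch(dev)
    start = time.perf_counter()

    loss = loss_func(model(x), y)
    loss.backward()

    # Wait for the backward kernels before the clock stops.
    maybe_synchronize_sketch(dev)
    return time.perf_counter() - start
```

On a CPU device the synchronization calls do nothing, so the same function can be used on either device type.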

experiments.util.measurements.maybe_synchronize(dev: device)

Synchronize CUDA kernels if device is GPU.

Parameters:

dev – PyTorch device.

experiments.util.measurements.separate_grad_arguments(model: Module, grad_linear_weights: bool, grad_linear_bias: bool, grad_conv_weights: bool, grad_conv_bias: bool, grad_norm_weights: bool, grad_norm_bias: bool, grad_embed_weights: bool) → Tuple[List[Parameter], List[Parameter]]

Separate the parameters of a model into leafs and non-leafs.

Parameters:
  • model – The model to separate the parameters of.

  • grad_linear_weights – Whether to compute the gradient of the linear layer weights.

  • grad_linear_bias – Whether to compute the gradient of the linear layer bias.

  • grad_conv_weights – Whether to compute the gradient of the convolution layer weights.

  • grad_conv_bias – Whether to compute the gradient of the convolution layer bias.

  • grad_norm_weights – Whether to compute the gradient of the normalization layer weights.

  • grad_norm_bias – Whether to compute the gradient of the normalization layer bias.

  • grad_embed_weights – Whether to compute the gradient of the embedding layer weights.

Returns:

A tuple of lists of parameters. The first list contains the leafs, the second list contains the non-leafs.

Raises:

NotImplementedError – If an unknown layer with parameters is encountered.
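The separation by layer type can be sketched as follows. This is a reduced illustration, not the module's code: it assumes that "leafs" are the parameters a gradient is requested for, handles only nn.Linear and nn.Conv2d, and uses the hypothetical name separate_by_layer_type.

```python
from typing import List, Optional, Tuple

from torch import nn
from torch.nn import Parameter


def separate_by_layer_type(
    model: nn.Module,
    grad_linear_weights: bool,
    grad_linear_bias: bool,
    grad_conv_weights: bool,
    grad_conv_bias: bool,
) -> Tuple[List[Parameter], List[Parameter]]:
    """Hypothetical helper: split parameters into (leafs, no_leafs)."""
    leafs: List[Parameter] = []
    no_leafs: List[Parameter] = []

    def assign(param: Optional[Parameter], wants_grad: bool) -> None:
        # Layers built with bias=False have no bias parameter to sort.
        if param is None:
            return
        (leafs if wants_grad else no_leafs).append(param)

    for layer in model.modules():
        if isinstance(layer, nn.Linear):
            assign(layer.weight, grad_linear_weights)
            assign(layer.bias, grad_linear_bias)
        elif isinstance(layer, nn.Conv2d):
            assign(layer.weight, grad_conv_weights)
            assign(layer.bias, grad_conv_bias)
        elif list(layer.parameters(recurse=False)):
            # A layer that owns parameters but is not handled above.
            raise NotImplementedError(f"Unknown layer with parameters: {layer}")

    return leafs, no_leafs
```

Containers such as nn.Sequential own no parameters directly, so they pass through the loop without raising; only an unhandled layer that holds its own parameters triggers the error.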

experiments.util.measurements.separate_surgical(model: Module, surgical_first: bool, surgical_last: bool) → Tuple[List[Parameter], List[Parameter]]

Separate the parameters of a model into leafs and non-leafs for surgical fine-tuning.

Exactly one of surgical_first and surgical_last must be True.

Parameters:
  • model (Module) – The model to separate the parameters of.

  • surgical_first (bool) – Whether to compute the gradients of the first quarter of layers with parameters.

  • surgical_last (bool) – Whether to compute the gradients of the last quarter of layers with parameters.
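The surgical split can be sketched as below. Several details are assumptions that the documentation does not settle: the quarter is computed by integer division with a minimum of one layer, a ValueError is raised when the flags are not mutually exclusive, and the name separate_surgical_sketch is hypothetical.

```python
from typing import List, Tuple

from torch import nn
from torch.nn import Parameter


def separate_surgical_sketch(
    model: nn.Module, surgical_first: bool, surgical_last: bool
) -> Tuple[List[Parameter], List[Parameter]]:
    """Hypothetical helper: request gradients for one quarter of the layers."""
    if surgical_first == surgical_last:
        raise ValueError("Exactly one of surgical_first and surgical_last must be True.")

    # Only layers that own parameters directly count towards the quarter.
    layers = [m for m in model.modules() if list(m.parameters(recurse=False))]
    quarter = max(1, len(layers) // 4)  # rounding rule is an assumption

    leafs: List[Parameter] = []
    no_leafs: List[Parameter] = []
    for index, layer in enumerate(layers):
        if surgical_first:
            selected = index < quarter
        else:
            selected = index >= len(layers) - quarter
        target = leafs if selected else no_leafs
        target.extend(layer.parameters(recurse=False))

    return leafs, no_leafs
```

For a stack of eight linear layers, this selects the parameters of two layers as leafs and leaves the remaining six layers in the non-leaf list.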