Measurements
The classes RuntimeMeasurement and MemoryMeasurement measure run time and memory usage, respectively.
This module contains classes to perform run time and memory measurements of forward and backward passes.
- class experiments.util.measurements.Measurement(model_fn: Callable[[], Module], loss_fn: Callable[[], Module], x: Tensor, y: Tensor, dev: device, targets: List[Dict[str, Tensor]] | None = None)
Base class for measurements. This is meant to be subclassed and extended.
Store the model, loss function, inputs, labels, and the device.
- Parameters:
model_fn – A function that returns a model.
loss_fn – A function that returns a loss function.
x – The input tensor.
y – The output tensor.
dev – The device to run the measurement on.
targets – Targets, used for detection models. Default: None.
- set_up(synchronize: bool = True, grad_linear_weights: bool = True, grad_linear_bias: bool = True, grad_conv_weights: bool = True, grad_conv_bias: bool = True, grad_norm_weights: bool = True, grad_norm_bias: bool = True, grad_input: bool = False, grad_embed_weights: bool = False, surgical_first: bool = False, surgical_last: bool = False) → Tuple[Module, Module, Tensor, Tensor, List[Dict[str, Tensor]] | None, List[Tensor], List[Tensor]]
Initialize model and loss function, load to device (including data).
Syncs CUDA threads if the device is a GPU to avoid leaking run time of this function into the measurement.
- Parameters:
synchronize (bool, optional) – Whether to synchronize CUDA threads after loading the model, loss function, and data to the device. Default: True.
grad_linear_weights (bool, optional) – Whether to compute the gradient of the linear layer weights. Default: True.
grad_linear_bias (bool, optional) – Whether to compute the gradient of the linear layer bias. Default: True.
grad_conv_weights (bool, optional) – Whether to compute the gradient of the convolution layer weights. Default: True.
grad_conv_bias (bool, optional) – Whether to compute the gradient of the convolution layer bias. Default: True.
grad_norm_weights (bool, optional) – Whether to compute the gradient of the normalization layer weights. Default: True.
grad_norm_bias (bool, optional) – Whether to compute the gradient of the normalization layer bias. Default: True.
grad_input (bool, optional) – Whether to compute the gradient of the input. Default: False.
grad_embed_weights (bool, optional) – Whether to compute the gradient of the embedding layer weights. Default: False.
surgical_first (bool, optional) – Whether to compute gradients only for the first quarter of layers with parameters. Default: False.
surgical_last (bool, optional) – Whether to compute gradients only for the last quarter of layers with parameters. Default: False.
- Returns:
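The model, loss function, input tensor, output tensor, targets (None if not given), and two lists of tensors, matching the return annotation above. The two lists appear to be the leafs and non-leafs (see separate_grad_arguments()). The model, loss function, and data are loaded to the specified device.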
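The gradient flags of set_up() are also what the measurement methods accept as **case_kwargs. The following is a minimal sketch of describing a measurement case as a dictionary of keyword arguments and checking it against the defaults from the signature above. GRAD_FLAG_DEFAULTS and make_case are hypothetical names used for illustration only; they are not part of experiments.util.measurements.

```python
from typing import Dict

# Defaults of the gradient-related flags, copied from the set_up() signature
# documented above (the synchronize flag is deliberately left out).
GRAD_FLAG_DEFAULTS: Dict[str, bool] = {
    "grad_linear_weights": True,
    "grad_linear_bias": True,
    "grad_conv_weights": True,
    "grad_conv_bias": True,
    "grad_norm_weights": True,
    "grad_norm_bias": True,
    "grad_input": False,
    "grad_embed_weights": False,
    "surgical_first": False,
    "surgical_last": False,
}


def make_case(**overrides: bool) -> Dict[str, bool]:
    """Return all gradient flags for one case (hypothetical helper).

    Unknown flag names raise a KeyError so that typos are caught early.
    """
    unknown = set(overrides) - set(GRAD_FLAG_DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown gradient flags: {sorted(unknown)}")
    return {**GRAD_FLAG_DEFAULTS, **overrides}


# Example case: differentiate only with respect to the linear layer weights.
linear_weights_only = make_case(
    grad_linear_bias=False,
    grad_conv_weights=False,
    grad_conv_bias=False,
    grad_norm_weights=False,
    grad_norm_bias=False,
)
```

Assuming the measurement methods forward their keyword arguments to set_up(), as their documentation suggests, such a dictionary could be unpacked into a call like forward_backward(**linear_weights_only).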
- class experiments.util.measurements.MemoryMeasurement(model_fn: Callable[[], Module], loss_fn: Callable[[], Module], x: Tensor, y: Tensor, dev: device, targets: List[Dict[str, Tensor]] | None = None)
A class to measure memory usage after a forward pass.
Store the model, loss function, inputs, labels, and the device.
- Parameters:
model_fn – A function that returns a model.
loss_fn – A function that returns a loss function.
x – The input tensor.
y – The output tensor.
dev – The device to run the measurement on.
targets – Targets, used for detection models. Default: None.
- after_forward(**case_kwargs) → float
Return memory usage after a forward pass.
Note: We directly pass input embeddings to transformers so embed weights are never used and their grad will be None.
- Parameters:
**case_kwargs – Keyword arguments specifying which gradients to compute and which to skip; see the documentation of Measurement.set_up().
- Returns:
The memory usage in bytes.
- Return type:
float
- class experiments.util.measurements.RuntimeMeasurement(model_fn: Callable[[], Module], loss_fn: Callable[[], Module], x: Tensor, y: Tensor, dev: device, targets: List[Dict[str, Tensor]] | None = None)
A class to perform run time measurements of forward+backward pass.
Store the model, loss function, inputs, labels, and the device.
- Parameters:
model_fn – A function that returns a model.
loss_fn – A function that returns a loss function.
x – The input tensor.
y – The output tensor.
dev – The device to run the measurement on.
targets – Targets, used for detection models. Default: None.
- forward_backward(**case_kwargs) → float
Perform a forward and backward pass and return the run time.
Syncs CUDA threads if the device is a GPU. Note: We directly pass input embeddings to transformers so embed weights are never used and their grad will be None.
- Parameters:
**case_kwargs – Keyword arguments specifying which gradients to compute and which to skip; see the documentation of Measurement.set_up().
- Returns:
The run time in seconds.
- Return type:
float
- experiments.util.measurements.maybe_synchronize(dev: device)
Synchronize CUDA kernels if the device is a GPU.
- Parameters:
dev – PyTorch device.
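Because CUDA kernels launch asynchronously, a timer that stops before the device has finished would under-report the run time. The following sketch shows the general pattern of synchronizing before starting and before stopping the clock. The names maybe_synchronize_sketch and timed_forward_backward are hypothetical, and the code is an assumption about how such a measurement can be written, not the implementation in experiments.util.measurements.

```python
import time

import torch
from torch import nn


def maybe_synchronize_sketch(dev: torch.device) -> None:
    """Wait for pending CUDA kernels if dev is a CUDA device (no-op on CPU)."""
    if dev.type == "cuda":
        torch.cuda.synchronize(dev)


def timed_forward_backward(
    model: nn.Module,
    loss_fn: nn.Module,
    x: torch.Tensor,
    y: torch.Tensor,
    dev: torch.device,
) -> float:
    """Return the wall-clock time in seconds of one forward+backward pass."""
    # Make sure earlier work (e.g. host-to-device copies) is not timed.
    maybe_synchronize_sketch(dev)
    start = time.perf_counter()

    loss = loss_fn(model(x), y)
    loss.backward()

    # Wait until the device has actually finished before stopping the clock.
    maybe_synchronize_sketch(dev)
    return time.perf_counter() - start


# Small CPU example; swap in torch.device("cuda") if a GPU is available.
dev = torch.device("cpu")
model = nn.Linear(4, 2).to(dev)
loss_fn = nn.MSELoss().to(dev)
x = torch.rand(8, 4, device=dev)
y = torch.rand(8, 2, device=dev)
elapsed = timed_forward_backward(model, loss_fn, x, y, dev)
```

On a CPU device the synchronization is a no-op, so the same code path can be used for both device types.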
- experiments.util.measurements.separate_grad_arguments(model: Module, grad_linear_weights: bool, grad_linear_bias: bool, grad_conv_weights: bool, grad_conv_bias: bool, grad_norm_weights: bool, grad_norm_bias: bool, grad_embed_weights: bool) → Tuple[List[Parameter], List[Parameter]]
Separate the parameters of a model into leafs and non-leafs.
- Parameters:
model – The model to separate the parameters of.
grad_linear_weights – Whether to compute the gradient of the linear layer weights.
grad_linear_bias – Whether to compute the gradient of the linear layer bias.
grad_conv_weights – Whether to compute the gradient of the convolution layer weights.
grad_conv_bias – Whether to compute the gradient of the convolution layer bias.
grad_norm_weights – Whether to compute the gradient of the normalization layer weights.
grad_norm_bias – Whether to compute the gradient of the normalization layer bias.
grad_embed_weights – Whether to compute the gradient of the embedding layer weights.
- Returns:
A tuple of lists of parameters. The first list contains the leafs, the second list contains the non-leafs.
- Raises:
NotImplementedError – If an unknown layer with parameters is encountered.
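The separation described above can be sketched as follows, restricted to linear and 2D convolution layers for brevity. This is an illustration under two assumptions: that "leafs" are the parameters whose gradient is requested, and that any other layer with parameters counts as unknown. The function name separate_by_flags is hypothetical and is not the library's implementation.

```python
from typing import List, Tuple

from torch import nn


def separate_by_flags(
    model: nn.Module,
    grad_linear_weights: bool,
    grad_linear_bias: bool,
    grad_conv_weights: bool,
    grad_conv_bias: bool,
) -> Tuple[List[nn.Parameter], List[nn.Parameter]]:
    """Split parameters into (leafs, no_leafs) based on per-layer-type flags."""
    leafs: List[nn.Parameter] = []
    no_leafs: List[nn.Parameter] = []

    for layer in model.modules():
        # Only look at parameters owned directly by this layer, so that
        # containers such as nn.Sequential are skipped.
        own_params = dict(layer.named_parameters(recurse=False))
        if not own_params:
            continue

        if isinstance(layer, nn.Linear):
            wanted = {"weight": grad_linear_weights, "bias": grad_linear_bias}
        elif isinstance(layer, nn.Conv2d):
            wanted = {"weight": grad_conv_weights, "bias": grad_conv_bias}
        else:
            raise NotImplementedError(f"Unknown layer with parameters: {layer}")

        for name, param in own_params.items():
            (leafs if wanted[name] else no_leafs).append(param)

    return leafs, no_leafs


model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3), nn.ReLU(), nn.Flatten(), nn.Linear(4, 2)
)
# Request only the gradient of the linear layer's weight.
leafs, no_leafs = separate_by_flags(
    model,
    grad_linear_weights=True,
    grad_linear_bias=False,
    grad_conv_weights=False,
    grad_conv_bias=False,
)
```

Here leafs holds only the linear layer's weight, while the linear bias and both convolution parameters end up in no_leafs.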
- experiments.util.measurements.separate_surgical(model: Module, surgical_first: bool, surgical_last: bool) → Tuple[List[Parameter], List[Parameter]]
Separate the parameters of a model into leafs and non-leafs for surgical fine-tuning.
Exactly one of surgical_first and surgical_last must be True.
- Parameters:
model (Module) – The model to separate the parameters of.
surgical_first (bool) – Whether to compute the gradients of the first quarter of layers with parameters.
surgical_last (bool) – Whether to compute the gradients of the last quarter of layers with parameters.
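The first-quarter and last-quarter selection can be illustrated on a plain list of layers. The sketch below assumes that a quarter means len(layers) // 4 (but at least one layer) and that a ValueError is raised when the two flags are not mutually exclusive; neither detail is confirmed by the documentation above. The name split_surgical is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_surgical(
    layers: Sequence[T], surgical_first: bool, surgical_last: bool
) -> Tuple[List[T], List[T]]:
    """Split layers into (selected, rest) for surgical fine-tuning."""
    # Exactly one of the two flags must be set.
    if surgical_first == surgical_last:
        raise ValueError(
            "Exactly one of surgical_first and surgical_last must be True."
        )

    # Assumption: a quarter is len(layers) // 4, but at least one layer.
    quarter = max(1, len(layers) // 4)

    if surgical_first:
        return list(layers[:quarter]), list(layers[quarter:])
    return list(layers[-quarter:]), list(layers[:-quarter])


# Stand-ins for the layers with parameters of a model, in forward order.
layer_names = [f"layer{i}" for i in range(8)]
first_selected, first_rest = split_surgical(
    layer_names, surgical_first=True, surgical_last=False
)
last_selected, last_rest = split_surgical(
    layer_names, surgical_first=False, surgical_last=True
)
```

With eight layers, the first case selects layer0 and layer1 and the second selects layer6 and layer7. In a real model, the parameters of the selected layers would form the leafs and all remaining parameters the non-leafs.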