estimate.py

This script measures, for a given model, the peak memory after a forward pass and the total time taken through the backward pass.

Estimate possible speed-up when randomizing the weight VJP of convolutions.

We take a CNN and answer the following questions:

Q1) What is the relative run time consumed by the weight VJP for convolutions?

Q2) Assume we achieve a speed-up x by randomizing the weight VJP. What would be the speed-up for one optimization step (forward+backward)?

Q3) The same as Q1) and Q2) but in terms of memory consumption.
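
One way to reason about Q2 is an Amdahl's-law style calculation. The sketch below is only an illustration and is not necessarily the computation this script performs: it assumes the weight VJP accounts for a fraction f of the run time of one step and that the remaining run time is unaffected. The names overall_speedup, f, and x are hypothetical.

```python
def overall_speedup(f: float, x: float) -> float:
    """Speed-up of one step if a fraction f of its time is accelerated by x.

    Assumption: the remaining (1 - f) of the run time is unchanged.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if x <= 0.0:
        raise ValueError("x must be positive")
    # New time per step, relative to the old one: (1 - f) + f / x.
    return 1.0 / ((1.0 - f) + f / x)


# Example: the weight VJP takes 50% of a step and is made 2x faster.
print(overall_speedup(0.5, 2.0))  # roughly 1.33
```

Under this assumption the overall speed-up is bounded by 1 / (1 - f) no matter how large x becomes, which is why Q1 (the relative run time of the weight VJP) has to be answered first.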

experiments.util.estimate.estimate_mem_savings(model_fn: Callable[[], Module], loss_fn: Callable[[], Module], x: Tensor, y: Tensor, targets: List[Dict[str, Tensor]] | None, architecture: str, dev: device, case: List[str], results_dir: str, return_val: bool = False)

Print an estimate of the total memory savings caused by memory savings in the weight VJP.

Parameters:

Returns:

The required estimate (only returned if return_val is True)

Return type:

result

experiments.util.estimate.estimate_speedup(model_fn: Callable[[], Module], loss_fn: Callable[[], Module], x: Tensor, y: Tensor, targets: List[Dict[str, Tensor]] | None, architecture: str, dev: device, case: List[str], results_dir: str, return_val: bool = False)

Save an estimate of total training speed-up caused by a weight VJP speed-up.

Parameters:

Returns:

The required estimate (only returned if return_val is True)

Return type:

result

experiments.util.estimate.parse_case(case: List[str] | None) → Dict[str, bool]

Small helper function to convert cases into keyword arguments for measurements.

Parameters: case (Optional[List[str]]) – List of all cases
Returns: Dictionary whose keys are the allowed_cases present in the input (those that don't start with no_)
Return type: Dict[str, bool]
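
The conversion described above might look roughly like the following sketch. It is an assumption about the behavior, not the package's actual implementation: ALLOWED_CASES is a hypothetical stand-in for allowed_cases, the case name grad_norm_weights is made up for illustration, and mapping a no_-prefixed case to a False entry under the un-prefixed key is a guess based on the Dict[str, bool] return type.

```python
from typing import Dict, List, Optional

# Hypothetical list; the real allowed_cases lives in the package and may differ.
ALLOWED_CASES = ["grad_input", "no_grad_embed_weights", "grad_norm_weights"]


def parse_case_sketch(case: Optional[List[str]]) -> Dict[str, bool]:
    """Turn a list of case names into boolean keyword arguments (sketch)."""
    kwargs: Dict[str, bool] = {}
    if case is None:
        return kwargs
    for name in ALLOWED_CASES:
        if name not in case:
            continue
        if name.startswith("no_"):
            # Assumed: "no_foo" switches the keyword "foo" off.
            kwargs[name[len("no_"):]] = False
        else:
            kwargs[name] = True
    return kwargs


print(parse_case_sketch(["grad_input", "no_grad_embed_weights"]))
# {'grad_input': True, 'grad_embed_weights': False}
```

With this shape, none of the resulting keys start with no_, which matches the description of the return value.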

experiments.util.estimate.skip_case_check(args: Namespace) → bool

Decide whether to skip the case:

when case has grad_norm_* but the model does not have any normalization layers
when case has no_grad_embed_weights but no grad_input: the backward pass raises an error because no input requires grad
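
The two conditions above can be sketched as a small predicate. This is a simplified illustration rather than the actual function: the real skip_case_check takes a Namespace, whereas the hypothetical should_skip below takes the case list and a has_norm_layers flag directly, since how the script inspects the model is not described here.

```python
from typing import List, Optional


def should_skip(case: Optional[List[str]], has_norm_layers: bool) -> bool:
    """Return True if the case should be skipped (sketch of the two rules)."""
    case = case or []
    # Rule 1: gradients for normalization layers requested, but none exist.
    wants_norm_grads = any(c.startswith("grad_norm_") for c in case)
    if wants_norm_grads and not has_norm_layers:
        return True
    # Rule 2: embedding weight gradients disabled and no input gradient
    # requested, so the backward pass would have nothing that requires grad.
    if "no_grad_embed_weights" in case and "grad_input" not in case:
        return True
    return False


print(should_skip(["no_grad_embed_weights"], has_norm_layers=True))  # True
print(should_skip(["grad_input", "no_grad_embed_weights"], has_norm_layers=True))  # False
```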